feat: make version deployable

nobody 2025-11-29 13:56:55 -08:00
commit 49bd94cda2
Signed by: GrocerPublishAgent
GPG key ID: D460CD54A9E3AB86
22 changed files with 7785 additions and 10962 deletions

@@ -1,10 +1,79 @@
# Text Salience API
A Flask API for computing text salience using sentence transformers, with HAProxy-based queue management to handle resource contention.
## Architecture
```
nginx (SSL termination, :443)
  HAProxy (queue manager, 127.0.0.2:5000)
    ├─► [2 slots available] → Gunicorn workers (127.0.89.34:5000)
    │     Process request normally
    │     Track processing span
    └─► [Queue full, 120+] → /overflow endpoint (127.0.89.34:5000)
          Return 429 with stats
          Track overflow arrival
```
## Queue Management
- **Processing slots**: 2 concurrent requests
- **Queue depth**: 120 requests
- **Queue timeout**: 10 minutes
- **Processing time**: ~5 seconds per request
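These figures imply a rough upper bound on queueing delay. The following is a back-of-the-envelope sketch, assuming both slots stay busy and every request takes about 5 seconds; the constant and function names are illustrative and do not come from the codebase.

```python
import math

# Figures taken from the list above; the names are illustrative.
PROCESSING_SLOTS = 2        # concurrent requests
QUEUE_DEPTH = 120           # maximum queued requests
QUEUE_TIMEOUT_S = 10 * 60   # queue timeout, in seconds
PROCESSING_TIME_S = 5.0     # approximate time per request


def estimated_wait_s(position: int) -> float:
    """Rough wait for the request at 1-based queue `position`.

    Assumes the queue advances in batches of PROCESSING_SLOTS,
    one batch every PROCESSING_TIME_S seconds.
    """
    batches_ahead = math.ceil(position / PROCESSING_SLOTS)
    return batches_ahead * PROCESSING_TIME_S


worst_case = estimated_wait_s(QUEUE_DEPTH)
print(f"worst-case wait: {worst_case:.0f} s")
print(f"within queue timeout: {worst_case <= QUEUE_TIMEOUT_S}")
```

Under these assumptions, a request admitted at the back of a full queue waits about 300 seconds, which is half of the 10-minute queue timeout.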
When the queue is full, requests are routed to `/overflow`, which returns a 429 status with statistics about:
- Recent processing spans (last 5 minutes)
- Overflow arrival times (last 5 minutes)
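One way to keep "last 5 minutes" statistics is a sliding window over timestamps. This is a minimal sketch, not the service's actual implementation: `RecentEvents` is a hypothetical helper, and it keeps state in a single process only. Sharing it across several Gunicorn workers would need some form of shared storage, which the sketch does not address.

```python
import time
from collections import deque
from typing import Deque, List, Optional

WINDOW_S = 5 * 60  # "last 5 minutes", per the list above


class RecentEvents:
    """Keeps event timestamps no older than `window_s` (hypothetical helper)."""

    def __init__(self, window_s: float = WINDOW_S) -> None:
        self.window_s = window_s
        self._events: Deque[float] = deque()

    def record(self, now: Optional[float] = None) -> None:
        """Store the time of one event, e.g. an overflow arrival."""
        self._events.append(time.time() if now is None else now)

    def recent(self, now: Optional[float] = None) -> List[float]:
        """Return timestamps still inside the window, oldest first."""
        now = time.time() if now is None else now
        cutoff = now - self.window_s
        # Timestamps are appended in order, so stale ones sit at the left.
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return list(self._events)
```

Processing spans could be tracked the same way by storing `(start, end)` pairs instead of single timestamps.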
The frontend can use these statistics to:
- Calculate queue probability using Poisson distribution
- Display estimated wait times
- Show arrival rate trends
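The README does not spell out the Poisson model, so the following is only one plausible reading: treat arrivals as a Poisson process whose rate is estimated from the recent timestamps, then compute the chance of seeing at least `k` arrivals within some horizon. It is written in Python for consistency with the API, although a frontend would likely do this in JavaScript. The function names are assumptions.

```python
import math
from typing import Sequence


def arrival_rate(timestamps: Sequence[float], window_s: float) -> float:
    """Average arrivals per second over the observation window."""
    return len(timestamps) / window_s


def prob_at_least(k: int, rate: float, horizon_s: float) -> float:
    """P(N >= k), where N ~ Poisson(rate * horizon_s)."""
    mean = rate * horizon_s
    term = math.exp(-mean)  # P(N = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= mean / (i + 1)  # P(N = i + 1) derived from P(N = i)
    return max(0.0, 1.0 - cdf)


# Example: 60 arrivals seen in the last 5 minutes, looking 5 seconds ahead.
rate = arrival_rate([0.0] * 60, window_s=300.0)
print(f"P(3 or more arrivals in 5 s) = {prob_at_least(3, rate, 5.0):.3f}")
```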
## Run API
### Development (without queue)
```bash
uv run flask --app salience run
```
### Production (with HAProxy queue)
1. **Start Gunicorn** with preloaded models (loads models once, forks 3 workers):
```bash
uv run gunicorn \
  --preload \
  --workers 3 \
  --bind 127.0.89.34:5000 \
  --timeout 300 \
  --access-logfile - \
  salience:app
```
(3 workers: 2 for model processing + 1 for overflow/stats responses)
2. **Start HAProxy** (assumes you're including `haproxy.cfg` in your main HAProxy config):
```bash
# If running standalone HAProxy for this service:
# Uncomment the global/defaults sections in haproxy.cfg first
haproxy -f haproxy.cfg
# If using a global HAProxy instance:
# Include the frontend/backend sections from haproxy.cfg in your main config
```
3. **Configure nginx** to proxy to HAProxy:
```nginx
location /api/salience {
    proxy_pass http://127.0.0.2:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_read_timeout 900s;
}
```
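The 900-second read timeout appears to match the other timeouts in this README: a request can wait up to 10 minutes in the queue and then run for up to Gunicorn's 300-second timeout. A small sanity check, with illustrative names and the assumption that these two delays simply add up:

```python
# Values copied from this README; the names are illustrative.
QUEUE_TIMEOUT_S = 10 * 60     # HAProxy queue timeout (10 minutes)
GUNICORN_TIMEOUT_S = 300      # gunicorn --timeout
NGINX_READ_TIMEOUT_S = 900    # nginx proxy_read_timeout

# Longest a request might go without a response: a full queue wait,
# followed by a worker that runs until Gunicorn's timeout.
worst_case_silence_s = QUEUE_TIMEOUT_S + GUNICORN_TIMEOUT_S

assert NGINX_READ_TIMEOUT_S >= worst_case_silence_s
print(f"budget: {worst_case_silence_s} s of {NGINX_READ_TIMEOUT_S} s")
```

If either the queue timeout or the Gunicorn timeout is raised, `proxy_read_timeout` would need to grow with it, or nginx may close the connection first.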
## Benchmarks
```bash
# Generate embeddings