feat: make version deployable

2025-11-29 13:56:55 -08:00 · 49bd94cda2
commit 49bd94cda2
parent 4aa8759514
22 changed files with 7785 additions and 10962 deletions
--- a/api/README.md
+++ b/api/README.md
@@ -1,10 +1,79 @@
 # Text Salience API

+A Flask API for computing text salience using sentence transformers, with HAProxy-based queue management to handle resource contention.
+
+## Architecture
+
+```
+nginx (SSL termination, :443)
+    ↓
+HAProxy (queue manager, 127.0.0.2:5000)
+    ├─► [2 slots available] → Gunicorn workers (127.0.89.34:5000)
+    │                          Process request normally
+    │                          Track processing span
+    │
+    └─► [Queue full, 120+] → /overflow endpoint (127.0.89.34:5000)
+                              Return 429 with stats
+                              Track overflow arrival
+```
+
+## Queue Management
+
+- **Processing slots**: 2 concurrent requests
+- **Queue depth**: 120 requests
+- **Queue timeout**: 10 minutes
+- **Processing time**: ~5 seconds per request
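
As a rough sanity check on the figures above, the sketch below estimates how long a request at the back of a full queue would wait. It assumes every request takes the nominal ~5 seconds and that both slots stay busy; `worst_case_wait_seconds` and its parameters are illustrative names, not part of the API.

```python
import math


def worst_case_wait_seconds(queue_depth: int, slots: int, seconds_per_request: float) -> float:
    """Approximate wait for the last request in a full queue.

    Assumption: fixed service time, and all slots drain the queue in parallel,
    so the queue empties in rounds of `slots` requests.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    rounds = math.ceil(queue_depth / slots)
    return rounds * seconds_per_request


# Figures from the list above: 120 queued requests, 2 slots, ~5 s each.
wait = worst_case_wait_seconds(queue_depth=120, slots=2, seconds_per_request=5.0)
queue_timeout = 10 * 60  # 10-minute queue timeout, in seconds
print(f"worst-case wait: {wait:.0f}s (queue timeout: {queue_timeout}s)")
```

Under these assumptions the last queued request starts after about 300 seconds, which leaves headroom below the 10-minute queue timeout if individual requests run slower than the nominal time.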
+
+When the queue is full, HAProxy routes requests to `/overflow`, which returns a 429 response with statistics about:
+- Recent processing spans (last 5 minutes)
+- Overflow arrival times (last 5 minutes)
+
+The frontend can use these statistics to:
+- Estimate the probability of being queued, using a Poisson distribution
+- Display estimated wait times
+- Show arrival rate trends
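
One plausible reading of the Poisson estimate is sketched below: treat arrivals as a Poisson process, estimate the rate from the arrivals seen in the last 5 minutes, and compute the probability that enough requests arrive during one processing time to occupy every slot. This is an assumption about how a frontend might use the statistics, not a description of the actual frontend code; all function names and the input shape are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_at_least(k: int, mean: float) -> float:
    """P(N >= k) for N ~ Poisson(mean)."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


def queue_probability(
    arrivals_in_window: int,
    window_seconds: float,
    service_seconds: float = 5.0,  # ~5 s per request, from the README
    slots: int = 2,                # 2 processing slots, from the README
) -> float:
    """Probability that at least `slots` requests arrive within one service time.

    Assumption: a new request has to queue when that many others arrive
    while the slots are busy.
    """
    rate = arrivals_in_window / window_seconds  # requests per second
    return prob_at_least(slots, rate * service_seconds)


# Example: 60 arrivals observed over the last 5 minutes.
print(f"{queue_probability(60, 300.0):.3f}")
```

The estimate ignores requests already in the queue, so it is better suited to a rough "busy or not" indicator than to a precise wait-time prediction.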
+
 ## Run API
+
+### Development (without queue)
 ```bash
 uv run flask --app salience run
 ```

+### Production (with HAProxy queue)
+
+1. **Start Gunicorn** with preloaded models (loads models once, forks 3 workers):
+```bash
+uv run gunicorn \
+    --preload \
+    --workers 3 \
+    --bind 127.0.89.34:5000 \
+    --timeout 300 \
+    --access-logfile - \
+    salience:app
+```
+(3 workers: 2 for model processing + 1 for overflow/stats responses)
+
+2. **Start HAProxy**, either standalone or by including `haproxy.cfg` in your main HAProxy config (the file is set up for inclusion by default):
+```bash
+# If running standalone HAProxy for this service:
+# Uncomment the global/defaults sections in haproxy.cfg first
+haproxy -f haproxy.cfg
+
+# If using a global HAProxy instance:
+# Include the frontend/backend sections from haproxy.cfg in your main config
+```
+
+3. **Configure nginx** to proxy to HAProxy:
+```nginx
+location /api/salience {
+    proxy_pass http://127.0.0.2:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_read_timeout 900s;
+}
+```
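
The timeouts in the steps above appear to line up: the 900 s `proxy_read_timeout` equals the 10-minute queue timeout plus the 300 s Gunicorn `--timeout`. The check below makes that relationship explicit. It is a sketch under the assumption that nginx should outlast the longest possible queue wait plus processing time; the constant and function names are illustrative.

```python
QUEUE_TIMEOUT_S = 10 * 60    # HAProxy queue timeout (Queue Management section)
GUNICORN_TIMEOUT_S = 300     # gunicorn --timeout
NGINX_READ_TIMEOUT_S = 900   # nginx proxy_read_timeout


def nginx_timeout_covers_backend(queue_s: int, worker_s: int, nginx_s: int) -> bool:
    """True if nginx waits at least as long as queueing plus processing can take."""
    return nginx_s >= queue_s + worker_s


print(nginx_timeout_covers_backend(QUEUE_TIMEOUT_S, GUNICORN_TIMEOUT_S, NGINX_READ_TIMEOUT_S))
```

If either the queue timeout or the worker timeout is raised, `proxy_read_timeout` likely needs to be raised with it so nginx does not cut off requests that HAProxy is still willing to hold.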
+
 ## Benchmarks
 ```bash
 # Generate embeddings