# Text Salience API

A Flask API for computing text salience using sentence transformers, with HAProxy-based queue management to handle resource contention.

## Architecture

```
nginx (SSL termination, :443)
    ↓
HAProxy (queue manager, 127.0.0.2:5000)
    ├─► [2 slots available] → Gunicorn workers (127.0.89.34:5000)
    │       Process request normally
    │       Track processing span
    │
    └─► [Queue full, 120+] → /overflow endpoint (127.0.89.34:5000)
            Return 429 with stats
            Track overflow arrival
```

## Queue Management

- **Processing slots**: 2 concurrent requests
- **Queue depth**: 120 requests
- **Queue timeout**: 10 minutes
- **Processing time**: ~5 seconds per request

When the queue is full, requests are routed to `/overflow`, which returns a 429 status with statistics about:

- Recent processing spans (last 5 minutes)
- Overflow arrival times (last 5 minutes)

The frontend can use these statistics to:

- Calculate queue probability using a Poisson arrival model (see the sketch at the end of this README)
- Display estimated wait times
- Show arrival rate trends

## Run the API

### Development (without queue)

```bash
uv run flask --app salience run
```

### Production (with HAProxy queue)

1. **Start Gunicorn** with preloaded models (loads the models once, then forks 3 workers):

   ```bash
   uv run gunicorn \
       --preload \
       --workers 3 \
       --bind 127.0.89.34:5000 \
       --timeout 300 \
       --access-logfile - \
       salience:app
   ```

   (3 workers: 2 for model processing + 1 for overflow/stats responses)

2. **Start HAProxy** (assumes you're including `haproxy.cfg` in your main HAProxy config):

   ```bash
   # If running a standalone HAProxy for this service:
   # uncomment the global/defaults sections in haproxy.cfg first
   haproxy -f haproxy.cfg

   # If using a global HAProxy instance:
   # include the frontend/backend sections from haproxy.cfg in your main config
   ```

3. **Configure nginx** to proxy to HAProxy:

   ```nginx
   location /api/salience {
       proxy_pass http://127.0.0.2:5000;
       proxy_http_version 1.1;
       proxy_set_header Host $host;
       proxy_read_timeout 900s;
   }
   ```

## Benchmarks

```bash
# Generate embeddings
uv run python3 benchmarks/generate_embeddings.py

# Run benchmarks
uv run pytest benchmarks/test_bench_cosine_sim.py --benchmark-json=benchmarks/genfiles/benchmark_results.json

# Visualize results
uv run python3 benchmarks/visualize_benchmarks.py benchmarks/genfiles/benchmark_results.json
```
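## Example: Estimating Wait Time from Overflow Statistics

The 429 payload's exact schema is not documented here, so the sketch below is illustrative only: it assumes two hypothetical fields, `processing_spans` (durations in seconds for recently completed requests) and `overflow_arrivals` (timestamps of recently rejected requests), both covering the last 5 minutes. It derives a rough service rate and arrival rate from them, then uses a simple Poisson model for contention.

```python
"""Rough wait-time estimate from the /overflow 429 payload.

Minimal sketch only: the field names `processing_spans` and
`overflow_arrivals` are assumptions, not the API's documented schema.
"""
import math

SLOTS = 2          # concurrent processing slots (HAProxy config)
QUEUE_DEPTH = 120  # HAProxy queue size
WINDOW_S = 300     # the stats cover the last 5 minutes


def estimate_wait(stats: dict) -> dict:
    spans = stats.get("processing_spans", [])      # assumed: seconds per finished request
    arrivals = stats.get("overflow_arrivals", [])  # assumed: timestamps of rejected requests

    mean_service = sum(spans) / len(spans) if spans else 5.0  # fall back to ~5 s/request
    service_rate = SLOTS / mean_service                       # requests finished per second
    arrival_rate = len(arrivals) / WINDOW_S                   # overflow arrivals per second

    # Worst case: joining at the back of a full queue.
    wait_s = QUEUE_DEPTH / service_rate

    # Poisson model: probability that at least one competing request
    # arrives before the next slot frees (within one mean service time).
    p_contention = 1.0 - math.exp(-arrival_rate * mean_service)

    return {
        "estimated_wait_s": wait_s,
        "arrival_rate_per_s": arrival_rate,
        "p_new_arrival_before_next_slot": p_contention,
    }
```

A frontend can surface `estimated_wait_s` directly and use `p_new_arrival_before_next_slot` to decide whether to suggest retrying later rather than waiting in the queue.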
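## Example: Client-Side Retry on 429

A client should treat a 429 as "queue full, try again later" rather than a hard failure. The sketch below is a hedged example, not the documented client: the deployment URL and the `{"text": ...}` request body are assumptions; only the `/api/salience` path (from the nginx config above) and the 429 behaviour come from this README.

```python
"""Client-side retry sketch for the queued salience API.

Assumes the service is reachable at https://example.com/api/salience and
accepts a JSON body with a `text` field; both are illustrative choices,
not the documented request schema.
"""
import time

import requests

URL = "https://example.com/api/salience"  # hypothetical deployment URL


def compute_salience(text: str, max_retries: int = 5) -> dict:
    for attempt in range(max_retries):
        resp = requests.post(URL, json={"text": text}, timeout=900)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Queue full: the 429 body carries the recent-load statistics
        # described above; back off before retrying.
        time.sleep(30 * (attempt + 1))
    raise RuntimeError("queue stayed full after retries")
```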