# Text Salience API
A Flask API for computing text salience using sentence transformers, with HAProxy-based queue management to handle resource contention.
## Architecture
```
nginx (SSL termination, :443)
   ↓
HAProxy (queue manager, 127.0.0.2:5000)
   ├─► [2 slots available] → Gunicorn workers (127.0.89.34:5000)
   │       Process request normally
   │       Track processing span
   │
   └─► [Queue full, 120+] → /overflow endpoint (127.0.89.34:5000)
           Return 429 with stats
           Track overflow arrival
```
## Queue Management
- Processing slots: 2 concurrent requests
- Queue depth: 120 requests
- Queue timeout: 10 minutes
- Processing time: ~5 seconds per request
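Taken together, these numbers bound the worst case: a request that lands in the last queue slot waits roughly queue depth × processing time ÷ slots. A quick sanity check, assuming the ~5-second average holds and both slots stay busy:

```python
# Worst-case wait for a request that joins a full queue, using the
# parameters above (assumes ~5 s per request and both slots saturated).
SLOTS = 2
QUEUE_DEPTH = 120
PROCESSING_TIME_S = 5

worst_case_s = QUEUE_DEPTH * PROCESSING_TIME_S / SLOTS
print(f"{worst_case_s:.0f}s (~{worst_case_s / 60:.0f} min)")  # 300s (~5 min)
```

At roughly 5 minutes, the worst case fits comfortably inside the 10-minute queue timeout.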
When the queue is full, requests are routed to `/overflow`, which returns a 429 status with statistics about:
- Recent processing spans (last 5 minutes)
- Overflow arrival times (last 5 minutes)
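As a rough illustration, a minimal sketch of what such an overflow handler can look like (the variable names, payload shape, and in-process tracking here are assumptions, not the project's actual implementation):

```python
# Sketch of an overflow handler: respond 429 with recent stats.
# All names and the payload shape are illustrative assumptions.
import time
from flask import Flask, jsonify

app = Flask(__name__)
overflow_arrivals: list[float] = []               # timestamps of overflowed requests
processing_spans: list[tuple[float, float]] = []  # (start, end) of completed requests

@app.route("/overflow")
def overflow():
    now = time.time()
    overflow_arrivals.append(now)  # track this overflow arrival
    cutoff = now - 300             # stats window: last 5 minutes
    return jsonify(
        overflow_arrivals=[t for t in overflow_arrivals if t >= cutoff],
        processing_spans=[s for s in processing_spans if s[1] >= cutoff],
    ), 429
```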
The frontend can use these statistics to:
- Calculate queue probability using Poisson distribution
- Display estimated wait times
- Show arrival rate trends
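For the Poisson estimate, a minimal sketch of the math (the `overflow_arrivals` field name, the sample timestamps, and the 60-second horizon are assumptions):

```python
import math

def arrival_rate(arrival_times: list[float], window_s: float = 300.0) -> float:
    """Poisson rate (arrivals/second) estimated from the 5-minute stats window."""
    return len(arrival_times) / window_s

def p_at_most(k: int, lam: float, horizon_s: float) -> float:
    """P(at most k arrivals within horizon_s) for a Poisson process with rate lam."""
    mean = lam * horizon_s
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

# Hypothetical arrival timestamps taken from a 429 response:
arrivals = [12.0, 47.5, 110.2, 180.9, 240.1]
lam = arrival_rate(arrivals)  # ~0.017 arrivals/s
print(f"P(<=2 arrivals in 60s) = {p_at_most(2, lam, 60):.2f}")  # 0.92
```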
## Run API
### Development (without queue)

```bash
uv run flask --app salience run
```
### Production (with HAProxy queue)
- Start Gunicorn with preloaded models (loads models once, forks 3 workers):

  ```bash
  uv run gunicorn \
    --preload \
    --workers 3 \
    --bind 127.0.89.34:5000 \
    --timeout 300 \
    --access-logfile - \
    salience:app
  ```

  (3 workers: 2 for model processing + 1 for overflow/stats responses)
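  The `salience:app` target means Gunicorn imports the `salience` module and serves its module-level `app`; with `--preload` that import, and the model load with it, happens once in the master before the workers fork. A sketch of the module shape this implies (the model name and layout are assumptions):

  ```python
  # salience/__init__.py (sketch): what `salience:app` resolves to.
  # Under --preload this module is imported once in the Gunicorn master, so the
  # model loads a single time and is shared copy-on-write by the forked workers.
  from flask import Flask
  from sentence_transformers import SentenceTransformer

  app = Flask(__name__)
  model = SentenceTransformer("all-MiniLM-L6-v2")  # model name is an assumption
  ```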
- Start HAProxy (assumes you're including `haproxy.cfg` in your main HAProxy config):

  ```bash
  # If running standalone HAProxy for this service:
  # Uncomment the global/defaults sections in haproxy.cfg first
  haproxy -f haproxy.cfg

  # If using a global HAProxy instance:
  # Include the frontend/backend sections from haproxy.cfg in your main config
  ```
- Configure nginx to proxy to HAProxy:

  ```nginx
  location /api/salience {
      proxy_pass http://127.0.0.2:5000;
      proxy_http_version 1.1;
      proxy_set_header Host $host;
      proxy_read_timeout 900s;
  }
  ```
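With all three layers running, clients should handle the 429 overflow path explicitly. A sketch of a retrying client using only the standard library (the URL, payload shape, and backoff policy are illustrative assumptions):

```python
import json
import time
import urllib.error
import urllib.request

def post_salience(text: str, url: str = "https://example.com/api/salience"):
    """POST text for salience scoring, backing off while the queue is full."""
    data = json.dumps({"text": text}).encode()
    headers = {"Content-Type": "application/json"}
    for attempt in range(5):
        req = urllib.request.Request(url, data=data, headers=headers)
        try:
            with urllib.request.urlopen(req, timeout=900) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise
            # The 429 body carries recent spans and arrival times; a smarter
            # client would size its wait from those instead of blind backoff.
            stats = json.load(e)
            time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("queue stayed full after repeated retries")
```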
## Benchmarks
```bash
# Generate embeddings
uv run python3 benchmarks/generate_embeddings.py

# Run benchmarks
uv run pytest benchmarks/test_bench_cosine_sim.py --benchmark-json=benchmarks/genfiles/benchmark_results.json

# Visualize results
uv run python3 benchmarks/visualize_benchmarks.py benchmarks/genfiles/benchmark_results.json
```
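For orientation, a benchmark in `test_bench_cosine_sim.py` would follow the standard pytest-benchmark shape, roughly like this sketch (the array sizes and the function under test are assumptions):

```python
# Sketch of a pytest-benchmark test for cosine similarity (sizes are assumptions).
import numpy as np
import pytest

@pytest.fixture
def embeddings() -> np.ndarray:
    rng = np.random.default_rng(0)
    return rng.standard_normal((1000, 384)).astype(np.float32)

def cosine_sim_matrix(x: np.ndarray) -> np.ndarray:
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return normed @ normed.T

def test_bench_cosine_sim(benchmark, embeddings):
    benchmark(cosine_sim_matrix, embeddings)  # pytest-benchmark times this call
```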