# Text Salience API
A Flask API for computing text salience using sentence transformers, with HAProxy-based queue management to handle resource contention.
## Architecture
```
nginx (SSL termination, :443)
          ↓
HAProxy (queue manager, 127.0.0.2:5000)
          ├─► [2 slots available] → Gunicorn workers (127.0.89.34:5000)
          │       Process request normally
          │       Track processing span
          │
          └─► [Queue full, 120+] → /overflow endpoint (127.0.89.34:5000)
                  Return 429 with stats
                  Track overflow arrival
```
## Queue Management
- **Processing slots**: 2 concurrent requests
- **Queue depth**: 120 requests
- **Queue timeout**: 10 minutes
- **Processing time**: ~5 seconds per request
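
The limits above map onto HAProxy's per-server queueing directives. A minimal sketch, assuming a single backend pointing at the Gunicorn bind address (the section and server names here are illustrative; the repo's `haproxy.cfg` is authoritative, including how full-queue traffic is steered to `/overflow`):

```
backend salience_back
    # requests wait at most 10 minutes in the queue
    timeout queue 10m
    # one Gunicorn endpoint: 2 concurrent slots, up to 120 queued requests
    server app 127.0.89.34:5000 maxconn 2 maxqueue 120
```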
When the queue is full, requests are routed to `/overflow`, which returns a 429 status with statistics about:

- Recent processing spans (last 5 minutes)
- Overflow arrival times (last 5 minutes)
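
A minimal sketch of what such a handler could look like. This is not the repo's actual implementation — the tracking structures and field names are assumptions; it only illustrates the 429-plus-stats contract:

```python
import time
from collections import deque

from flask import Flask, jsonify

app = Flask(__name__)

WINDOW_S = 300  # report the last 5 minutes of activity

# Hypothetical in-process trackers; the real app records these elsewhere.
processing_spans: deque = deque(maxlen=1000)   # (start_ts, end_ts) tuples
overflow_arrivals: deque = deque(maxlen=1000)  # arrival timestamps


@app.route("/overflow")
def overflow():
    now = time.time()
    overflow_arrivals.append(now)
    stats = {
        "processing_spans": [
            s for s in processing_spans if now - s[1] <= WINDOW_S
        ],
        "overflow_arrivals": [
            t for t in overflow_arrivals if now - t <= WINDOW_S
        ],
    }
    # 429 signals "queue full"; the stats let the frontend estimate wait time
    return jsonify(stats), 429
```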
The frontend can use these statistics to:

- Calculate queue probability using a Poisson distribution
- Display estimated wait times
- Show arrival rate trends
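
The queue-probability step can be sketched as follows: treat arrivals as Poisson, derive the rate from the 5-minute arrival count in the stats, and estimate the chance that more requests than available slots arrive within one ~5-second processing window. The function names and exact model are assumptions, not the frontend's actual code:

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """P(N = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def queue_probability(arrivals_last_5min: int,
                      slots: int = 2,
                      service_s: float = 5.0) -> float:
    """Estimate the probability a new request has to queue: the chance
    that more than `slots` requests arrive during one service window."""
    rate_per_s = arrivals_last_5min / 300.0
    mean = rate_per_s * service_s
    # P(N > slots) = 1 - sum_{k=0}^{slots} P(N = k)
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(slots + 1))
```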
## Run API
### Development (without queue)
```bash
uv run flask --app salience run
```
### Production (with HAProxy queue)
1. **Start Gunicorn** with preloaded models (loads models once, forks 3 workers):
   ```bash
   uv run gunicorn \
     --preload \
     --workers 3 \
     --bind 127.0.89.34:5000 \
     --timeout 300 \
     --access-logfile - \
     salience:app
   ```

   (3 workers: 2 for model processing + 1 for overflow/stats responses)

2. **Start HAProxy** (assumes you're including `haproxy.cfg` in your main HAProxy config):
   ```bash
   # If running standalone HAProxy for this service:
   # Uncomment the global/defaults sections in haproxy.cfg first
   haproxy -f haproxy.cfg

   # If using a global HAProxy instance:
   # Include the frontend/backend sections from haproxy.cfg in your main config
   ```
3. **Configure nginx** to proxy to HAProxy:
   ```nginx
   location /api/salience {
       proxy_pass http://127.0.0.2:5000;
       proxy_http_version 1.1;
       proxy_set_header Host $host;
       proxy_read_timeout 900s;
   }
   ```
## Benchmarks
```bash
# Generate embeddings
uv run python3 benchmarks/generate_embeddings.py

# Run benchmarks
uv run pytest benchmarks/test_bench_cosine_sim.py --benchmark-json=benchmarks/genfiles/benchmark_results.json

# Visualize results
uv run python3 benchmarks/visualize_benchmarks.py benchmarks/genfiles/benchmark_results.json
```