API allows unbounded queries causing Gunicorn worker timeouts #98

Closed
opened 2026-04-01 21:46:04 +00:00 by cal · 0 comments
Owner

Problem

Gunicorn workers are repeatedly timing out (120s) and getting SIGABRT'd — 12 occurrences in the last 2 days. The monitoring stack detects the unhealthy window during worker recycling and restarts the container.

Root Cause

Requests arriving with limit=99999 or empty required filter parameters cause the API to attempt returning enormous result sets, tying up workers past the timeout threshold.

Source: External clients via reverse proxy

UPDATED: The source IP 172.25.0.3 is nginx-proxy-manager, NOT the Discord bot. These queries come from external clients (website users or direct API callers) through the reverse proxy. The Discord bot (172.18.0.2) is on a separate Docker network and does not generate these queries.

Observed bad requests (from container logs)

Request Issue
GET /plays/?limit=99999&game_id=&short_output=FALSE → 500 Empty game_id = fetch ALL plays
GET /plays/?game_id=&pitcher_id=&sort=oldest → 500 Empty filters = entire plays table
GET /players?season=9&team_id=&team_id=391 → 500 Duplicate/empty team_id params
GET /plays/batting?limit=99999&... 99,999 row limit on batting plays
GET /transactions?limit=99999&... 99,999 row limit on transactions

Current State of the API

  • Only /players/search has a max limit cap (le=50). Every other endpoint trusts the client.
  • Empty string params silently pass validation — FastAPI parses game_id= as [''], which passes if param is not None checks but generates wasteful queries.
  • /transactions has no limit parameter at all — always returns all matching rows with recursive serialization.
  • model_to_dict(recurse=True) amplifies the problem — each row triggers additional DB lookups for FK relations.

Proposed Fix

Priority 1 — Immediate (stop the crashes)

  1. Add a global MAX_LIMIT constant (e.g., 500) and enforce le=500 across all list endpoints
  2. Strip empty strings from list params — convert [''] to None via a shared dependency
  3. Add a limit param to /transactions with default=200, max=500

Priority 2 — Short-term hardening

  1. Add PostgreSQL statement_timeout = 30s as a safety net below the Gunicorn timeout
  2. Force short_output=True when limit > 100 to prevent recursive serialization on large result sets
  3. Log when clients request limit > 500 to identify problematic callers

Priority 3 — Architectural

  1. Standardize pagination with a shared PaginationParams dependency across all list endpoints

Impact

  • Workers timeout and get killed every few hours
  • Brief API unavailability during worker respawn — can affect any client, including transaction submissions
  • Monitoring flags container as unhealthy, triggers unnecessary restarts
  • P1 priority — active production instability during Season 12
## Problem Gunicorn workers are repeatedly timing out (120s) and getting SIGABRT'd — 12 occurrences in the last 2 days. The monitoring stack detects the unhealthy window during worker recycling and restarts the container. ## Root Cause Requests arriving with `limit=99999` or empty required filter parameters cause the API to attempt returning enormous result sets, tying up workers past the timeout threshold. ### Source: External clients via reverse proxy **UPDATED:** The source IP `172.25.0.3` is **nginx-proxy-manager**, NOT the Discord bot. These queries come from external clients (website users or direct API callers) through the reverse proxy. The Discord bot (`172.18.0.2`) is on a separate Docker network and does not generate these queries. ### Observed bad requests (from container logs) | Request | Issue | |---|---| | `GET /plays/?limit=99999&game_id=&short_output=FALSE` → 500 | Empty `game_id` = fetch ALL plays | | `GET /plays/?game_id=&pitcher_id=&sort=oldest` → 500 | Empty filters = entire plays table | | `GET /players?season=9&team_id=&team_id=391` → 500 | Duplicate/empty `team_id` params | | `GET /plays/batting?limit=99999&...` | 99,999 row limit on batting plays | | `GET /transactions?limit=99999&...` | 99,999 row limit on transactions | ## Current State of the API - **Only `/players/search` has a max limit cap** (`le=50`). Every other endpoint trusts the client. - **Empty string params silently pass validation** — FastAPI parses `game_id=` as `['']`, which passes `if param is not None` checks but generates wasteful queries. - **`/transactions` has no limit parameter at all** — always returns all matching rows with recursive serialization. - **`model_to_dict(recurse=True)`** amplifies the problem — each row triggers additional DB lookups for FK relations. ## Proposed Fix ### Priority 1 — Immediate (stop the crashes) 1. **Add a global `MAX_LIMIT` constant** (e.g., 500) and enforce `le=500` across all list endpoints 2. **Strip empty strings from list params** — convert `['']` to `None` via a shared dependency 3. **Add a `limit` param to `/transactions`** with default=200, max=500 ### Priority 2 — Short-term hardening 4. **Add PostgreSQL `statement_timeout = 30s`** as a safety net below the Gunicorn timeout 5. **Force `short_output=True` when limit > 100** to prevent recursive serialization on large result sets 6. **Log when clients request limit > 500** to identify problematic callers ### Priority 3 — Architectural 7. **Standardize pagination** with a shared `PaginationParams` dependency across all list endpoints ## Impact - Workers timeout and get killed every few hours - Brief API unavailability during worker respawn — can affect any client, including transaction submissions - Monitoring flags container as unhealthy, triggers unnecessary restarts - **P1 priority** — active production instability during Season 12
cal added the
ai-working
label 2026-04-01 21:55:40 +00:00
cal closed this issue 2026-04-01 22:44:41 +00:00
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cal/major-domo-database#98
No description provided.