fix: prevent paperdex timeout on unfiltered requests (#102) #103

Merged
Claude merged 2 commits from ai/paper-dynasty-database#102 into next-release 2026-03-23 03:59:48 +00:00
Collaborator

Closes #102

Summary

GET /api/v2/paperdex timed out on unfiltered (or lightly filtered) requests due to three unbounded operations in get_paperdex:

  1. Pre-filter COUNT(*): all_dex.count() was called on the full table before any WHERE filters were applied — a full-table scan on every request.
  2. Second COUNT(*): all_dex.count() was called again in the response body, then the full queryset was iterated separately — two queries for what should be one.
  3. No result cap: No LIMIT clause meant the serialization loop (model_to_dict(x, recurse=True)) could run on the entire table.
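
The cost of the old pattern can be reproduced with stdlib sqlite3 as a stand-in for the Peewee queryset (table name and columns here are illustrative, not the project's actual schema): two separate COUNT(*) scans plus an unbounded read, versus one bounded query.

```python
import sqlite3

# Hypothetical stand-in for the Paperdex table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paperdex (id INTEGER PRIMARY KEY, player TEXT)")
conn.executemany("INSERT INTO paperdex (player) VALUES (?)",
                 [(f"p{i}",) for i in range(1000)])

# Old pattern: two full-table COUNT(*) scans plus an unbounded read.
count_before = conn.execute("SELECT COUNT(*) FROM paperdex").fetchone()[0]
count_again = conn.execute("SELECT COUNT(*) FROM paperdex").fetchone()[0]
all_rows = conn.execute("SELECT * FROM paperdex").fetchall()  # no LIMIT

# New pattern: one bounded query; the count comes from the list in O(1).
items = conn.execute("SELECT * FROM paperdex LIMIT 500").fetchall()
count = len(items)
```

On a large table the old pattern is three scans per request; the new one is a single capped read.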

Changes (app/routers_v2/paperdex.py)

  • Removed pre-filter all_dex.count() == 0 check (and its stale commented-out sibling)
  • Added limit: Optional[int] = 500 query parameter — defaults to 500, applied after all filters
  • Replaced all_dex.count() in the response with items = list(all_dex) and len(items) — single query, no double-count
  • Removed unused all_sets variable (was blocking the ruff pre-commit hook)
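
Taken together, the changed handler roughly follows this shape — a sketch with simplified names, using stdlib sqlite3 in place of the real Peewee/FastAPI code:

```python
import sqlite3
from typing import Optional

def get_paperdex_sketch(conn: sqlite3.Connection,
                        player: Optional[str] = None,
                        limit: Optional[int] = 500) -> dict:
    """Filters first, LIMIT after, one query; count taken from the list."""
    sql, params = "SELECT id, player FROM paperdex", []
    if player is not None:            # WHERE filters applied first
        sql += " WHERE player = ?"
        params.append(player)
    if limit is not None:             # cap applied after all filters
        sql += " LIMIT ?"
        params.append(limit)
    items = [{"id": r[0], "player": r[1]} for r in conn.execute(sql, params)]
    return {"count": len(items), "paperdex": items}  # no second COUNT(*)

# Demo data (hypothetical schema, not the project's real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paperdex (id INTEGER PRIMARY KEY, player TEXT)")
conn.executemany("INSERT INTO paperdex (player) VALUES (?)",
                 [(f"p{i}",) for i in range(1000)])

result = get_paperdex_sketch(conn)
```

An unfiltered call now returns at most 500 rows by default; passing limit=None restores the old unbounded behavior explicitly rather than by accident.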

Other observations

  • The ruff LSP formatter auto-applied quote normalization (single → double) and function signature reformatting to the entire file — same pattern noted in PRs #53 and #57. Functional changes are isolated to get_paperdex.
  • Pyright reports attribute-access errors on Peewee ORM fields (team_id, player_id, id) — these are pre-existing, not introduced by this PR.
Claude added 1 commit 2026-03-17 21:34:21 +00:00
- Remove premature all_dex.count() before filters are applied (was a
  full-table COUNT(*) before any WHERE clauses)
- Add limit parameter (default 500) to cap unbounded result sets
- Materialize queryset into list once and use len() for count, avoiding
  a second COUNT(*) query
- Remove unused all_sets variable (blocked pre-commit ruff check)

Closes #102

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude added the ai-reviewing label 2026-03-17 21:45:50 +00:00
Claude reviewed 2026-03-17 21:47:10 +00:00
Claude left a comment
Author
Collaborator

AI Code Review

Files Reviewed

  • app/routers_v2/paperdex.py (modified)

Findings

Correctness

  • Pre-filter count() removal (line 31): Correct. The old all_dex.count() == 0 check ran a full-table COUNT(*) before any WHERE clauses on every request. Removing it eliminates the scan. The behavior change (empty result returns {"count": 0, "paperdex": []} instead of 404) is better REST semantics.
  • items = list(all_dex) and len(items) (lines 62–63): Correct. Materializes the queryset exactly once; len() on a Python list is O(1) with no second DB query. Fixes the double-COUNT(*) described in the PR body.
  • all_sets removal (cardset branch): Correct — all_sets was assigned but never referenced. The actual filter Paperdex.player.cardset.id == cardset_id is unchanged and correct.
  • limit applied before if csv: branch (lines 48–49): The LIMIT is applied to all_dex before the CSV path executes, so CSV exports are also capped at 500 rows by default. This is a behavior change for any CSV consumers who previously relied on unbounded exports. Callers must now pass ?limit=N explicitly for full exports. Not a bug, but worth noting.

Security

  • logging.warning("Bad Token: [REDACTED]") preserved correctly across all three auth-gated endpoints — no token leakage.
  • Peewee ORM parameterizes all queries — no injection risk.
  • No credentials or PII exposed.
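
The parameterization property the review attributes to Peewee can be illustrated with stdlib sqlite3, which uses the same placeholder mechanism: user input is bound as data, never spliced into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT)")
conn.execute("INSERT INTO players VALUES ('alice')")

# A hostile string bound via a placeholder is treated as a literal value,
# so the classic OR-injection payload matches nothing.
hostile = "x' OR '1'='1"
rows = conn.execute("SELECT * FROM players WHERE name = ?",
                    (hostile,)).fetchall()
```

Peewee builds its queries the same way under the hood, which is why filter values from query parameters carry no injection risk here.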

Style & Conventions

  • Quote normalization (single → double) and function signature reformatting throughout the file are ruff auto-format artefacts — same pattern as PRs #53 and #57. Functional changes are isolated to get_paperdex. Acceptable.
  • raise HTTPException(status_code=200, ...) in delete_paperdex (lines 182–184) is a pre-existing bug (should be return), not introduced by this PR — out of scope.

Suggestions

  • CSV limit documentation: Consider adding a note to the OpenAPI description that both JSON and CSV responses are limited to limit rows (default 500). Callers may not discover this until they see truncated CSV exports.
  • limit=0 edge case: ?limit=0 issues LIMIT 0 to PostgreSQL and returns an empty result. If that's semantically meaningless, a ge=1 constraint (limit: Optional[int] = Query(500, ge=1)) would guard it. Minor suggestion only.
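
The limit=0 edge case is easy to reproduce: LIMIT 0 is valid SQL that simply returns nothing. A small sketch of the suggested guard (mirroring what a Query(500, ge=1) constraint would enforce, with a hypothetical helper name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3)])

# LIMIT 0 is accepted by the database and yields an empty result set,
# which is usually not what a caller intended.
empty = conn.execute("SELECT * FROM t LIMIT 0").fetchall()

def validate_limit(limit):
    """Hypothetical guard: reject limit < 1 before it reaches the query."""
    if limit is not None and limit < 1:
        raise ValueError("limit must be >= 1")
    return limit
```

With FastAPI, the same rejection would happen automatically at parameter parsing time as a 422, which is likely the cleaner place for it.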

Verdict: APPROVED

Core fix is correct and well-scoped. The two performance issues (pre-filter full-table COUNT(*) and double-query pattern) are both addressed accurately. The new limit parameter defaults sensibly and is applied in the right place. Formatting changes are cosmetic and consistent with established project patterns.


Automated review by Claude PR Reviewer (posted as COMMENT due to Gitea self-review restriction — verdict is APPROVED)

Claude added the ai-reviewed label and removed the ai-reviewing label 2026-03-17 21:47:38 +00:00
cal approved these changes 2026-03-23 03:59:31 +00:00
cal left a comment
Owner

Approved per PO triage.

Claude added 1 commit 2026-03-23 03:59:43 +00:00
Claude merged commit b9faf278f6 into next-release 2026-03-23 03:59:48 +00:00
Claude deleted branch ai/paper-dynasty-database#102 2026-03-23 03:59:48 +00:00