fix: batch Paperdex lookups to avoid N+1 queries (#17) #53

cal · 2026-03-04T03:36:11Z

cal commented

2026-03-04 03:36:11 +00:00

Summary

Fixes #17 — eliminates N+1 Paperdex queries in the player list and card list endpoints.

Before: For every player/card in a result set, a separate Paperdex.select().where() query was issued. 500 players = 500 queries + 500 count queries = 1000 extra DB round-trips.

After: A single batched query fetches all relevant Paperdex rows upfront, grouped into a dict by player_id. The loop does a O(1) dict lookup instead of a DB query.

Files Changed

app/routers_v2/players.py — two locations patched:
- get_players endpoint (conditional on inc_dex flag): batch fetch before loop, group by player_id
- Players-by-team endpoint: same approach, always batched
app/routers_v2/cards.py — card list endpoint:
- Materializes all_cards query once (list(all_cards)) to avoid the separate count() query
- Batches Paperdex lookup by player_id from each card

Other observations

In cards.py, the original code used Paperdex.player == x where x is a Card instance. Since Paperdex.player is a ForeignKeyField(Player), this would compare player_id against the Card's primary key (card ID), not the player ID. This appears to be a pre-existing bug (Paperdex.player == x.player was likely intended). The batched version uses x.player_id which is semantically correct — this incidentally fixes that bug as part of the batching. Flagging for awareness.

## Summary Fixes #17 — eliminates N+1 Paperdex queries in the player list and card list endpoints. **Before:** For every player/card in a result set, a separate `Paperdex.select().where()` query was issued. 500 players = 500 queries + 500 count queries = 1000 extra DB round-trips. **After:** A single batched query fetches all relevant Paperdex rows upfront, grouped into a dict by `player_id`. The loop does a O(1) dict lookup instead of a DB query. ## Files Changed - `app/routers_v2/players.py` — two locations patched: - `get_players` endpoint (conditional on `inc_dex` flag): batch fetch before loop, group by `player_id` - Players-by-team endpoint: same approach, always batched - `app/routers_v2/cards.py` — card list endpoint: - Materializes `all_cards` query once (`list(all_cards)`) to avoid the separate `count()` query - Batches Paperdex lookup by `player_id` from each card ## Other observations In `cards.py`, the original code used `Paperdex.player == x` where `x` is a `Card` instance. Since `Paperdex.player` is a `ForeignKeyField(Player)`, this would compare `player_id` against the Card's primary key (card ID), not the player ID. This appears to be a pre-existing bug (`Paperdex.player == x.player` was likely intended). The batched version uses `x.player_id` which is semantically correct — this incidentally fixes that bug as part of the batching. Flagging for awareness.

cal added 1 commit 2026-03-04 03:36:12 +00:00

fix: batch Paperdex lookups to avoid N+1 queries (#17 ) 945ab4961a

Replace per-player/card Paperdex.select().where() calls with a single
batched query grouped by player_id. Eliminates N+1 queries in:
- players list endpoint (get_players, with inc_dex flag)
- players by team endpoint
- cards list endpoint (also materializes query to avoid double count())

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

cal referenced this pull request

2026-03-04 03:36:31 +00:00

Batch Paperdex lookups to avoid N+1 queries in player/card list endpoints #17

cal added the

ai-reviewing

label 2026-03-04 03:47:00 +00:00

cal reviewed 2026-03-04 03:49:59 +00:00

cal left a comment

AI Code Review

Files Reviewed

app/routers_v2/players.py (modified)
app/routers_v2/cards.py (modified)

Findings

Correctness

Batch logic is correct in all three locations. Paperdex.player_id << player_ids correctly uses the IN clause against the right FK column. The dex_by_player.setdefault(row.player_id, []).append(row) grouping and .get(x.player_id, []) lookup are sound.
Bug fix confirmed. The original cards.py used Paperdex.player == x where x is a Card instance — Peewee would resolve this to Card.id (the card's PK), not Card.player_id. The batched version correctly uses x.player_id, which is the Player FK. This incidentally fixes a data-correctness bug where the per-card Paperdex query was silently returning wrong/empty results.
Count optimization correct. Replacing all_cards.count() (extra SELECT COUNT(*)) with len(card_list) after materializing saves one DB round-trip per request.
inc_dex guard preserved. In get_players, the batch fetch is gated inside if inc_dex: before the loop, so no query is issued when the flag is off. The per-record if inc_dex: branch then safely reads from the pre-populated dict. Logic is consistent.

Security

No issues found. No user input reaches the IN clause directly — player_ids is derived from ORM-fetched objects, not raw request parameters.

Style & Conventions

Quote style change (single → double quotes) throughout cards.py aligns with the double-quote style already used in players.py. Acceptable cleanup.
Reformatted function signatures to one-argument-per-line match FastAPI conventions in the rest of the codebase.

Suggestions

PR description inaccuracy: The body says "Players-by-team endpoint" was patched, but the diff actually patches get_random_player. Minor doc issue; doesn't affect correctness.
SQLite IN-clause limit: Peewee's << operator emits IN (...). SQLite's default variable limit is 999. On an unbounded get_cards call (no limit param, many distinct players), this could theoretically exceed the limit. In practice the limit query param mitigates this; non-issue on PostgreSQL.
Memory materialization trade-off: list(all_cards) loads all matching cards into memory. This is the right call here (enables correct count + batch key extraction), but callers should use the limit param on large datasets.

Verdict: APPROVED

Clean, targeted fix. The N+1 elimination is implemented correctly in all three locations, and the incidental bug fix for Paperdex.player == card_instance is a genuine correctness improvement. No blocking issues.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/players.py` (modified) - `app/routers_v2/cards.py` (modified) ### Findings #### Correctness - **Batch logic is correct in all three locations.** `Paperdex.player_id << player_ids` correctly uses the `IN` clause against the right FK column. The `dex_by_player.setdefault(row.player_id, []).append(row)` grouping and `.get(x.player_id, [])` lookup are sound. - **Bug fix confirmed.** The original `cards.py` used `Paperdex.player == x` where `x` is a `Card` instance — Peewee would resolve this to `Card.id` (the card's PK), not `Card.player_id`. The batched version correctly uses `x.player_id`, which is the Player FK. This incidentally fixes a data-correctness bug where the per-card Paperdex query was silently returning wrong/empty results. - **Count optimization correct.** Replacing `all_cards.count()` (extra `SELECT COUNT(*)`) with `len(card_list)` after materializing saves one DB round-trip per request. - **`inc_dex` guard preserved.** In `get_players`, the batch fetch is gated inside `if inc_dex:` before the loop, so no query is issued when the flag is off. The per-record `if inc_dex:` branch then safely reads from the pre-populated dict. Logic is consistent. #### Security - No issues found. No user input reaches the `IN` clause directly — `player_ids` is derived from ORM-fetched objects, not raw request parameters. #### Style & Conventions - Quote style change (single → double quotes) throughout `cards.py` aligns with the double-quote style already used in `players.py`. Acceptable cleanup. - Reformatted function signatures to one-argument-per-line match FastAPI conventions in the rest of the codebase. #### Suggestions - **PR description inaccuracy:** The body says "Players-by-team endpoint" was patched, but the diff actually patches `get_random_player`. Minor doc issue; doesn't affect correctness. - **SQLite IN-clause limit:** Peewee's `<<` operator emits `IN (...)`. SQLite's default variable limit is 999. On an unbounded `get_cards` call (no `limit` param, many distinct players), this could theoretically exceed the limit. In practice the `limit` query param mitigates this; non-issue on PostgreSQL. - **Memory materialization trade-off:** `list(all_cards)` loads all matching cards into memory. This is the right call here (enables correct count + batch key extraction), but callers should use the `limit` param on large datasets. ### Verdict: APPROVED Clean, targeted fix. The N+1 elimination is implemented correctly in all three locations, and the incidental bug fix for `Paperdex.player == card_instance` is a genuine correctness improvement. No blocking issues. --- *Automated review by Claude PR Reviewer*

cal added

ai-reviewed

and removed

ai-reviewing

labels 2026-03-04 03:50:32 +00:00

cal removed the

ai-reviewed

label 2026-03-05 03:41:53 +00:00

cal added the

ai-reviewing

label 2026-03-05 03:53:43 +00:00

cal force-pushed ai/paper-dynasty-database#17 from 945ab4961a to c94cff002a

2026-03-05 21:33:44 +00:00

Compare

cal commented

2026-03-07 03:20:46 +00:00

Approved. Resolve conflict then merge.

cal force-pushed ai/paper-dynasty-database#17 from c94cff002a to 2c4ff01ff8

2026-03-07 07:37:11 +00:00

Compare

cal removed the

ai-reviewing

label 2026-03-10 18:22:21 +00:00

Claude added the

ai-reviewing

label 2026-03-10 18:47:46 +00:00

Claude requested changes 2026-03-10 18:50:53 +00:00

Dismissed

Claude left a comment

AI Code Review

Files Reviewed

app/routers_v2/players.py (modified)
app/routers_v2/cards.py (modified)

Findings

Correctness

get_random_player — double-iteration of lazy queryset (regression)

In players.py, the new batch code at ~line 473 is:

player_ids = [p.player_id for p in final_players]  # iteration 1
dex_by_player = {}
if player_ids:
    for row in Paperdex.select().where(Paperdex.player_id << player_ids):
        dex_by_player.setdefault(row.player_id, []).append(row)
for x in final_players:  # iteration 2
    ...

When neither pos_exclude nor pos_exc is provided, final_players stays as a Peewee queryset (the else: final_players = all_players branch at ~line 406). After final_players = final_players[:limit] it remains a lazy queryset with ORDER BY RANDOM() LIMIT n.

Iterating this queryset twice executes two separate database queries, each with a different random seed. The player IDs collected in iteration 1 will not match the players returned in iteration 2. As a result, dex_by_player.get(x.player_id, []) returns [] for all players — silently producing empty paperdex for every returned player.

Fix: Materialize before the double-use:

final_players = list(final_players)  # add this line
player_ids = [p.player_id for p in final_players]

get_players — no issue: final_players is always a Python list (built via .append() at ~line 222), so double-iteration is safe.

get_cards — no issue: card_list = list(all_cards) correctly materializes before double-use. The fix for Paperdex.player == x (where x was a Card object, not a Player) is correct — using x.player_id resolves the pre-existing FK mismatch bug.

Security

No issues. [REDACTED] log patterns are preserved correctly throughout both files.

Style & Conventions

Widespread reformatting in cards.py: The diff includes a pervasive conversion from single-quote to double-quote string literals and function signature reformatting throughout the entire file, well beyond the Paperdex batching changes. This matches the ruff auto-format pattern seen in PR #57. The functional changes are correct but buried in formatting noise, making the diff harder to audit.

Suggestions

The double-iteration bug only manifests when limit is set (with no limit, all players are covered by both iterations). Adding list() is cheap and makes it correct in all cases.
The batching approach in get_cards and get_players is well-implemented — clean, correct, and a meaningful performance improvement.

Verdict: REQUEST_CHANGES

One correctness regression in get_random_player: double-iterating an ORDER BY RANDOM() LIMIT n queryset yields paperdex data built from a different player set than the one returned in the response. Silently returns empty paperdex for all players in the limited random endpoint. Fix: add final_players = list(final_players) before building player_ids. The batching in get_cards and get_players is correct and the pre-existing Paperdex.player == card bug fix is valid.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/players.py` (modified) - `app/routers_v2/cards.py` (modified) ### Findings #### Correctness **`get_random_player` — double-iteration of lazy queryset (regression)** In `players.py`, the new batch code at ~line 473 is: ```python player_ids = [p.player_id for p in final_players] # iteration 1 dex_by_player = {} if player_ids: for row in Paperdex.select().where(Paperdex.player_id << player_ids): dex_by_player.setdefault(row.player_id, []).append(row) for x in final_players: # iteration 2 ... ``` When neither `pos_exclude` nor `pos_exc` is provided, `final_players` stays as a Peewee queryset (the `else: final_players = all_players` branch at ~line 406). After `final_players = final_players[:limit]` it remains a lazy queryset with `ORDER BY RANDOM() LIMIT n`. Iterating this queryset twice executes **two separate database queries**, each with a different random seed. The player IDs collected in iteration 1 will not match the players returned in iteration 2. As a result, `dex_by_player.get(x.player_id, [])` returns `[]` for all players — silently producing empty paperdex for every returned player. **Fix**: Materialize before the double-use: ```python final_players = list(final_players) # add this line player_ids = [p.player_id for p in final_players] ``` **`get_players` — no issue**: `final_players` is always a Python list (built via `.append()` at ~line 222), so double-iteration is safe. **`get_cards` — no issue**: `card_list = list(all_cards)` correctly materializes before double-use. The fix for `Paperdex.player == x` (where `x` was a `Card` object, not a `Player`) is correct — using `x.player_id` resolves the pre-existing FK mismatch bug. #### Security No issues. `[REDACTED]` log patterns are preserved correctly throughout both files. #### Style & Conventions **Widespread reformatting in `cards.py`**: The diff includes a pervasive conversion from single-quote to double-quote string literals and function signature reformatting throughout the entire file, well beyond the Paperdex batching changes. This matches the ruff auto-format pattern seen in PR #57. The functional changes are correct but buried in formatting noise, making the diff harder to audit. #### Suggestions - The double-iteration bug only manifests when `limit` is set (with no limit, all players are covered by both iterations). Adding `list()` is cheap and makes it correct in all cases. - The batching approach in `get_cards` and `get_players` is well-implemented — clean, correct, and a meaningful performance improvement. ### Verdict: REQUEST_CHANGES One correctness regression in `get_random_player`: double-iterating an `ORDER BY RANDOM() LIMIT n` queryset yields paperdex data built from a different player set than the one returned in the response. Silently returns empty paperdex for all players in the limited random endpoint. Fix: add `final_players = list(final_players)` before building `player_ids`. The batching in `get_cards` and `get_players` is correct and the pre-existing `Paperdex.player == card` bug fix is valid. --- *Automated review by Claude PR Reviewer*

Claude added

ai-changes-requested

and removed

ai-reviewing

labels 2026-03-10 18:51:49 +00:00

Claude added the

ai-working

label 2026-03-10 19:00:44 +00:00

cal added 1 commit 2026-03-10 19:03:35 +00:00

fix: materialize final_players queryset before double-iteration in get_random_player 4445acb7d0

When no position filters are applied, `final_players` is a lazy Peewee queryset
with `ORDER BY RANDOM() LIMIT n`. Iterating it twice (once to build player_ids,
once for the response loop) executes two separate DB queries with different random
seeds, causing dex_by_player to be built for a different player set than returned,
silently producing empty paperdex for all players.

Add `final_players = list(final_players)` before building player_ids to ensure
both iterations operate on the same materialized result. Also fix pre-existing
syntax error in import statement and minor ruff lint issues in the same file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Claude commented

2026-03-10 19:03:48 +00:00

Review Feedback Addressed

Addressed the REQUEST_CHANGES review (review #92):

Fix: Lazy queryset double-iteration in `get_random_player`

Added final_players = list(final_players) before building player_ids in the else (non-CSV) branch of get_random_player (players.py:480).

Root cause: When neither pos_exclude nor pos_exc is provided, final_players = all_players (line 412) stays as a lazy Peewee queryset with ORDER BY RANDOM(). After final_players = final_players[:limit], it remains lazy. Iterating it twice — once for player_ids list comprehension, once in the response loop — executed two separate DB queries with different random seeds, so dex_by_player was built from a different player set than what was returned. Paperdex was silently empty for all players.

Also fixed two pre-existing issues in the same file that were blocking the ruff pre-commit hook:

Syntax error in import statement: spurious , DoesNotExist after opening paren on line 15
f"There are no players to filter" → removed extraneous f prefix (F541)
Removed unused filename variable (F841)

Commit: 4445acb

## Review Feedback Addressed Addressed the REQUEST_CHANGES review (review #92): ### Fix: Lazy queryset double-iteration in `get_random_player` Added `final_players = list(final_players)` before building `player_ids` in the `else` (non-CSV) branch of `get_random_player` (`players.py:480`). **Root cause**: When neither `pos_exclude` nor `pos_exc` is provided, `final_players = all_players` (line 412) stays as a lazy Peewee queryset with `ORDER BY RANDOM()`. After `final_players = final_players[:limit]`, it remains lazy. Iterating it twice — once for `player_ids` list comprehension, once in the response loop — executed two separate DB queries with different random seeds, so `dex_by_player` was built from a different player set than what was returned. Paperdex was silently empty for all players. Also fixed two pre-existing issues in the same file that were blocking the ruff pre-commit hook: - Syntax error in import statement: spurious `, DoesNotExist` after opening paren on line 15 - `f"There are no players to filter"` → removed extraneous `f` prefix (F541) - Removed unused `filename` variable (F841) Commit: `4445acb`

Claude removed the

ai-working

ai-changes-requested

labels 2026-03-10 19:03:55 +00:00

Claude added the

ai-reviewing

label 2026-03-10 19:15:46 +00:00

Claude approved these changes 2026-03-10 19:18:33 +00:00

Claude left a comment

AI Code Review

Files Reviewed

app/routers_v2/players.py (modified)
app/routers_v2/cards.py (modified)

Findings

Correctness

The previously raised critical issue (lazy queryset double-iteration in get_random_player) has been fixed:

final_players = list(final_players) is now added at line 476 before building player_ids, preventing two separate ORDER BY RANDOM() DB queries from returning different player sets.
get_players batching is correct: final_players is always a Python list here (built via list append loop at lines 216–225), so no double-iteration risk. Batch query correctly guarded by if inc_dex:.
get_cards batching is correct: list(all_cards) materializes the queryset before building player_ids. if c.player_id is not None guard is defensive and appropriate. if player_ids: guard prevents an empty IN () query.
FK fix in cards.py: the original Paperdex.player == x (comparing against a Card, not a Player) was a pre-existing bug. The new code uses x.player_id for the lookup dict, which is semantically correct.
Dead code removal: unused filename variable in get_batter_card (players.py ~line 677) removed cleanly — it was never passed to FileResponse.

Security

All [REDACTED] patterns preserved throughout both files. No regressions.

Style & Conventions

players.py: minimal formatting changes (blank line removal, f-string cleanup). Acceptable.
cards.py: pervasive single→double quote conversion throughout the entire file, reformatted function signatures, etc. This is the same ruff auto-format pattern seen in previous PRs. The functional diff is correct but buried in ~150 lines of reformatting noise. Not a blocker, but it makes future diffing harder.

Suggestions

The PR body references a "Players-by-team endpoint" change in players.py, but no such endpoint appears in the diff. The PR body description is slightly inaccurate — minor documentation issue, not a code concern.
The cards.py reformatting could be isolated to a separate chore commit to keep functional changes easy to audit, but this is subjective and not required for merge.

Verdict: APPROVED

The critical correctness regression from the prior review cycle (lazy Peewee queryset double-iteration with ORDER BY RANDOM()) has been correctly addressed with list(final_players). All three batching implementations are correct. No security regressions. The scope creep (cards.py reformatting) is noisy but benign.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/players.py` (modified) - `app/routers_v2/cards.py` (modified) ### Findings #### Correctness The previously raised critical issue (lazy queryset double-iteration in `get_random_player`) has been fixed: - `final_players = list(final_players)` is now added at line 476 before building `player_ids`, preventing two separate `ORDER BY RANDOM()` DB queries from returning different player sets. - **`get_players` batching** is correct: `final_players` is always a Python list here (built via list append loop at lines 216–225), so no double-iteration risk. Batch query correctly guarded by `if inc_dex:`. - **`get_cards` batching** is correct: `list(all_cards)` materializes the queryset before building `player_ids`. `if c.player_id is not None` guard is defensive and appropriate. `if player_ids:` guard prevents an empty `IN ()` query. - **FK fix in `cards.py`**: the original `Paperdex.player == x` (comparing against a `Card`, not a `Player`) was a pre-existing bug. The new code uses `x.player_id` for the lookup dict, which is semantically correct. - **Dead code removal**: unused `filename` variable in `get_batter_card` (players.py ~line 677) removed cleanly — it was never passed to `FileResponse`. #### Security - All `[REDACTED]` patterns preserved throughout both files. No regressions. #### Style & Conventions - `players.py`: minimal formatting changes (blank line removal, f-string cleanup). Acceptable. - `cards.py`: pervasive single→double quote conversion throughout the entire file, reformatted function signatures, etc. This is the same ruff auto-format pattern seen in previous PRs. The functional diff is correct but buried in ~150 lines of reformatting noise. Not a blocker, but it makes future diffing harder. #### Suggestions - The PR body references a "Players-by-team endpoint" change in `players.py`, but no such endpoint appears in the diff. The PR body description is slightly inaccurate — minor documentation issue, not a code concern. - The `cards.py` reformatting could be isolated to a separate chore commit to keep functional changes easy to audit, but this is subjective and not required for merge. ### Verdict: APPROVED The critical correctness regression from the prior review cycle (lazy Peewee queryset double-iteration with `ORDER BY RANDOM()`) has been correctly addressed with `list(final_players)`. All three batching implementations are correct. No security regressions. The scope creep (cards.py reformatting) is noisy but benign. --- *Automated review by Claude PR Reviewer*

Claude added

ai-reviewed

and removed

ai-reviewing

labels 2026-03-10 19:19:00 +00:00

cal merged commit 8d0111df32 into next-release

2026-03-16 16:09:57 +00:00

cal deleted branch ai/paper-dynasty-database#17

2026-03-16 16:09:57 +00:00

cal referenced this issue from a commit

2026-03-16 16:09:59 +00:00

Merge pull request 'fix: batch Paperdex lookups to avoid N+1 queries (#17)' (#53) from ai/paper-dynasty-database#17 into next-release

Claude referenced this pull request

2026-03-17 21:34:18 +00:00

fix: prevent paperdex timeout on unfiltered requests (#102) #103

Sign in to join this conversation.

No reviewers

No Milestone

No project

No Assignees

2 Participants

Notifications

Due Date

The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cal/paper-dynasty-database#53

fix: batch Paperdex lookups to avoid N+1 queries (#17) #53

Summary

Files Changed

Other observations

AI Code Review

Files Reviewed

Findings

Correctness

Security

Style & Conventions

Suggestions

Verdict: APPROVED

AI Code Review

Files Reviewed

Findings

Correctness

Security

Style & Conventions

Suggestions

Verdict: REQUEST_CHANGES

Review Feedback Addressed

Fix: Lazy queryset double-iteration in get_random_player

AI Code Review

Files Reviewed

Findings

Correctness

Security

Style & Conventions

Suggestions

Verdict: APPROVED

Fix: Lazy queryset double-iteration in `get_random_player`