feat(dev): add rollback endpoint for synthetic test games #215

Open
cal wants to merge 3 commits from feat/dev-cleanup-endpoint into main
Owner

Adds a dev-only DELETE /api/v2/dev/test-games/{game_id} endpoint that fully reverses all side effects of a synthetic test game created by the Discord /dev refractor-test command.

Design spec: docs/superpowers/specs/2026-04-11-dev-cleanup-endpoint-design.md
Implementation plan: docs/superpowers/plans/2026-04-11-dev-cleanup-endpoint.md

Why

The /dev refractor-test command (discord#161) exercises the full refractor evaluation pipeline by creating synthetic games and plays. Its CleanupView button only deletes the game/plays/decisions — it leaves bloated season stats, orphaned variant cards, and permanently tier-upped refractor state (because evaluate_card enforces current_tier = max(current_tier, new_tier)). Manual SQL cleanup is currently required to re-run a test on the same card. This endpoint automates the full rollback.

What's included

  • app/routers_v2/dev.py — router + require_dev_env() dependency that gates on POSTGRES_DB=paperdynasty_dev at request time (not import time, so prod containers still refuse)
  • app/services/dev_rollback.py — 14-step rollback service, transactional
  • tests/test_dev_rollback.py — 9 unit tests (service)
  • tests/test_dev_router.py — 6 router-level tests (HTTP contract)
  • app/main.py — register dev.router
  • tests/conftest.py — add BattingCardRatings/PitchingCardRatings to _TEST_MODELS (missing; variant card deletion requires them)

No schema changes, no migrations.

Safety

  • 403 when POSTGRES_DB != paperdynasty_dev (prod refuses)
  • 404 when stratgame row missing
  • 409 when processed_at is >10 minutes old (safety bail for stale games)
  • Full transaction; any failure rolls back atomically

Test summary

  • 9 service unit tests — all green
  • 6 router HTTP tests — all green
  • Full suite: 13 pre-existing failures unchanged, 0 new failures introduced

Follow-up

Discord cog consumer (cleanup button calls this endpoint instead of manual SQL) is a separate PR in the discord-app repo.

DO NOT DEPLOY — Cal will deploy separately after merge while manual testing is active on the dev API.

Adds a dev-only `DELETE /api/v2/dev/test-games/{game_id}` endpoint that fully reverses all side effects of a synthetic test game created by the Discord `/dev refractor-test` command. **Design spec:** `docs/superpowers/specs/2026-04-11-dev-cleanup-endpoint-design.md` **Implementation plan:** `docs/superpowers/plans/2026-04-11-dev-cleanup-endpoint.md` ## Why The `/dev refractor-test` command (discord#161) exercises the full refractor evaluation pipeline by creating synthetic games and plays. Its `CleanupView` button only deletes the game/plays/decisions — it leaves bloated season stats, orphaned variant cards, and permanently tier-upped refractor state (because `evaluate_card` enforces `current_tier = max(current_tier, new_tier)`). Manual SQL cleanup is currently required to re-run a test on the same card. This endpoint automates the full rollback. ## What's included - `app/routers_v2/dev.py` — router + `require_dev_env()` dependency that gates on `POSTGRES_DB=paperdynasty_dev` at request time (not import time, so prod containers still refuse) - `app/services/dev_rollback.py` — 14-step rollback service, transactional - `tests/test_dev_rollback.py` — 9 unit tests (service) - `tests/test_dev_router.py` — 6 router-level tests (HTTP contract) - `app/main.py` — register `dev.router` - `tests/conftest.py` — add `BattingCardRatings`/`PitchingCardRatings` to `_TEST_MODELS` (missing; variant card deletion requires them) No schema changes, no migrations. ## Safety - **403** when `POSTGRES_DB != paperdynasty_dev` (prod refuses) - **404** when `stratgame` row missing - **409** when `processed_at` is >10 minutes old (safety bail for stale games) - Full transaction; any failure rolls back atomically ## Test summary - 9 service unit tests — all green - 6 router HTTP tests — all green - Full suite: 13 pre-existing failures unchanged, 0 new failures introduced ## Follow-up Discord cog consumer (cleanup button calls this endpoint instead of manual SQL) is a separate PR in the discord-app repo. **DO NOT DEPLOY** — Cal will deploy separately after merge while manual testing is active on the dev API.
cal added 1 commit 2026-04-11 14:52:37 +00:00
Adds DELETE /api/v2/dev/test-games/{game_id} that fully reverses all side
effects of a game created by the Discord /dev refractor-test command:
stratplay, decision, processed_game, stratgame deletion; refractor_card_state
reset or deletion; variant card and audit row cleanup; season stats recompute.

Dev-gated via require_dev_env() dependency (returns 403 when POSTGRES_DB is
not 'paperdynasty_dev', checked at request time).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cal added 1 commit 2026-04-11 15:01:32 +00:00
Step 8 of the rollback algorithm previously set last_evaluated_at
to NULL. While step 13's evaluate_card() overwrites it back to
now(), leaving the column transiently NULL is fragile: callers
that use /refractor/cards default evaluated_only=true filter
would miss the row if they observed it mid-rollback, and the
intermediate state would be invalid if step 13 is ever
refactored to skip the write.

Defensive: step 8 now writes a valid non-NULL timestamp itself.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude added the
ai-reviewing
label 2026-04-11 15:18:36 +00:00
Claude requested changes 2026-04-11 15:22:37 +00:00
Claude left a comment
Collaborator

AI Code Review

Files Reviewed

  • app/main.py (modified)
  • app/routers_v2/dev.py (added)
  • app/services/dev_rollback.py (added)
  • tests/conftest.py (modified)
  • tests/test_dev_rollback.py (added)
  • tests/test_dev_router.py (added)

Findings

Correctness

[BLOCKING] FK ordering bug — step 6b deletes variant cards before their referencing audit rows are gone (test-created states)

In app/services/dev_rollback.py, the step ordering causes a PostgreSQL foreign key constraint violation for the most common scenario: rolling back a test game that created a new refractor state (player had no prior refractor history).

Trace:

  • Step 5 populates audit_rows_to_delete using only reset_state_ids (pre-existing states). Test-created states (state_ids_to_delete) are explicitly excluded.
  • Step 7 deletes those audit rows — audit rows for test-created states are untouched.
  • Step 6b (lines 272–280) deletes BattingCard/PitchingCard rows for both reset and test-created states.
  • Step 9 (line 316–318) deletes the test-created RefractorCardState rows, which cascade-deletes their audit rows.

The problem: step 6b attempts to delete a BattingCard row that is still referenced by a RefractorBoostAudit row (from the test-created state). RefractorBoostAudit.battingcard is a ForeignKeyField with no on_delete clause, which means PostgreSQL uses RESTRICT (immediate enforcement). The deletion at step 6b will raise a FK violation before step 9 ever runs.

The SQLite test database does not enforce FK constraints by default, so all 9 unit tests pass despite the broken ordering on PostgreSQL.

Affected code path: any rollback where the test game created a new RefractorCardState (i.e. state_ids_to_delete is non-empty and one of those states has an audit row with a non-NULL battingcard_id or pitchingcard_id). This is the primary use-case for the endpoint.

Fix: In step 5, also collect audit rows for state_ids_to_delete and add them to audit_rows_to_delete. Step 7 then deletes all audit rows (for both reset and test-created states) before step 6b deletes the card rows. The cascade in step 9 becomes a no-op for those states (no rows left to cascade), which is safe.


Security

No issues. The require_dev_env() gate checks POSTGRES_DB at request time, correctly. The dev.router is unconditionally registered (visible in /api/docs on prod) but every handler is gated. No injection vectors, no secrets exposure.


Style & Conventions

[BLOCKING] Lazy import inside function body — violates CLAUDE.md

dev_rollback.py line 474:

from app.services.refractor_evaluator import evaluate_card  # noqa: PLC0415

This is inside the body of rollback_test_game(). CLAUDE.md: "Never add lazy imports to middle of file." The # noqa: PLC0415 comment suppresses the linter but doesn't resolve the violation.

No circular import exists. refractor_evaluator.py imports only datetime and logging at module level — adding from app.services.refractor_evaluator import evaluate_card to the top of dev_rollback.py is safe. Previous review (PR #187) blocked for the same pattern.


Suggestions

  • _AUDIT_WINDOW_SECONDS = 6 — tight window that could miss audit rows written slightly after processed_at on a slow DB. Consider bumping to 10–15s or making it configurable alongside _MAX_AUDIT_AGE_SECONDS.
  • Step 8 test comment inconsistency (test_pre_existing_state_reset_to_pre_test_tier): "Step 8 sets last_evaluated_at=NULL" — the code actually sets it to datetime.now(), not NULL. Harmless test comment, the assertion is correct.
  • audit_rows_deleted count excludes cascade-deleted audit rows for test-created states. If the FK bug is fixed and audit rows for test-created states are explicitly deleted in step 7, the count in the return dict will correctly reflect all deleted audit rows. Document this in the log line if it changes the count semantics.

Verdict: REQUEST_CHANGES

Two blocking issues:

  1. FK ordering bug: step 6b deletes variant BattingCard/PitchingCard rows before the RefractorBoostAudit rows that reference them are removed. On PostgreSQL this is a FK constraint violation for test-created states with variant cards. SQLite tests pass because FK enforcement is off by default. Fix: add test-created state audit rows to audit_rows_to_delete in step 5 so step 7 clears them before step 6b.

  2. Lazy import: from app.services.refractor_evaluator import evaluate_card inside the function body — move to top of file.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/main.py` (modified) - `app/routers_v2/dev.py` (added) - `app/services/dev_rollback.py` (added) - `tests/conftest.py` (modified) - `tests/test_dev_rollback.py` (added) - `tests/test_dev_router.py` (added) --- ### Findings #### Correctness **[BLOCKING] FK ordering bug — step 6b deletes variant cards before their referencing audit rows are gone (test-created states)** In `app/services/dev_rollback.py`, the step ordering causes a PostgreSQL foreign key constraint violation for the most common scenario: rolling back a test game that created a *new* refractor state (player had no prior refractor history). Trace: - **Step 5** populates `audit_rows_to_delete` using only `reset_state_ids` (pre-existing states). Test-created states (`state_ids_to_delete`) are explicitly excluded. - **Step 7** deletes those audit rows — audit rows for test-created states are untouched. - **Step 6b** (lines 272–280) deletes `BattingCard`/`PitchingCard` rows for **both** reset *and* test-created states. - **Step 9** (line 316–318) deletes the test-created `RefractorCardState` rows, which cascade-deletes their audit rows. The problem: step 6b attempts to delete a `BattingCard` row that is still referenced by a `RefractorBoostAudit` row (from the test-created state). `RefractorBoostAudit.battingcard` is a `ForeignKeyField` with no `on_delete` clause, which means PostgreSQL uses `RESTRICT` (immediate enforcement). The deletion at step 6b will raise a FK violation before step 9 ever runs. The SQLite test database does not enforce FK constraints by default, so all 9 unit tests pass despite the broken ordering on PostgreSQL. **Affected code path**: any rollback where the test game created a *new* `RefractorCardState` (i.e. `state_ids_to_delete` is non-empty and one of those states has an audit row with a non-NULL `battingcard_id` or `pitchingcard_id`). This is the primary use-case for the endpoint. **Fix**: In step 5, also collect audit rows for `state_ids_to_delete` and add them to `audit_rows_to_delete`. Step 7 then deletes all audit rows (for both reset and test-created states) before step 6b deletes the card rows. The cascade in step 9 becomes a no-op for those states (no rows left to cascade), which is safe. --- #### Security No issues. The `require_dev_env()` gate checks `POSTGRES_DB` at request time, correctly. The `dev.router` is unconditionally registered (visible in `/api/docs` on prod) but every handler is gated. No injection vectors, no secrets exposure. --- #### Style & Conventions **[BLOCKING] Lazy import inside function body — violates CLAUDE.md** `dev_rollback.py` line 474: ```python from app.services.refractor_evaluator import evaluate_card # noqa: PLC0415 ``` This is inside the body of `rollback_test_game()`. CLAUDE.md: *"Never add lazy imports to middle of file."* The `# noqa: PLC0415` comment suppresses the linter but doesn't resolve the violation. No circular import exists. `refractor_evaluator.py` imports only `datetime` and `logging` at module level — adding `from app.services.refractor_evaluator import evaluate_card` to the top of `dev_rollback.py` is safe. Previous review (PR #187) blocked for the same pattern. --- #### Suggestions - **`_AUDIT_WINDOW_SECONDS = 6`** — tight window that could miss audit rows written slightly after `processed_at` on a slow DB. Consider bumping to 10–15s or making it configurable alongside `_MAX_AUDIT_AGE_SECONDS`. - **Step 8 test comment inconsistency** (`test_pre_existing_state_reset_to_pre_test_tier`): "Step 8 sets last_evaluated_at=NULL" — the code actually sets it to `datetime.now()`, not NULL. Harmless test comment, the assertion is correct. - **`audit_rows_deleted` count excludes cascade-deleted audit rows** for test-created states. If the FK bug is fixed and audit rows for test-created states are explicitly deleted in step 7, the count in the return dict will correctly reflect all deleted audit rows. Document this in the log line if it changes the count semantics. --- ### Verdict: REQUEST_CHANGES Two blocking issues: 1. **FK ordering bug**: step 6b deletes variant `BattingCard`/`PitchingCard` rows before the `RefractorBoostAudit` rows that reference them are removed. On PostgreSQL this is a FK constraint violation for test-created states with variant cards. SQLite tests pass because FK enforcement is off by default. Fix: add test-created state audit rows to `audit_rows_to_delete` in step 5 so step 7 clears them before step 6b. 2. **Lazy import**: `from app.services.refractor_evaluator import evaluate_card` inside the function body — move to top of file. --- *Automated review by Claude PR Reviewer*
Claude added
ai-changes-requested
and removed
ai-reviewing
labels 2026-04-11 15:22:59 +00:00
Claude added the
ai-working
label 2026-04-11 15:30:50 +00:00
cal added 1 commit 2026-04-11 15:32:39 +00:00
- Move evaluate_card import to top of file (was lazy import in function body)
- Collect audit rows for test-created states in step 5 so step 7 deletes them
  before step 6b deletes BattingCard/PitchingCard rows; previously those card
  rows had RefractorBoostAudit FK RESTRICT references that caused PostgreSQL
  violations (SQLite tests passed because FK enforcement is off by default)
- Update stale comments in steps 6/7/9 to reflect new deletion ordering

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

Addressed both blocking issues from review (commit 42d6f8b):

1. FK ordering bug (step 5/7)
Added a new block in step 5 that also collects RefractorBoostAudit rows for test-created states (state_ids_to_delete) into audit_rows_to_delete. Step 7 now deletes all audit rows — for both reset and test-created states — before step 6b deletes the BattingCard/PitchingCard rows they reference. Previously, test-created state audit rows were expected to cascade-delete in step 9, but step 6b ran first and triggered PostgreSQL FK RESTRICT violations. The audit_rows_deleted count in the response now correctly reflects all explicitly deleted audit rows.

2. Lazy import
Moved from app.services.refractor_evaluator import evaluate_card from inside the rollback_test_game() function body (line 474) to the module-level imports at the top of the file. No circular import exists.

Addressed both blocking issues from review (commit 42d6f8b): **1. FK ordering bug (step 5/7)** Added a new block in step 5 that also collects `RefractorBoostAudit` rows for test-created states (`state_ids_to_delete`) into `audit_rows_to_delete`. Step 7 now deletes all audit rows — for both reset and test-created states — before step 6b deletes the `BattingCard`/`PitchingCard` rows they reference. Previously, test-created state audit rows were expected to cascade-delete in step 9, but step 6b ran first and triggered PostgreSQL FK RESTRICT violations. The `audit_rows_deleted` count in the response now correctly reflects all explicitly deleted audit rows. **2. Lazy import** Moved `from app.services.refractor_evaluator import evaluate_card` from inside the `rollback_test_game()` function body (line 474) to the module-level imports at the top of the file. No circular import exists.
Claude removed the
ai-working
ai-changes-requested
labels 2026-04-11 15:32:55 +00:00
Claude added the
ai-reviewing
label 2026-04-11 15:45:52 +00:00
Claude removed the
ai-reviewing
label 2026-04-11 15:50:53 +00:00
Claude added the
ai-reviewing
label 2026-04-11 16:15:50 +00:00
Claude reviewed 2026-04-11 16:18:17 +00:00
Claude left a comment
Collaborator

AI Code Review

Files Reviewed

  • app/main.py (modified)
  • app/routers_v2/dev.py (added)
  • app/services/dev_rollback.py (added)
  • tests/test_dev_rollback.py (added)
  • tests/test_dev_router.py (added)
  • tests/conftest.py (modified)

Prior Review Blockers — Status

Both issues flagged in the previous review pass in this revision:

FIXED — FK ordering bug (step 6b): Step 5 now explicitly collects audit rows for test-created states (the new if state_ids_to_delete: block), and step 7 deletes all of them before step 6b runs. BattingCard/PitchingCard rows are deleted only after their RefractorBoostAudit FK children are gone. PostgreSQL FK RESTRICT violation is no longer possible.

FIXED — Lazy import: from app.services.refractor_evaluator import evaluate_card is now at the file top (line ~32), not nested inside rollback_test_game(). No # noqa: PLC0415 needed.

Findings

Correctness

No issues found.

Logic trace for the FK-ordering fix:

  • Step 4 classifies each state as state_ids_to_delete (test-created) or states_to_reset (pre-existing) — mutually exclusive.
  • Step 5 collects audit rows for reset states (within window) and test-created states (within window, or all if no processed_at). Both categories land in audit_rows_to_delete.
  • Step 6 collects variant_audits for all_state_ids_for_variants = reset_state_ids + state_ids_to_delete using the same window conditions — consistent with step 5.
  • Step 7 deletes all collected audit rows before any card rows are touched.
  • Step 6b deletes card rows safely.

Security

  • require_dev_env() reads POSTGRES_DB at request time; defaults to "" (closed), not open. Prod containers refuse even if the module is imported.
  • game_id is FastAPI-typed int — no injection surface.
  • 500 detail leaks exception message, but the endpoint is dev-gated so this is acceptable and useful for debugging.

Style & Conventions

  • No lazy imports.
  • Step numbering in comments (2, 3, 4, 5, 6, 6b, 7–14) is slightly inconsistent (step 1 unlabeled, step 6b inserted mid-sequence) but readable.

Suggestions

  • variant_cards_deleted count: If two audit rows for different states reference the same battingcard_id (unlikely but theoretically possible), the count would be inflated relative to the actual DELETE rows affected. Non-blocking — the count is informational only and the DELETE is correct either way.
  • Step 13 evaluate_card exception swallowing: ValueError is caught and logged as warning. Exception is not caught — it would propagate and roll back the transaction. This is intentional (fail-safe) but worth a comment if not already clear. Not a blocker.

Verdict: COMMENT

Both prior blocking issues are resolved. Code is correct, conventions followed, security appropriate for a dev-only endpoint. Self-review restriction applies (author = repo owner) — no functional objections, ready to merge.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/main.py` (modified) - `app/routers_v2/dev.py` (added) - `app/services/dev_rollback.py` (added) - `tests/test_dev_rollback.py` (added) - `tests/test_dev_router.py` (added) - `tests/conftest.py` (modified) ### Prior Review Blockers — Status Both issues flagged in the previous review pass in this revision: **FIXED — FK ordering bug (step 6b):** Step 5 now explicitly collects audit rows for test-created states (the new `if state_ids_to_delete:` block), and step 7 deletes all of them before step 6b runs. `BattingCard`/`PitchingCard` rows are deleted only after their `RefractorBoostAudit` FK children are gone. PostgreSQL FK RESTRICT violation is no longer possible. **FIXED — Lazy import:** `from app.services.refractor_evaluator import evaluate_card` is now at the file top (line ~32), not nested inside `rollback_test_game()`. No `# noqa: PLC0415` needed. ### Findings #### Correctness No issues found. Logic trace for the FK-ordering fix: - Step 4 classifies each state as `state_ids_to_delete` (test-created) or `states_to_reset` (pre-existing) — mutually exclusive. - Step 5 collects audit rows for reset states (within window) **and** test-created states (within window, or all if no `processed_at`). Both categories land in `audit_rows_to_delete`. - Step 6 collects `variant_audits` for `all_state_ids_for_variants = reset_state_ids + state_ids_to_delete` using the same window conditions — consistent with step 5. - Step 7 deletes all collected audit rows before any card rows are touched. - Step 6b deletes card rows safely. ✅ #### Security - `require_dev_env()` reads `POSTGRES_DB` at request time; defaults to `""` (closed), not open. Prod containers refuse even if the module is imported. ✅ - `game_id` is FastAPI-typed `int` — no injection surface. ✅ - 500 detail leaks exception message, but the endpoint is dev-gated so this is acceptable and useful for debugging. ✅ #### Style & Conventions - No lazy imports. ✅ - Step numbering in comments (2, 3, 4, 5, 6, 6b, 7–14) is slightly inconsistent (step 1 unlabeled, step 6b inserted mid-sequence) but readable. #### Suggestions - **`variant_cards_deleted` count**: If two audit rows for different states reference the same `battingcard_id` (unlikely but theoretically possible), the count would be inflated relative to the actual `DELETE` rows affected. Non-blocking — the count is informational only and the DELETE is correct either way. - **Step 13 `evaluate_card` exception swallowing**: `ValueError` is caught and logged as warning. `Exception` is not caught — it would propagate and roll back the transaction. This is intentional (fail-safe) but worth a comment if not already clear. Not a blocker. ### Verdict: COMMENT Both prior blocking issues are resolved. Code is correct, conventions followed, security appropriate for a dev-only endpoint. Self-review restriction applies (author = repo owner) — no functional objections, ready to merge. --- *Automated review by Claude PR Reviewer*
Claude added
ai-reviewed
and removed
ai-reviewing
labels 2026-04-11 16:18:36 +00:00
claude-reviewer reviewed 2026-04-11 19:16:36 +00:00
claude-reviewer left a comment
Collaborator

Phase 1 smoke test from claude-reviewer PAT — safe to dismiss

Phase 1 smoke test from claude-reviewer PAT — safe to dismiss
This pull request has changes requested by an official reviewer.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin feat/dev-cleanup-endpoint:feat/dev-cleanup-endpoint
git checkout feat/dev-cleanup-endpoint
Sign in to join this conversation.
No description provided.