diff --git a/paper-dynasty/2026-04-11.md b/paper-dynasty/2026-04-11.md new file mode 100644 index 0000000..420c806 --- /dev/null +++ b/paper-dynasty/2026-04-11.md @@ -0,0 +1,122 @@ +--- +title: "Refractor Test: dev deployment, bug hunt, rollback endpoint, cache fix" +description: "Deployed /dev refractor-test to local docker, fixed 5 distinct bugs found via manual testing, shipped PR #215 (rollback endpoint) and planned variant image_url cache fix." +type: context +domain: paper-dynasty +tags: [paper-dynasty, discord-app, database, refractor, dev-tools] +--- + +# Refractor Test: Dev Deployment, Bug Hunt, Rollback Endpoint, Cache Fix + +**Date:** 2026-04-11 +**Branches:** `test/refractor-comprehensive` (discord-app, uncommitted WIP), `feat/dev-cleanup-endpoint` (database, PR #215), `fix/variant-image-url-cache` (database, in-flight) +**Repos:** `paper-dynasty-discord`, `paper-dynasty-database` + +## What Was Done + +Started by resuming the `/dev refractor-test` Discord command work (PR discord#161, merged yesterday). Goal was to actually run the command end-to-end against dev and walk the full refractor evaluation pipeline. What looked like a 30-minute "flip it on and test" turned into a full day of bug-hunting across the discord cog, the database API, and the refractor evaluator. + +1. **Local dev docker stack up** — pulled `manticorum67/paper-dynasty-discordapp:dev`, started the compose stack in `discord-app/`. Required a GPG passphrase unlock for the Docker credential store before pull would succeed. + +2. **Seeded test card discovery** — queried dev postgres to find team 31's 13 pre-seeded `refractor_card_state` rows (11 T0, 2 T1). Produced a menu of card_ids suitable for tier-up testing: Markakis (1995), Henderson (639), Rutschman (425), Rodriguez (494 SP), Wells (4481 RP), Mullins (5713 control). + +3. **Cog bug #1: `KeyError: 'id'`** — `cogs/dev_tools.py:126` did `card["player"]["id"]` but the battingcards API nests player data with key `player_id`, not `id`. Fixed in-container via `docker cp`. Same bug at line 195 for `opposing_cards["cards"][0]["player"]["id"]`. + +4. **Cog bug #2: API timeout on opposing pitcher lookup** — the cog queried `/api/v2/pitchingcards?team_id=31&variant=0` to find an opposing pitcher for synthetic plays. That filter combination hangs for 30s. Cal asked the deeper question: "why do we need a pitcher for a batter refractor check at all?" Answer: we don't, the refractor logic only reads batter stats. The play row schema requires `pitcher_id NOT NULL` but that's just a DB constraint workaround, not a test requirement. + +5. **Cog fix: sentinel pitcher** — replaced the entire opposing-pitcher lookup with a hardcoded `SENTINEL_PITCHER_ID = 3` (Aaron Nola, first SP by player_id). One line, no API call, zero surface area. + +6. **Cog bug #3: card render JSON decode error** — `db_get` unconditionally calls `.json()` on responses, but the card render endpoint returns `image/png` bytes. Fixed by bypassing `db_get` for that one call with a raw `aiohttp` session that checks `r.status == 200` instead of parsing. + +7. **First successful test run (game 5387)** — Markakis went T1→T2, value 53→152, variant card created, render uploaded to S3. Cleanup button worked but only partially. + +8. **Discovered cleanup gap** — the `CleanupView` button only deletes game/plays/decisions. It does NOT roll back: + - `refractor_card_state.current_tier` (locked by `max()` no-regression clause in `evaluate_card`) + - Bloated `batting_season_stats` row + - Synthetic variant `battingcard` row and its ratings + - Orphan `refractor_card_state` rows for the sentinel pitcher + - `refractor_boost_audit` rows + +9. **Manual SQL cleanup (round 1)** — executed a 10-statement rollback against dev postgres to reset Markakis to tier=1, value=47, delete synthetic game 5388 + 33 plays + decision + processed_game row, delete the T2 variant battingcard + ratings, delete the boost audit row, and delete the orphan Nola state row. Required repointing `batting_season_stats.last_game_id` FK first. **Used wrong column (`bphr` instead of `homerun`) when computing the "correct" value — real baseline was 53, not 47.** Caught and corrected in round 2. + +10. **Shipped the rollback endpoint design** — wrote a full spec + implementation plan for a dev-only `DELETE /api/v2/dev/test-games/{game_id}` endpoint that reverses all side effects of a synthetic test game. Dispatched the engineer agent to implement. Agent shipped 9 service unit tests + 6 router tests, all green. Branch `feat/dev-cleanup-endpoint`, PR #215 opened by pd-ops. + +11. **"T0 → T1 no tier-up" mystery** — after the manual reset, Cal re-ran the test and reported Markakis showing "Current: T0 → Target: T1" with 5 plays inserted and no tier-up. Spent ~45 minutes debugging: verified cog code, API response shape, team_id resolution, player_id type matching, bot log routing (had to switch `getLogger(__name__)` → `getLogger("discord_app")` to get log output into the file handler). + +12. **Root cause found: `evaluated_only=true` default filter** — `/api/v2/refractor/cards` lists defaults `evaluated_only` to `true`, which filters out any row with `last_evaluated_at IS NULL`. My manual SQL reset set that column to NULL. Result: Markakis dropped out of the cog's response, the player_id match failed, and the cog fell through to the tier=0 default path (gap=37, 5 plays). Debug logging in the cog proved it definitively — `item_count=12`, `item_pids=[...]` with Markakis missing. + +13. **PR #215 fix: set `last_evaluated_at = datetime.now()` on rollback reset** — Cal's instruction: "evaluated_only=false is a workaround for not a solution. when we reset, set the last_eval_time to now; have an agent pick up that pr branch and update it." Engineer stopped to flag that step 13's `evaluate_card` already writes the timestamp unconditionally, so step 8's NULL is transient. Cal's defensive design won: step 8 should produce a valid intermediate state on its own, not rely on step 13. Engineer pushed commit `b44c4a6` to the existing branch; PR #215 auto-updated. + +14. **Proof-of-work test run (game 5392)** — confirmed the full pipeline works end-to-end now that the cog can read state correctly. Game created, 11 HR plays inserted (matches plan for gap 96), tier-up T1→T2, audit row 9 written, variant battingcard 6342 created, season stats recomputed (pa=21, hits=13, hr=11), T2 variant PNG uploaded to S3 at `cards/cardset-012/player-6557/v415156387/battingcard.png`. + +15. **Discovered bug #5: variant image_url cache-hit bypass** — the T2 variant battingcard row 6342 had `image_url = NULL` despite the PNG being live on S3. Traced to `players.py:get_batter_card` around line 906: the file-on-disk cache short-circuit returns `FileResponse` immediately when the cached PNG exists, bypassing the `backfill_variant_image_url` BackgroundTask that populates the DB column. Any variant row recreated with `image_url=NULL` while its PNG is cached stays NULL forever. **Not just a dev issue — prod hits this any time a variant row is regenerated against a cached PNG.** + +16. **Variant image_url plan written + engineer dispatched** — `docs/superpowers/plans/2026-04-11-variant-image-url-cache-bug.md`. Extract backfill-scheduling into a `_schedule_variant_image_backfill` helper, call it in both the cache-hit and full-render paths, add 5 cache-scenario tests, add `logger.info` for observability on cache-hit. Branch `fix/variant-image-url-cache`. Engineer running in background at time of writing. + +## Decisions + +### Sentinel pitcher over per-team lookup +The test command originally queried `pitchingcards?team_id=...&variant=0` to find a "real" opposing pitcher. That query times out (30s), but more importantly it solves a problem that doesn't exist — the refractor evaluator doesn't read pitcher data when evaluating a batter's refractor state. The pitcher is only needed because `stratplay.pitcher_id` is `NOT NULL` in the schema. Hardcoding player 3 (Aaron Nola, first SP) skips the lookup entirely and makes the test deterministic. For batter tests, Nola is a throwaway foreign key; for pitcher tests (future), a separate sentinel batter would be needed. + +### Rollback endpoint, not snapshot-based test sessions +Considered a "begin/commit/rollback session" pair of endpoints that would snapshot affected state before the test runs. Rejected in favor of a single `DELETE /api/v2/dev/test-games/{game_id}` endpoint that reconstructs the pre-test state from remaining data (audit rows, real season stats, untouched real plays). Cleaner: the cog doesn't need to change how it creates test data, and the rollback is self-contained. Downside: requires explicit tier-reset logic (because `evaluate_card` enforces `current_tier = max(current, new)` — no regression). + +### Timestamp window over migration for audit-row identification +The rollback endpoint needs to identify which `refractor_boost_audit` rows belong to the synthetic game being rolled back. `refractor_boost_audit` has no `game_id` column. Two options: add one via migration (clean, more typing), or filter by `applied_at` within a window around `processed_game.processed_at` (no migration, fragile). Chose the timestamp window for the dev-only rollback — `apply_tier_boost` and `update_season_stats` run in the same transaction, so the gap is milliseconds; a 6-second window with a 10-minute safety bail is safe. If this becomes a pattern for other rollback work, migrate then. + +### Defensive `last_evaluated_at` write in step 8 +After rollback step 8 sets tier/variant/value on the card_state, step 13 calls `evaluate_card` which unconditionally writes `last_evaluated_at = now()`. Step 8's own NULL would be transient. Cal's call: still set `last_evaluated_at = datetime.now()` in step 8 anyway. Rationale: +- Separation of concerns: step 8 "resets state to a valid pre-test-tier form". Valid includes non-NULL timestamp. +- Fault tolerance: if step 13 is ever refactored to skip the write, step 8's NULL becomes user-visible corruption (caller using default `evaluated_only=true` misses the row). +- Matches what a human operator does during manual cleanup. + +### Debug logging routing fix (`discord_app` vs `__name__`) +The cog originally used `logging.getLogger(__name__)` which produced a `cogs.dev_tools` logger. The bot's file handler is attached to the `discord_app` logger, so cog-local log output went nowhere. Changed to `logging.getLogger("discord_app")` to route through the existing handler. Not a bug per se, but worth noting — if any other cog ever needs to debug with `logger.info`, they'll hit the same silent-log problem. + +## Follow-Up + +| # | Title | Priority | +|---|-------|----------| +| [PR #215](https://git.manticorum.com/cal/paper-dynasty-database/pulls/215) | Dev cleanup rollback endpoint — merge + deploy | High | +| Variant image_url cache fix | Engineer running at time of writing; needs PR via pd-ops | High | +| Discord cog consumer for rollback endpoint | Follow-up PR in discord-app to replace `CleanupView` button's partial delete with a call to `DELETE /api/v2/dev/test-games/{game_id}` | Medium | +| Revert cog debug logging | The `dev_tools state-lookup: ...` `logger.info` calls I added for debugging should come out before the cog changes get committed | Low | +| Stale PNG file cleanup | The cache-hit bug exists because variant rows get regenerated against disk-cached PNGs. A separate concern: when a variant row is deleted (via rollback endpoint), the PNG file should also be deleted from disk. Not in scope for PR #215 or the cache-fix PR. | Low | +| Dangling game 5392 residue in dev DB | Left intentionally so it can be used to exercise the new rollback endpoint once PR #215 deploys | Low | + +## Files Changed + +### discord-app (uncommitted — WIP for eventual commit/cleanup) +- `cogs/dev_tools.py` — multiple fixes: player_id KeyError, sentinel pitcher, raw aiohttp for card render, debug logging (to be reverted before commit) + +### database (PR #215, branch `feat/dev-cleanup-endpoint`) +- `app/routers_v2/dev.py` — new router, `require_dev_env()` dependency, `DELETE /api/v2/dev/test-games/{game_id}` handler +- `app/services/dev_rollback.py` — 14-step rollback algorithm (with `last_evaluated_at = datetime.now()` fix in b44c4a6) +- `app/main.py` — registered `dev.router` +- `tests/test_dev_rollback.py` — 9 service unit tests +- `tests/test_dev_router.py` — 6 router HTTP contract tests +- `tests/conftest.py` — added `BattingCardRatings`/`PitchingCardRatings` to `_TEST_MODELS` + +### database (branch `fix/variant-image-url-cache`, in-flight) +- `app/routers_v2/players.py` — extract `_schedule_variant_image_backfill` helper, call from both cache-hit and full-render paths +- `tests/test_players_card_render.py` (or similar) — 5 new cache-hit scenario tests + +### docs/superpowers +- `specs/2026-04-11-dev-cleanup-endpoint-design.md` — spec for rollback endpoint (updated in b44c4a6) +- `plans/2026-04-11-dev-cleanup-endpoint.md` — implementation plan for rollback endpoint +- `plans/2026-04-11-variant-image-url-cache-bug.md` — implementation plan for cache bug fix + +## Session Data Footprint + +The following synthetic rows exist in dev postgres from the test runs (retained intentionally for post-merge testing of PR #215): +- `stratgame id=5392` (game_type='test') +- 11 `stratplay` rows with `batter_id=6557, pitcher_id=3, game_id=5392` +- 1 `decision` row for game 5392 +- 1 `processed_game` row for game 5392 +- `refractor_card_state id=8752` (player 3, team 31, tier=0, orphan) +- `refractor_boost_audit id=9` (card_state 8734, tier=2) +- `battingcard id=6342` (variant 415156387, image_url=NULL ← the cache bug) +- `batting_season_stats` for (6557, 31, s10) bloated: pa=21 hits=13 hr=11 (vs real 10 plays) +- Markakis current state: tier=2, value=152, variant=415156387 (last_evaluated_at 09:58:54) + +These can all be rolled back cleanly once PR #215 deploys, by calling `DELETE /api/v2/dev/test-games/5392`.