diff --git a/paper-dynasty/autonomous-nightly-2026-04-10-run2.md b/paper-dynasty/autonomous-nightly-2026-04-10-run2.md
new file mode 100644
index 0000000..dcb7cde
--- /dev/null
+++ b/paper-dynasty/autonomous-nightly-2026-04-10-run2.md
@@ -0,0 +1,95 @@
+---
+title: "Autonomous Nightly Run — 2026-04-10 (run 2)"
+description: "Second autonomous pipeline run of the day: 4 PRs created (1 APPROVED, 3 REQUEST_CHANGES), 11 items queued to pd-plan, 0 rejections"
+type: context
+domain: paper-dynasty
+tags: [autonomous-pipeline, nightly-run]
+---
+
+## Run Metadata
+- Date: 2026-04-10 (second run of the day; see autonomous-nightly-2026-04-10.md for run 1)
+- Duration: ~25 minutes wall clock
+- Slots before: 0/10 S, 0/5 M (no prior autonomous PRs open from run 1)
+- Slots after: 4/10 S, 0/5 M (4 S items in_progress pending merge)
+
+## Findings
+- Analyst produced 5 findings
+- Growth-po produced 10 findings
+- Dedup filtered: 0 duplicates, 0 partial overlaps (haiku call skipped — both comparison lists were empty, making all 15 findings trivially novel)
+
+## PO Decisions
+| Finding ID | PO | Decision | Size | Notes |
+|---|---|---|---|---|
+| analyst-2026-04-10-001 | database-po | approved | M | HTTPException-200 sweep — consumer audit required |
+| analyst-2026-04-10-002 | database-po | reshaped | S | Drop premature empty-table 404s; do NOT materialize large querysets |
+| analyst-2026-04-10-003 | discord-po | approved | S | Bare except narrowing (high severity) |
+| analyst-2026-04-10-004 | database-po | approved | M | Packs beachhead tests — sequence after 001/002 |
+| analyst-2026-04-10-005 | (autonomous) | auto-approved | S | Structured rejection parser |
+| growth-sweep-2026-04-10-001 | discord-po | reshaped | M | Command logging — split into db endpoint + bot middleware |
+| growth-sweep-2026-04-10-002 | database-po | approved | S | Card of the week endpoint |
+| growth-sweep-2026-04-10-003 | discord-po | approved | S | Gauntlet results recap |
+| growth-sweep-2026-04-10-004 | discord-po | approved | S | /compare command |
+| growth-sweep-2026-04-10-005 | discord-po | approved | M | /profile command — needs aggregate endpoint |
+| growth-sweep-2026-04-10-006 | discord-po | approved | S | Rarity celebration embeds — use canonical rarity names |
+| growth-sweep-2026-04-10-007 | discord-po | approved | S | Gauntlet schedule + reminder |
+| growth-sweep-2026-04-10-008 | discord-po | approved | M | Starter pack grant — idempotent, onboarding critical |
+| growth-sweep-2026-04-10-009 | discord-po | approved | M | /pack history with pack_log table |
+| growth-sweep-2026-04-10-010 | database-po | reshaped | M | Webhook infra first, cardset hook as consumer |
+
+## PRs Created
+
+| PR | Repo | Title | Tests | Review |
+|---|---|---|---|---|
+| #163 | discord-app | fix(gameplay): replace bare except with NoResultFound | pre-existing collection failures (testcontainers missing locally) | **REQUEST_CHANGES** — `cache_player` uses `session.get`, which returns `None` on a miss rather than raising; the new `except NoResultFound` is unreachable and the caller crashes with `AttributeError` |
+| #164 | discord-app | feat(gauntlet): auto-post results recap embed | PASS (14 new tests) | **REQUEST_CHANGES** — `loss_max or 99` treats loss_max=0 as falsy, causing perfect-run bonus tier to show ⬜ instead of ❌ on 10-1 finish |
+| #212 | database | feat(api): card of the week featured endpoint | PASS (6 new tests) | **APPROVED** — joins, AI exclusion, tiebreak, 404 handling all correct. Merge via `pd-pr merge --no-approve` |
+| #165 | discord-app | feat(cogs): /compare slash command | PASS (30 new tests) | **REQUEST_CHANGES** — `_is_pitcher` omits CP (Closing Pitcher), silently misclassifies closers as batters |
+
+## Mix Ratio
+- Recent history: insufficient data (first full pipeline run after the 2-PR morning run); skipped the bash ratio check to conserve budget
+- Bias applied this run: none (interleaved stability/feature manually)
+- Dispatched mix: 1 stability (analyst-003) + 3 feature (growth-002/003/004). 1:3 is feature-heavy; balance the next run toward stability if this trend continues
+
+## Wishlist Additions
+None. All Large items were scoped as M or smaller by POs — nothing escalated to the L wishlist this run.
+
+## Queued to pd-plan (waiting for slot)
+Added as `status=active`, `slot=autonomous`:
+- #20: Sweep HTTPException(status_code=200) in routers (M, database)
+- #21: Remove double-count and premature empty-table 404s (S, database)
+- #22: Beachhead integration tests for packs router (M, database)
+- #23: Structured rejection parser for autonomous pipeline (S, autonomous)
+- #24: Command usage logging — bot middleware + db endpoint (M, multi-repo)
+- #25: Player profile command /profile (M, multi-repo)
+- #26: Rarity celebration embeds for pack pulls (S, discord-app)
+- #27: Gauntlet schedule + reminder task (S, discord-app)
+- #28: Starter pack grant for new players (M, multi-repo)
+- #29: Pack opening history command /pack history (M, multi-repo)
+- #30: Outbound webhook dispatcher + cardset publish hook (M, database)
+
+Dispatched as `in_progress`, linked to PRs:
+- #31 → discord-app#163
+- #32 → discord-app#164
+- #33 → database#212
+- #34 → discord-app#165
+
+## Rejections
+None. All 15 findings were accepted (11 approved as-is, 3 reshaped, 1 auto-approved).
+
+## Self-Improvement Notes
+
+1. **pr-reviewer caught real bugs in 3 of 4 PRs.** This is exactly the value the review gate is supposed to provide. No test run flagged any of the three: the new tests passed on #164 and #165, and #163's suite could not be collected locally (pre-existing testcontainers failure). In each case the bug sat in a code path the author's own tests didn't exercise:
+   - PR #163: author didn't test a `session.get` cache miss; the narrowed exception class doesn't match the real "not found" signal in that function
+   - PR #164: test asserted absence of ✅ but not presence of ❌, missing the falsy-zero substitution bug
+   - PR #165: test suite didn't include a CP (closer) case, so the position-gate gap was invisible
+   Engineer prompts should explicitly require adversarial tests that exercise the exact code path the change modifies, including zero/empty/None boundary values.
+
+2. **Worktree contamination on PR #165.** The /compare PR diff included `gauntlets.py` and `tests/test_gauntlet_recap.py` changes from PR #164, plus a `gameplay_queries.py` formatting touch from PR #163. Parallel worktrees branching from the same mainline apparently picked up each other's state. Investigate whether `isolation: "worktree"` in the Agent tool produces a fully isolated checkout or whether engineers need to explicitly branch from `origin/main`. If worktrees share a `.git`, sequential dispatch may be safer for tighter commit isolation.
+
+3. **Budget headroom tight at scale.** Dispatched only 4 of 15 approved items due to budget caution. 4 engineers + 4 reviewers consumed ~$10 (~$1.25/agent). At this rate, filling all 15 slots would require a budget ceiling of roughly $38 (15 items × ~$2.50 per engineer-plus-reviewer pair). Options: (a) use Haiku for engineers on mechanical changes like the HTTPException sweep, (b) batch multiple small fixes into one engineer invocation when they touch the same file, (c) cache common context via a prewarm step.
+
+4. **Rejection parser finding is valid.** analyst-005's observation about rejection markdown blobs in dedup input is correct: as the rejection list grows, raw markdown will degrade semantic matching quality. Auto-approved to the queue (#23). Self-improving the pipeline itself is exactly the kind of work the `autonomous` repo scope was added for.
+
+5. **Empty dedup lists make the haiku call dead weight.** Implement a preflight short-circuit: if `open_autonomous_prs` AND `recent_rejections` are both empty, skip the dedup haiku call entirely. Saves ~$0.05 and a few seconds per clean-slate run.
+
+6. **Database-po reshape was substantive.** Both reshape decisions from database-po (analyst-002, growth-010) were correct and prevented bad PRs. The original analyst recommendation for analyst-002 (materialize large querysets) would have regressed performance, and the PO caught it before dispatch. Growth-010's reshape correctly identified that the real cost of 2.6a is the webhook dispatcher plumbing, not the hook site. Keep POs in the loop for all findings; the cost is justified.
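
The falsy-zero bug from note 1 (PR #164) is easy to reproduce in isolation. The sketch below is a minimal illustration, not the code from the PR: only the expression `loss_max or 99` comes from the review. The function names, the `NO_LIMIT` constant, and the reading of 99 as a fallback for a missing loss limit are assumptions made for the example.

```python
from typing import Optional

# Hypothetical fallback for a tier that defines no loss limit.
# Only the literal 99 appears in the review note; its meaning is assumed here.
NO_LIMIT = 99


def loss_cap_buggy(loss_max: Optional[int]) -> int:
    # `or` falls back on ANY falsy value, so a legitimate 0 is replaced by 99.
    return loss_max or NO_LIMIT


def loss_cap_fixed(loss_max: Optional[int]) -> int:
    # Fall back only when the value is actually missing.
    return NO_LIMIT if loss_max is None else loss_max


if __name__ == "__main__":
    # Compare both versions on the boundary values called out in note 1.
    for value in (None, 0, 1):
        print(value, loss_cap_buggy(value), loss_cap_fixed(value))
```

The two versions differ only at `loss_max=0`, which is why a test that passes 0 and asserts the presence of the ❌ marker, as note 1 recommends, would have separated them.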