docs: sync KB — autonomous-nightly-2026-04-10.md,autonomous-pipeline-session-2026-04-10.md

2026-04-10 04:00:47 -05:00 · 2026-04-10 04:00:47 -05:00 · 87aeaf3309
commit 87aeaf3309
parent 8d165efbe6
2 changed files with 240 additions and 0 deletions
--- a/paper-dynasty/autonomous-nightly-2026-04-10.md
+++ b/paper-dynasty/autonomous-nightly-2026-04-10.md
@@ -0,0 +1,95 @@
+---
+title: "Autonomous Nightly Run — 2026-04-10"
+description: "First autonomous nightly run: 2 PRs shipped, 7 items queued, 0 rejections. Budget-constrained dispatch."
+type: context
+domain: paper-dynasty
+tags: [autonomous-pipeline, nightly-run]
+---
+
+## Run Metadata
+- Date: 2026-04-10
+- Slots before: 10/10 S, 5/5 M free (no active autonomous work)
+- Slots after: 8/10 S, 5/5 M free (2 S slots now in flight via PRs)
+- Open autonomous PRs before run: 0
+- Recent rejections: 0
+- Budget constraint: run hit the $5 USD ceiling early due to broad analyst sweep; dispatched 2 engineers instead of full slot fill.
+
+## Findings
+- Analyst produced 8 findings across database, discord-app, and autonomous pipeline
+- Growth-po produced 5 findings (all discord-app, all S-sized, all Phase 2 roadmap items)
+- Dedup haiku: **skipped** (0 open PRs + 0 rejections = no possible duplicates; all findings novel by construction)
+
+## PO Decisions
+
+### Database-po (4 findings)
+| Finding ID | Decision | Size | Notes |
+|---|---|---|---|
+| analyst-2026-04-10-002 | approved | S | HTTPException(200) sweep across ~10 routers |
+| analyst-2026-04-10-004 | approved | S | N+1 Paperdex fix; add query-count regression test |
+| analyst-2026-04-10-006 | reshaped | M | Split into 3 S tickets, start with pack-opening tests |
+| analyst-2026-04-10-008 | approved | S | Remove unfiltered pre-count in GET /packs **→ shipped** |
+
+### Discord-po (8 findings)
+| Finding ID | Decision | Size | Notes |
+|---|---|---|---|
+| analyst-2026-04-10-001 | approved | S | Delete dead gameplay_legacy.py **→ shipped** |
+| analyst-2026-04-10-003 | approved | S | Economy tree.on_error override (play-lock bug) — **high priority** |
+| analyst-2026-04-10-005 | reshaped | M | Two-phase cutover for economy_new/packs.py migration |
+| growth-sweep-2026-04-10-001 | approved | S | Rarity celebration embeds — use canonical rarity vocab |
+| growth-sweep-2026-04-10-002 | approved | S | /compare command — ephemeral by default, LHP/RHP split |
+| growth-sweep-2026-04-10-003 | approved | S | Gauntlet results recap embed |
+| growth-sweep-2026-04-10-004 | reshaped | M | Command usage telemetry — cross-repo, needs privacy review |
+| growth-sweep-2026-04-10-005 | reshaped | S+M | Split: /gauntlet schedule (S) first, reminder scheduler (M) after the scheduler approach is specced |
+
+### Self-improvement (auto-approved, no PO gate)
+| Finding ID | Decision | Size | Notes |
+|---|---|---|---|
+| analyst-2026-04-10-007 | approved | S | Split run-nightly.sh stdout/stderr, write last-run-result.json, voice-notify on failure |
+
+## PRs Created
+- **discord-app#162** — `chore(cogs): remove dead gameplay_legacy cog (4,723 lines, zero references)` — tests PASS (no new failures; 2 pre-existing SQLite path issues unchanged), labels applied, **pr-reviewer dispatch skipped (budget)** — https://git.manticorum.com/cal/paper-dynasty-discord/pulls/162
+- **database#211** — `fix(packs): remove unfiltered pre-count in GET /packs (3 round-trips → 2)` — tests PASS (266 passed, 13 pre-existing failures unchanged), consumer check clean (no 404 handlers in discord-app), labels applied, **pr-reviewer dispatch skipped (budget)** — https://git.manticorum.com/cal/paper-dynasty-database/pulls/211
+  - **Post-run diagnostic:** Pyright flagged 4 `Pack.id` attribute access errors after ruff reformatted the file. These are Peewee ORM false positives (`id` is added dynamically by Peewee's Model metaclass) and are pre-existing elsewhere in the codebase. Not a regression from this change.
+
+## Mix Ratio
+- No prior digests — this is the first autonomous nightly run. Default 1:1 interleave applied.
+- This run shipped 2 stability items and 0 features. Next run should bias toward feature dispatches if budget permits.
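
The 1:1 interleave can be pictured as an alternating merge of the two work types. The sketch below is a hypothetical illustration, not the orchestrator's actual logic: the function name `interleave_dispatch` and the `feature_first` flag are assumptions introduced here, with `feature_first=True` standing in for the "bias toward feature dispatches" suggested above.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so falsy item IDs are never dropped


def interleave_dispatch(stability, features, feature_first=False):
    """Alternate stability and feature items 1:1, then append any leftovers."""
    first, second = (features, stability) if feature_first else (stability, features)
    ordered = []
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is not _MISSING:
            ordered.append(a)
        if b is not _MISSING:
            ordered.append(b)
    return ordered


# Item IDs from this run's queue; the biased order puts a feature first.
stability = ["analyst-2026-04-10-003", "analyst-2026-04-10-002"]
features = ["growth-sweep-2026-04-10-001"]
print(interleave_dispatch(stability, features, feature_first=True))
```

When one list runs out, the remaining items of the other are dispatched in their original order, so a short feature queue never blocks stability work.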
+
+## Wishlist Additions
+- None. All approved items are S or M and could fit within a normal slot budget — no L-sized items surfaced in this sweep.
+
+## Queued for Next Run (approved but not dispatched due to budget)
+The following items are **approved and ready to ship** but were not dispatched this run. They should be picked up first thing next run:
+
+**High priority (stability, real user impact):**
+1. `analyst-2026-04-10-003` (S) — Economy cog overwrites global tree.on_error, bypassing play-lock release. **Players are getting stuck due to this bug.** Should be the first item dispatched next run.
+2. `analyst-2026-04-10-002` (S) — HTTPException(200) sweep across ~10 DB routers.
+3. `analyst-2026-04-10-004` (S) — N+1 Paperdex fix in players endpoints.
+
+**Self-improvement:**
+4. `analyst-2026-04-10-007` (S) — run-nightly.sh stdout/stderr split + last-run-result.json. This is a *prerequisite* for reliable future runs and should be prioritized.
+
+**Features (growth):**
+5. `growth-sweep-2026-04-10-001` (S) — Rarity celebration embeds.
+6. `growth-sweep-2026-04-10-003` (S) — Gauntlet results recap embed.
+7. `growth-sweep-2026-04-10-002` (S) — /compare command.
+
+**Reshaped (needs spec work before dispatch):**
+- `analyst-2026-04-10-006` (M) — first of 3 split tickets: pack-opening happy path + insufficient funds + duplicate handling.
+- `analyst-2026-04-10-005` (M) — Phase 1 spec of economy.py vs economy_new/packs.py drift.
+- `growth-sweep-2026-04-10-004` (M) — Cross-repo telemetry; needs privacy posture confirmation.
+- `growth-sweep-2026-04-10-005` Issue A (S) — /gauntlet schedule command (pure read).
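
Because these items are recorded only in this digest, a later preflight would have to read them back out of the Markdown. A minimal sketch of that, assuming the digest keeps the heading text and numbered-item layout used above; the function name `parse_queued_items` and the regular expression are hypothetical and not part of the existing preflight scripts.

```python
import re

# Matches numbered queue entries such as: 1. `analyst-2026-04-10-003` (S) ...
QUEUED_ITEM = re.compile(r"^\s*\d+\.\s+`(?P<id>[a-z0-9-]+)`\s+\((?P<size>[SM])\)")


def parse_queued_items(digest_markdown):
    """Return (finding_id, size) pairs from the 'Queued for Next Run' section."""
    items = []
    in_section = False
    for line in digest_markdown.splitlines():
        if line.startswith("## "):
            # Any level-2 heading either opens or closes the section.
            in_section = line.startswith("## Queued for Next Run")
            continue
        if in_section:
            match = QUEUED_ITEM.match(line)
            if match:
                items.append((match.group("id"), match.group("size")))
    return items


sample = """## Queued for Next Run (approved but not dispatched due to budget)
1. `analyst-2026-04-10-003` (S) - Economy cog overwrites global tree.on_error.
- `analyst-2026-04-10-006` (M) - reshaped, needs spec work.

## Rejections
- None this run.
"""
print(parse_queued_items(sample))
```

Only numbered entries match, so the bulleted reshaped items, which still need spec work, are deliberately left out of the dispatch list.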
+
+## Rejections
+- None this run.
+
+## Self-Improvement Notes
+
+**The pipeline hit its $5 budget ceiling after dispatching analyst + growth-po + 2 POs + 2 engineers.** Breakdown of spend was top-heavy: the analyst agent alone consumed roughly half the budget due to a 411s, 104-tool-use deep audit. Observations for future runs:
+
+1. **Analyst cap**: Consider passing a stricter cap (e.g., "limit to top 5 findings, max 30 tool uses") to the analyst to keep its spend predictable.
+2. **Dedup skip was correct**: With 0 open PRs and 0 rejections, the dedup haiku call would have been pure overhead. Encoding this as an orchestrator shortcut (skip dedup when both inputs are empty) would save ~$0.10 per first-run scenario.
+3. **pr-reviewer was skipped**: Engineer PRs #162 and #211 did not receive an automated review pass. Cal should manually review these before merge. Future runs should reserve ~$0.30 per PR for pr-reviewer.
+4. **pd-plan CLI skipped**: Approved-but-queued items are documented in this digest only, not in the pd-plan database. Next run's preflight should parse this digest's "Queued for Next Run" section and dispatch those items first before generating new findings.
+5. **Budget-aware slot filling**: Orchestrator should compute a rough budget forecast (analyst ~$2, each PO ~$0.30, each engineer ~$0.60, each pr-reviewer ~$0.30) before dispatching engineers, and cap engineer count at `(remaining_budget - digest_reserve) / (engineer_cost + reviewer_cost)`.
+6. **The `analyst-2026-04-10-007` self-improvement item directly addresses observability gaps that made this digest harder to write** — prioritize it next run.
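
Notes 2 and 5 reduce to a few lines of arithmetic. The sketch below is one possible shape under stated assumptions: the per-call costs are the rough estimates from note 5, growth-po is assumed to cost the same as a domain PO, the $0.50 digest reserve is a made-up placeholder, and the function names are hypothetical. Amounts are integer cents to avoid floating-point rounding.

```python
# Rough per-call estimates in US cents, taken from note 5 above.
ANALYST_CENTS = 200
PO_CENTS = 30
ENGINEER_CENTS = 60
REVIEWER_CENTS = 30


def should_run_dedup(open_pr_count, rejection_count):
    """Note 2: dedup has nothing to compare against when both inputs are empty."""
    return open_pr_count > 0 or rejection_count > 0


def max_engineers(budget_cents, po_calls, digest_reserve_cents, free_slots):
    """Note 5: cap engineers so each PR can also afford a pr-reviewer pass."""
    remaining = budget_cents - ANALYST_CENTS - po_calls * PO_CENTS
    per_pr = ENGINEER_CENTS + REVIEWER_CENTS
    affordable = (remaining - digest_reserve_cents) // per_pr
    return max(0, min(free_slots, affordable))


# This run's shape: $5 ceiling, 3 PO-priced calls (growth-po + 2 POs),
# a hypothetical $0.50 digest reserve, and 15 free slots (10 S + 5 M).
print(max_engineers(500, 3, 50, 15))
print(max_engineers(2000, 3, 50, 15))
```

Under these estimates a $5 ceiling funds one engineer with a review pass, while a $20 ceiling is limited by the 15 free slots rather than by budget.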
--- a/paper-dynasty/autonomous-pipeline-session-2026-04-10.md
+++ b/paper-dynasty/autonomous-pipeline-session-2026-04-10.md
@@ -0,0 +1,145 @@
+---
+title: "Autonomous Improvement Pipeline — Build Session 2026-04-09/10"
+description: "Single-session design + implementation + first smoke test of the Paper Dynasty autonomous improvement pipeline. 2 PRs shipped, system ready to run nightly pending one more test."
+type: context
+domain: paper-dynasty
+tags: [autonomous-pipeline, session-summary, paper-dynasty, architecture]
+---
+
+## Summary
+
+In a single session spanning 2026-04-09 evening through 2026-04-10 early morning, Cal and Claude designed, specced, planned, implemented, merged, and ran the first smoke test of a nightly autonomous improvement pipeline for the Paper Dynasty ecosystem. The goal: a system where Cal wakes up to a Monday-morning queue of "here's what Claude did for you" PRs he can review and merge, keeping momentum even when he's unavailable.
+
+The system ships. It produced 2 real, mergeable PRs on its first run before hitting a budget ceiling. Post-run fixes are in. The systemd timer is installed but not enabled pending one more validation run.
+
+## The arc of the session
+
+### Phase 1 — Brainstorming (spec)
+
+Cal arrived with a two-part idea: (1) introspection on the codebase to recommend updates, (2) recommendations for workflow/tooling optimization. Through ~15 clarifying exchanges, we landed on this shape:
+
+- **Nightly scheduled** (not on-demand) — moves forward despite Cal's schedule
+- **Autonomous PR dispatch** (not just reports) — Monday morning review queue
+- **WIP slot limits** to prevent overwhelm: 10 S, 5 M, no autonomous L; L items go to a wishlist
+- **1:1 stability/feature bias** — mix both types of work
+- **Three repos in scope:** database, discord-app, card-creation (card-creation has its own autonomous dynamic now)
+- **Separation of concerns:**
+  - New **analyst agent** does code audits with fresh eyes (no ownership bias)
+  - **growth-po** does product/roadmap sweeps in a new "sweep mode"
+  - **Domain POs** (database-po, discord-po, cards-po) gate findings with go/no-go decisions
+  - **Engineer agents** build approved S/M work in isolated worktrees
+  - **pr-reviewer** gates PRs before Cal sees them
+- **Rolling 30-day rejection log** so the pipeline doesn't re-suggest rejected ideas
+- **Hybrid tracking:** pd-plan for slot counts + wishlist, KB for digests + rejection log
+- **Transparency as a core value** — every decision, rejection, and action documented so both humans and future agents have full context
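
Two of the rules above, the WIP slot limits and the rolling 30-day rejection window, are simple enough to sketch directly. This is an illustration only, not the contents of `check_slots.py` or `query_rejections.sh`: the function names and the dictionary layout of a rejection record are assumptions made for the example.

```python
from datetime import date, timedelta

# WIP limits from the design above; L-sized work has no autonomous slot.
SLOT_LIMITS = {"S": 10, "M": 5}


def free_slots(in_flight):
    """Return remaining capacity per size given in-flight counts."""
    return {
        size: max(0, limit - in_flight.get(size, 0))
        for size, limit in SLOT_LIMITS.items()
    }


def recent_rejections(rejections, today, window_days=30):
    """Keep only rejections whose date falls inside the rolling window."""
    cutoff = today - timedelta(days=window_days)
    return [r for r in rejections if r["date"] >= cutoff]


print(free_slots({"S": 2}))
log = [
    {"id": "example-old", "date": date(2026, 2, 1)},
    {"id": "example-new", "date": date(2026, 4, 1)},
]
print([r["id"] for r in recent_rejections(log, date(2026, 4, 10))])
```

With 2 S items in flight the sketch reports 8 S and 5 M slots free, and only the rejection dated within the last 30 days would be passed on as context.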
+
+### Phase 2 — Plan
+
+A 20-task implementation plan was written and self-reviewed against the spec. Self-review caught one gap: the mix ratio (§9) wasn't explicitly implemented anywhere, so a step 6b was added to the orchestrator prompt. Plan review then produced four further refinements:
+
+1. Wishlist → Run Digest connection (L items should appear in nightly digest)
+2. Rolling 30-day rejection context fed to analyst + growth-po to avoid re-discovery
+3. Pure-bash preflight for pure data lookups (slot check, git pull, PR inventory, rejection query) — no LLM spin-up on "no slots" nights
+4. Dedup as a haiku call (not a script) — semantic matching catches rewording
+
+### Phase 3 — Implementation (subagent-driven)
+
+Created worktree `.worktrees/autonomous-pipeline` on branch `feat/autonomous-pipeline`. Executed plan via subagent-driven-development skill:
+
+- **Task 1** (inline): scaffolded `autonomous/` directory with README
+- **Batch A** (sonnet subagent, Tasks 2-5): extended `pd-plan` CLI with `slot`/`wishlist` schema columns, `slots`/`wishlist` subcommands, `--slot`/`--wishlist` flags on `add`/`update`, new summary section. 8 pytest tests, all passing.
+- **Task 6** (sonnet subagent): `autonomous/lib/check_slots.py` with 3 pytest tests
+- **Batch B** (sonnet subagent, Tasks 7-9): bash scripts `inventory_prs.sh`, `query_rejections.sh`, `preflight.sh`. Notable: switched from `tea pulls list` to `tea api` because the former returns labels as a flat string (not objects).
+- **Batch C** (sonnet subagent, Tasks 10-14): `.claude/agents/analyst.md`, sweep-mode append to `growth-po.md`, `dedup-haiku.md`, `orchestrator.md` (284 lines), `run-nightly.sh` wrapper
+- **Task 18** (inline): preflight skip smoke test — added 15 dummy initiatives, verified `preflight.sh` exits 1, cleaned up
+
+11 commits on the feature branch. Fast-forward merged to main. Worktree force-removed. Branch deleted. Pushed to origin.
+
+One snag worth noting: the first subagent dispatch hit a wall of permission prompts Cal had to click through. Existing memory already had the rule "code-writing subagents MUST use mode: acceptEdits" — I'd just failed to apply it. Fixed for all subsequent dispatches.
+
+### Phase 4 — Integration (Gitea + systemd)
+
+- **Gitea labels** created via pd-ops agent in all 3 sub-project repos: `autonomous`, `size:S`, `size:M`, `type:stability`, `type:feature` (colors: `#6366f1`, `#10b981`, `#f59e0b`, `#0891b2`, `#ec4899`). Umbrella repo got its own set later when the observability ticket was filed.
+- **Scheduled task** at `~/.config/claude-scheduled/tasks/autonomous-nightly/` — settings.json (haiku outer, $1 budget, 3600s timeout), prompt.md (just runs the wrapper), mcp.json (empty; the inner claude inherits Cal's global MCP config including gitea-mcp)
+- **Systemd timer** at `~/.config/systemd/user/claude-scheduled@autonomous-nightly.timer` — nightly 02:00 with 15-min random delay, Persistent=true. Registered but NOT enabled.
+
+### Phase 5 — First smoke test
+
+Kicked off `autonomous/run-nightly.sh` at 02:40:07 local. It ran for 15 minutes 40 seconds and was terminated at 02:55:47 by the $5 budget ceiling.
+
+**Despite the budget hit, the pipeline actually worked:**
+
+- Preflight ran cleanly (slots 10S/5M free, 0 open PRs, 0 rejections)
+- Analyst produced 8 findings across database, discord-app, autonomous (self-improvement)
+- Growth-po produced 5 findings (all discord Phase 2 roadmap items, all S-sized)
+- Dedup correctly skipped (empty inputs = no possible dupes)
+- POs made real decisions: 8 approved, 4 reshaped, 0 rejected (plus 1 self-improvement item auto-approved without a PO gate)
+- 2 PRs shipped before budget ran out, both correctly labeled and mergeable
+
+**PRs shipped:**
+
+- **discord-app#162** — `chore(cogs): remove dead gameplay_legacy cog (4,723 lines, zero references)` — caught that `cogs/gameplay_legacy.py` was 4,723 lines of dead code with zero inbound references
+- **database#211** — `fix(packs): remove unfiltered pre-count in GET /packs (3 round-trips → 2)` — caught a real correctness bug: the unfiltered `Pack.select().count()` pre-check made the endpoint return 404 whenever no packs existed globally, instead of returning an empty filtered result
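
The database#211 bug is easier to see in a toy model. The sketch below uses plain Python lists instead of Peewee and is not the actual handler: the function names, the `team_id` filter, and the `(status, results)` return shape are all assumptions chosen for illustration.

```python
def get_packs_before(all_packs, team_id):
    """Old shape: an unfiltered pre-count decides the status code."""
    if len(all_packs) == 0:  # stands in for the unfiltered count query
        return 404, []
    matches = [p for p in all_packs if p["team_id"] == team_id]
    return 200, matches


def get_packs_after(all_packs, team_id):
    """New shape: only the filtered query runs; empty is a valid result."""
    matches = [p for p in all_packs if p["team_id"] == team_id]
    return 200, matches


print(get_packs_before([], team_id=7))
print(get_packs_after([], team_id=7))
```

In this model the old path answers 404 for an empty table, while the new path returns a 200 with an empty list, and dropping the pre-count also removes one query.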
+
+**What went wrong:**
+
+1. Analyst alone consumed ~$2.50 with a 411s, 104-tool-use deep sweep
+2. `pr-reviewer` dispatch was skipped — budget ran out
+3. Digest Write was permission-denied (inner claude wasn't running with `--dangerously-skip-permissions`) — the digest was manually extracted from the JSON output and saved
+4. pd-plan integration skipped — approved queued items only in the digest
+5. 7 approved items never dispatched, including a high-priority real bug (economy cog overwriting `tree.on_error` causing stuck play-lock)
+6. Multiple Bash tool denials wasted budget on retries (compound commands, venv activation, `source`, curl, `diff <()`)
+
+### Phase 6 — Post-run fixes
+
+Spun up a yolo-mode `claude -p` agent to apply three critical fixes. Commit `a79efb2`:
+
+1. Inner claude budget: $5 → $20
+2. Added `--dangerously-skip-permissions` to inner claude in `run-nightly.sh`
+3. Analyst scope tightened in `.claude/agents/analyst.md`: max findings 15 → 5, added 30 tool-use cap with budget starvation rationale
+
+Also filed `cal/paper-dynasty-umbrella#3` (labels: `autonomous`, `size:S`, `type:stability`) for the observability self-improvement (split stdout/stderr, write `last-run-result.json`, voice-notify on failure). This is exactly the kind of ticket the pipeline could pick up on a future autonomous run.
+
+## Current state (as of 2026-04-10)
+
+- ✅ All code merged to main and pushed to origin
+- ✅ 15 Gitea labels created across the 3 sub-project repos, plus a label set on the umbrella repo
+- ✅ Scheduled task installed
+- ✅ Systemd timer unit installed
+- ✅ 2 real PRs shipped (pending Cal review / reviewer pipeline)
+- ✅ Observability ticket filed
+- ✅ Post-run fixes applied
+- ⏸️ Systemd timer **NOT ENABLED** — pending one more validation smoke test with the $20 budget + tightened analyst
+
+## Queued work for next run
+
+See `project_autonomous_first_run.md` memory file for the full list. Headline items:
+
+1. `analyst-2026-04-10-003` — Economy cog `tree.on_error` bug (real stuck-user impact) — dispatch first
+2. `cal/paper-dynasty-umbrella#3` — Observability improvement (unblocks future debugging) — dispatch early
+3. 5 other approved items from the first run (3 features, 2 stability)
+4. 4 reshaped items that need additional spec work before dispatch
+
+## Why this matters
+
+This was a meta-accomplishment: building the tooling that builds the tooling. The pipeline is now a standing autonomous capability in the Paper Dynasty ecosystem. Cal's availability is no longer the bottleneck for routine stability fixes, small features, and dead-code cleanup. As confidence builds, the slot limits can rise, the budget can expand, and the scope can broaden.
+
+The first run also answered a deeper question: **can agents produce genuinely useful work without human guidance on what to build?** The answer, based on these 2 PRs, is yes — the pipeline caught a real correctness bug and a real dead-code pile that Cal had not flagged. That's the whole value proposition working on night one.
+
+## Next session pickup
+
+When resuming:
+
+1. Check status of `cal/paper-dynasty-discord#162` and `cal/paper-dynasty-database#211` — merged? closed? pending?
+2. Check status of `cal/paper-dynasty-umbrella#3` — has it been picked up?
+3. Decide: enable the systemd timer, or run another manual smoke test first
+4. If running another smoke test: expect ~$7-10 with the new config (analyst $2, growth-po $0.30, 2 POs × $0.30, 5 engineers × $0.80, 5 pr-reviewers × $0.30)
+5. See `project_autonomous_pipeline.md` and `project_autonomous_first_run.md` in memory for full context
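
The ~$7-10 range in item 4 can be checked by summing its own line items. The snippet below does only that arithmetic, in integer cents; the `ESTIMATE` table is a restatement of item 4 and the variable names are illustrative.

```python
# (label, count, unit cost in US cents), copied from item 4 above.
ESTIMATE = [
    ("analyst", 1, 200),
    ("growth-po", 1, 30),
    ("domain POs", 2, 30),
    ("engineers", 5, 80),
    ("pr-reviewers", 5, 30),
]

total_cents = sum(count * cents for _, count, cents in ESTIMATE)
print(f"${total_cents / 100:.2f}")
```

The line items sum to $8.40, which sits inside the quoted range and well under the new $20 ceiling.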
+
+## References
+
+- Spec: `docs/superpowers/specs/2026-04-09-autonomous-improvement-pipeline-design.md`
+- Plan: `docs/superpowers/plans/2026-04-09-autonomous-improvement-pipeline.md`
+- Commit log: `git log --oneline --grep='autonomous'` in paper-dynasty-umbrella
+- First run digest: `autonomous-nightly-2026-04-10.md` (this same domain)
+- Live system: `/mnt/NV2/Development/paper-dynasty/autonomous/`