docs: sync KB — pr-reviewer-two-identity-rollout.md
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 7s
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 7s
This commit is contained in:
parent
c5bafffb05
commit
d1a2af861f
102
scheduled-tasks/pr-reviewer-two-identity-rollout.md
Normal file
102
scheduled-tasks/pr-reviewer-two-identity-rollout.md
Normal file
@ -0,0 +1,102 @@
|
||||
---
|
||||
title: "Fix: pr-reviewer two-identity rollout — five cascading bugs fixed in Phase 2"
|
||||
description: "Rolling out the claude-reviewer bot for formal PR reviews surfaced five distinct bugs across scope gating, auth check compatibility, timeline parsing, MCP config layering, and prompt ambiguity — all fixed inline."
|
||||
type: troubleshooting
|
||||
domain: scheduled-tasks
|
||||
tags:
|
||||
- troubleshooting
|
||||
- pr-reviewer
|
||||
- gitea
|
||||
- mcp
|
||||
- bash
|
||||
- claude-scheduled
|
||||
- agent-prompt
|
||||
---
|
||||
|
||||
# Fix: pr-reviewer two-identity rollout — Phase 2 cascading bugs
|
||||
|
||||
**Date:** 2026-04-11
|
||||
**Spec:** `docs/superpowers/specs/2026-04-10-pr-review-process-design.md`
|
||||
**Plan:** `docs/superpowers/plans/2026-04-10-pr-review-process.md`
|
||||
**Severity:** High — pr-reviewer had been producing COMMENT-only verdicts for months; none of the formal review process actually worked end-to-end until these fixes landed.
|
||||
|
||||
## Problem
|
||||
|
||||
The pr-reviewer dispatcher was designed to post formal Gitea reviews (APPROVED / REQUEST_CHANGES / COMMENT) via the `create_pull_request_review` MCP tool, gated by branch protection. In practice every review came back as `COMMENT` with a "self-review restriction applies" disclaimer in the body. The approval gate was decorative — nothing was actually enforced.
|
||||
|
||||
Phase 1 of the rollout (creating a dedicated `claude-reviewer` Gitea bot and swapping the dispatcher's token) was expected to resolve this in one step. It didn't. Five distinct bugs surfaced once the new identity was wired in, each of which would have independently kept the system broken.
|
||||
|
||||
## Root Cause
|
||||
|
||||
Five compounding issues, in the order they were discovered:
|
||||
|
||||
### 1. `remove_label` mutated production during `--dry-run`
|
||||
|
||||
The new SHA-mismatch filter loop in `pr-reviewer-dispatcher.sh` called `remove_label` unconditionally when a PR's head SHA diverged from its last-reviewed SHA. There was no `$DRY_RUN` check. Running the dispatcher with `--dry-run` — which is supposed to be read-only — stripped `ai-reviewed` and `ai-changes-requested` labels from ~17 production PRs on its first invocation, because the new reviewer user had never reviewed any of them (empty `last_reviewed_sha`) and every PR looked like "head moved". The contract of `--dry-run` was broken.
|
||||
|
||||
### 2. `gitea_auth_check` required `read:user` scope
|
||||
|
||||
The existing `gitea_auth_check()` helper in `gitea-lib.sh` hit `GET /user` to validate the token. Gitea's `/user` endpoint requires the `read:user` scope. The new `claude-reviewer` PAT was created with `read:repository, write:issue, write:repository` — the minimum scopes the dispatcher actually needs — and rejected `/user` with HTTP 403. Consequence: the real (non-dry-run) dispatcher would exit immediately with "Gitea auth check failed" even though the token worked perfectly for every operation the dispatcher actually performs.
|
||||
|
||||
### 3. `reviewing_label_age` didn't distinguish label ADD from REMOVE events
|
||||
|
||||
The Task 7 helper that measures how long `ai-reviewing` has been attached walked Gitea's issue timeline looking for events with `type="label"` and `label.name="ai-reviewing"`. The initial spec parser treated every matched event as an add. Gitea's timeline encodes add vs remove in the event's `body` field (`"1"` = add, empty string `""` = remove), which the parser ignored. Result: a PR that had `ai-reviewing` attached and then cleanly removed via normal dispatcher flow would report a false positive stale age (tens of thousands of seconds), which would have triggered the 1-hour staleness timeout and force-reviewed PRs that were already correctly reviewed. Latent correctness bug that would have manifested the first time a real PR cycled through the label state.
|
||||
|
||||
### 4. MCP config had a hardcoded token separate from the dispatcher's secrets
|
||||
|
||||
The dispatcher's own Gitea API calls (for label management, PR discovery, filter helpers) authenticate via `gitea-lib.sh` which sources a secrets file. The **agent** that the dispatcher spawns via `claude -p` authenticates via its own **MCP server config** at `~/.config/claude-scheduled/tasks/pr-reviewer/mcp.json`, which had `GITEA_ACCESS_TOKEN` hardcoded in the `env` block of the `gitea-mcp` stdio server entry. Swapping the dispatcher's token via `SECRETS_FILE` was not enough — the agent itself kept authenticating as the old `Claude` identity. Every review the agent posted landed under `Claude`, triggering Gitea's self-approval block again, producing another `COMMENT` fallback. Two tokens, one clearly required, one hidden in a JSON file.
|
||||
|
||||
Complicating factor: the `generate-mcp-configs.sh` generator script reads from the default `secrets` file and substitutes `__GITEA_TOKEN__` into `mcp.json.template` files — but the pr-reviewer task had no `mcp.json.template`, only a hand-written `mcp.json`. So it had never gone through the generator, and the token had been hardcoded since task creation. Easy to patch directly but easy to miss.
|
||||
|
||||
### 5. Agent prompt left COMMENT as a safe middle ground
|
||||
|
||||
Once the MCP token was fixed and the agent was genuinely authenticating as `claude-reviewer` (a different identity from the PR author), a new production run still produced `verdict=COMMENT`. The review body said "No correctness issues. No security issues. Two minor suggestions, neither blocking." — yet the verdict was `COMMENT`, not `APPROVED`.
|
||||
|
||||
The agent prompt (`~/.claude/agents/pr-reviewer.md`) had a one-sided hard rule:
|
||||
|
||||
> If the Correctness section contains ANY issue → verdict MUST be REQUEST_CHANGES.
|
||||
|
||||
…but no symmetric rule forcing `APPROVED` when Correctness was clean. The verdict definitions gave the agent discretion:
|
||||
- **APPROVED** — "You may ONLY approve if there are ZERO correctness issues." (permissive threshold, not an obligation)
|
||||
- **COMMENT** — "Pure style observations or optional suggestions where the code is functionally correct." (also fits a clean-correctness-with-suggestions case)
|
||||
|
||||
A "functionally correct with non-blocking suggestions" review matched both definitions, and the agent consistently chose `COMMENT` as the safer middle ground. Because it was defaulting to COMMENT on clean reviews, the original "Gitea self-approval" symptom looked identical to this separate prompt ambiguity — masking the fact that fixing self-approval alone would not solve the problem.
|
||||
|
||||
## Fix
|
||||
|
||||
### 1. DRY_RUN gating in the filter loop
|
||||
|
||||
Wrapped the `remove_label` calls inside `pr-reviewer-dispatcher.sh`'s new filter block with a `$DRY_RUN` check. Dry-run still reports the intended action via a `DRY RUN: would re-review` log line but does not mutate Gitea state. Real runs behave as before.
|
||||
|
||||
### 2. `gitea_auth_check` switched to a repo endpoint
|
||||
|
||||
Changed the helper from `gitea_get "user"` to a direct curl against `GET /repos/search?limit=1&private=true`. The new endpoint requires authentication (so invalid tokens still fail) but only needs `read:repository` scope — which both the `Claude` and `claude-reviewer` tokens have. Backward compatible: issue-dispatcher's auth check keeps working with the `Claude` token.
|
||||
|
||||
### 3. `reviewing_label_age` walks events chronologically with state tracking
|
||||
|
||||
Rewrote the parser in the helper to walk timeline events in order. On each `ai-reviewing` label event, check `body`: `"1"` sets `last_add_ts` to that event's timestamp, empty string clears it to `None`. Only return a non-zero age if `last_add_ts` is still set after walking the full timeline — meaning the most recent add was never followed by a remove. Returns 0 for any PR where the label was normally attached-then-removed. Verified empirically against PR #215 in `paper-dynasty-database` which had multiple add/remove cycles.
|
||||
|
||||
### 4. Patched the MCP config token directly
|
||||
|
||||
Used `jq` to overwrite `mcpServers["gitea-mcp"].env.GITEA_ACCESS_TOKEN` in `~/.config/claude-scheduled/tasks/pr-reviewer/mcp.json` with the `claude-reviewer` PAT. No template to regenerate — the file is hand-written. Follow-up worth considering: create an `mcp.json.template` for pr-reviewer and extend `generate-mcp-configs.sh` to support per-task secrets files, so rotation is uniform across tasks.
|
||||
|
||||
### 5. Added symmetric hard rules to the reviewer prompt
|
||||
|
||||
Rewrote the verdict-selection block in `~/.claude/agents/pr-reviewer.md` to remove the discretion:
|
||||
|
||||
- **APPROVED** is now the default verdict when no correctness/security issues exist. Non-blocking suggestions go in the Suggestions section of an APPROVED review.
|
||||
- **COMMENT** is narrowed to purely informational reviews with no findings at all (explicitly NOT "functionally correct with suggestions").
|
||||
- Hard rules are symmetric: any correctness issue → MUST REQUEST_CHANGES; clean correctness + security → MUST APPROVED.
|
||||
- Explicit anti-pattern note: "COMMENT is NOT a safe middle ground. Choosing it when you have no correctness issues is wrong — use APPROVED."
|
||||
|
||||
Result: first real APPROVED verdict in the system's history landed on `paper-dynasty-database` PR #212 at 2026-04-11 20:06 UTC, commit `f9c25c56…`, cost $0.59, under the `claude-reviewer` identity.
|
||||
|
||||
## Lessons
|
||||
|
||||
- **"Dry-run" is a contract.** Any new code that mutates remote state inside a filter/dispatch loop MUST respect `$DRY_RUN`. When introducing helpers that call Gitea mutate endpoints (`remove_label`, `add_label`, `create_*`), wrap them in a dry-run check at the call site, not at the helper level. A shared helper can't know whether its caller considers itself to be in dry-run mode.
|
||||
- **Scope minimalism on tokens collides with "health check" endpoints that need broad scopes.** Auth-check helpers should hit the narrowest-scope endpoint that still forces authentication — `/repos/search?private=true` is a reasonable choice because a valid auth is required to see private results, but it doesn't need `read:user`. Avoid `/user` in any helper that might run under a minimum-scope bot token.
|
||||
- **Gitea timeline label events encode add/remove in an undocumented `body` field.** `body="1"` means add, empty string means remove. Any future timeline parser walking label events MUST distinguish these. This is not in the Gitea REST API docs — confirmed empirically on 2026-04-11 against repo `cal/paper-dynasty-database`.
|
||||
- **Agent identity is a two-layer problem when you spawn sub-agents via `claude -p`.** The dispatcher's own Gitea auth (via `gitea-lib.sh`) and the agent's MCP-server auth (via `mcp.json`) are independent. Swapping one does not swap the other. Any future identity rotation MUST update both layers, and the relationship should be documented at the top of `pr-reviewer-dispatcher.sh` for whoever touches it next.
|
||||
- **Hand-written `mcp.json` files escape the `generate-mcp-configs.sh` rotation workflow.** Hidden tokens in JSON are a rotation hazard. Prefer `mcp.json.template` even for single-task configs; extend the generator to support per-task secrets overrides if multiple identities are needed.
|
||||
- **Symmetric hard rules prevent agent discretion from drifting toward the safe middle.** When prompt rules are one-sided (e.g., "MUST reject on bugs" with no counterpart for clean reviews), LLMs will consistently default to the non-committal option (COMMENT) because it never triggers the negative rule. Any prompt that defines a three-way verdict must have hard rules in both directions OR explicitly forbid the middle ground when unwarranted.
|
||||
- **The observable symptom can outlive its root cause.** The "self-review restriction applies" disclaimer was the symptom of bug #4 (MCP token layering), but after #4 was fixed, the COMMENT verdict persisted due to bug #5 (prompt ambiguity). The fact that the symptom's TEXT disappeared after #4 but the VERDICT stayed the same was the signal that there was a second bug. Don't assume symptom resolution === root cause resolution — verify the actual output changed, not just the explanation.
|
||||
Loading…
Reference in New Issue
Block a user