docs: sync KB — pr-reviewer-ai-reviewing-label-stuck.md
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 3s
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 3s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
87aeaf3309
commit
1ab0bf27ed
81
scheduled-tasks/pr-reviewer-ai-reviewing-label-stuck.md
Normal file
81
scheduled-tasks/pr-reviewer-ai-reviewing-label-stuck.md
Normal file
@ -0,0 +1,81 @@
|
||||
---
|
||||
title: "Fix: pr-reviewer leaving ai-reviewing label stuck on PRs"
|
||||
description: "Duplicate Gitea labels caused _get_label_id to SIGPIPE under pipefail, making remove_label silently bail and orphaning the ai-reviewing tag across 6 PRs."
|
||||
type: troubleshooting
|
||||
domain: scheduled-tasks
|
||||
tags:
|
||||
- troubleshooting
|
||||
- pr-reviewer
|
||||
- gitea
|
||||
- labels
|
||||
- bash
|
||||
- pipefail
|
||||
- claude-scheduled
|
||||
---
|
||||
|
||||
# Fix: pr-reviewer leaving `ai-reviewing` label stuck on PRs
|
||||
|
||||
**Date:** 2026-04-10
|
||||
**Severity:** Medium — pr-reviewer-dispatcher skipped any PR that already carried `ai-reviewing`, so stuck PRs were never re-reviewed. Six PRs across `major-domo-database` and `paper-dynasty-database` were silently wedged for weeks.
|
||||
|
||||
## Problem
|
||||
|
||||
Open PRs across the tracked repos accumulated the orange `ai-reviewing` label with no corresponding `ai-reviewed` / `ai-changes-requested` outcome. Because `pr-reviewer-dispatcher.sh` filters out any PR that already has one of those three labels, stuck PRs stayed invisible to future runs.
|
||||
|
||||
Two distinct stuck patterns were observable:
|
||||
|
||||
1. **Both labels attached** (`major-domo-database` #128, #124): the reviewer clearly ran to completion — `ai-reviewed` was added — but `ai-reviewing` was never removed.
|
||||
2. **Only `ai-reviewing` attached** (`paper-dynasty-database` #207, #126, #125; `major-domo-database` #122): no review outcome label at all. Looked like a mid-run crash.
|
||||
|
||||
## Root Cause
|
||||
|
||||
Two compounding bugs in `~/.config/claude-scheduled/gitea-lib.sh`.
|
||||
|
||||
### 1. Duplicate labels accumulated in repos
|
||||
|
||||
`ensure_label` had no de-duplication check. Any transient failure in `_get_label_id` (bad response, jq parse, pipeline issue) fell through and created a *new* label with the same name. Over time two `ai-reviewing` rows existed in both `major-domo-database` (ids 30, 31) and `paper-dynasty-database` (ids 60, 35); `paper-dynasty-discord` had the same issue with `ai-working`.
|
||||
|
||||
### 2. `_get_label_id` SIGPIPE under pipefail
|
||||
|
||||
The original helper was:
|
||||
|
||||
```bash
|
||||
_get_label_id() {
|
||||
gitea_get "repos/$owner/$repo/labels?limit=50" |
|
||||
jq -r --arg name "$name" '.[] | select(.name == $name) | .id' 2>/dev/null |
|
||||
head -1
|
||||
}
|
||||
```
|
||||
|
||||
The dispatcher runs under `set -euo pipefail`. With duplicate labels present, `jq` emits multiple id lines. `head -1` closes the pipe after the first line → `jq` hits SIGPIPE on the next write → pipeline exits non-zero → `pipefail` propagates. Inside `remove_label`:
|
||||
|
||||
```bash
|
||||
label_id=$(_get_label_id ...) || return 0
|
||||
```
|
||||
|
||||
…the `|| return 0` guard then **silently returned without ever calling DELETE**. The reviewer continued on and added `ai-reviewed`, leaving `ai-reviewing` orphaned. Same mechanism in the cleanup trap meant crashed runs also couldn't remove the label.
|
||||
|
||||
Additionally, even when the pipe didn't fire SIGPIPE, `remove_label` was resolving the label id against the *repo label catalog* rather than the labels actually attached to the PR — so for `paper-dynasty-database` #125 (which had id=35 attached), `head -1` returned id=60 and the DELETE was a no-op on an id that wasn't even there.
|
||||
|
||||
## Fix
|
||||
|
||||
**`gitea-lib.sh` hardened (three helpers):**
|
||||
|
||||
- **`_get_label_id`** — replaced `head -1` with `jq 'sort_by(.id) | .[0].id // empty'`. No pipeline truncation → no SIGPIPE. Also bumped `limit=50` → `limit=200` so large repos aren't silently truncated. On duplicates the *oldest* id is returned (the canonical row).
|
||||
- **`remove_label`** — now queries `repos/{o}/{r}/issues/{n}/labels` (labels actually attached to the PR), matches by name, and deletes every matching id. Can no longer DELETE the wrong id, and handles the theoretical case where both duplicates got attached.
|
||||
- **`ensure_label`** — counts existing labels with the target name before lookup, logs `WARNING: $repo has N labels named '$name' — reusing oldest` so the dispatcher log surfaces future dupes.
|
||||
|
||||
**Repo cleanup:**
|
||||
|
||||
- Cleared stale `ai-reviewing` from the 6 stuck PRs via the patched `remove_label`.
|
||||
- Deleted duplicate label rows (kept the oldest id in each repo): `major-domo-database` id 31, `paper-dynasty-database` id 60, `paper-dynasty-discord` id 52 (`ai-working`).
|
||||
- Swept all tracked repos for other `ai-*` label dupes — none remaining.
|
||||
|
||||
**Verification:** `bash -n`, then `pr-reviewer-dispatcher.sh --dry-run` — correctly re-discovered the 5 PRs that had only `ai-reviewing` (now clean) and properly skipped the 2 that already had `ai-reviewed`.
|
||||
|
||||
## Lessons
|
||||
|
||||
- **`set -o pipefail` + `head -N` is a foot-gun.** Whenever a downstream stage can close the pipe early, upstream producers will get SIGPIPE and fail the pipeline. Use `jq '.[0]'`, `awk 'NR==1{print; exit}'`, or read into a variable and slice — never `| head -1` in a pipefail script.
|
||||
- **Resolve label ids from the issue, not the repo catalog.** Gitea allows duplicate label names per repo. Any helper that maps a name → id from the repo catalog and then acts on an issue is ambiguous. Always query `issues/{n}/labels` when you need to mutate an attachment.
|
||||
- **"Get or create" helpers need a de-dup guard.** `ensure_label` should either tolerate duplicates by reusing the oldest (what we did) or hard-error and force a human to clean up; silently creating a new row on any transient failure accumulates garbage state over weeks.
|
||||
- **Skip-label dispatchers need a staleness timeout.** The dispatcher currently treats `ai-reviewing` as a permanent skip signal. A stuck label wedges a PR forever. Consider adding a timestamp check (e.g., `ai-reviewing` older than 1 hour → force re-review) as a belt-and-suspenders guard against future variants of this bug.
|
||||
Loading…
Reference in New Issue
Block a user