diff --git a/graph/solutions/health-check-escalation-logic-only-critical-severity-trigger-ad5a91.md b/graph/solutions/health-check-escalation-logic-only-critical-severity-trigger-ad5a91.md new file mode 100644 index 00000000000..1c450ae148a --- /dev/null +++ b/graph/solutions/health-check-escalation-logic-only-critical-severity-trigger-ad5a91.md @@ -0,0 +1,12 @@ +--- +id: ad5a911d-c7fa-4b7e-a488-17560c4067b3 +type: solution +title: "Health check escalation logic: only critical-severity triggers exit 2" +tags: [monitoring, claude-runner-monitoring, python, fix] +importance: 0.8 +confidence: 0.8 +created: "2026-02-20T03:59:38.027169+00:00" +updated: "2026-02-20T03:59:38.027169+00:00" +--- + +In health_check.py, the original logic put ALL non-auto-remediable issues into the escalations list, triggering exit 2 (Claude invocation) even for minor warnings like load_high. Fixed by only adding issues with `severity == "critical"` to escalations. Warnings are still included in the JSON output but don't trigger Claude escalation. This prevents unnecessary API costs from load spikes on LXCs sharing a Proxmox host (gitea and uptime-kuma both report the host's load, which often exceeds 2-core thresholds).