diff --git a/graph/solutions/ct-302-health-checkpy-http-probes-docker-health-detection-wa-663f99.md b/graph/solutions/ct-302-health-checkpy-http-probes-docker-health-detection-wa-663f99.md
new file mode 100644
index 00000000000..b96e2bd5ad9
--- /dev/null
+++ b/graph/solutions/ct-302-health-checkpy-http-probes-docker-health-detection-wa-663f99.md
@@ -0,0 +1,51 @@
+---
+id: 663f99a9-e31b-4a02-a678-9837345aebbb
+type: solution
+title: "CT 302 health_check.py: HTTP probes, Docker health detection, warning accumulator"
+tags: [claude-runner, monitoring, health-check, homelab, docker, http-probes, discord]
+importance: 0.7
+confidence: 0.8
+created: "2026-02-20T06:35:19.624994+00:00"
+updated: "2026-02-20T06:35:19.624994+00:00"
+---
+
+# CT 302 Health Check Enhancements
+
+## Context
+Enhanced `health_check.py` on CT 302 (claude-runner, 10.10.0.148) with three new monitoring capabilities. Repo: `cal/claude-runner-monitoring` on Gitea.
+
+## Capabilities Added
+
+### 1. HTTP Health Probes
+8 endpoints across 3 servers, run directly from CT 302 via `requests` library (no SSH hop):
+
+**arr-stack:**
+- Sonarr: `:8989/ping` → `{"status":"OK"}`
+- Radarr: `:7878/ping` → `{"status":"OK"}`
+- Readarr: `:8787/ping` → `{"status":"OK"}`
+- Lidarr: `:8686/ping` → `{"status":"OK"}`
+- Jellyseerr: `:5055/api/v1/status` → 200
+- SABnzbd: `:8080/api?mode=version` → 200
+
+**Gitea:** `:3000/api/v1/version` → 200
+
+**Uptime Kuma:** `:3001/api/entry-page` → 200
+
+Note: Prowlarr is NOT deployed on arr-stack (omitted from probes).
+
+Catches: status code mismatches, timeouts, connection errors.
+
+### 2. Docker Health/Restart Detection
+Enhanced `docker ps` format string to include `{{.Status}}`. Detects:
+- `(unhealthy)` containers — auto-remediable
+- Uptime < 5 min — indicates restart loop, NOT auto-remediable
+
+Helper function `_uptime_seconds()` parses Docker status strings (e.g. "Up 3 minutes").
+
+### 3. Warning Accumulator
+Persistent `.warning_state.json` tracks consecutive warning counts per server. After 6 consecutive checks (30 min at 5-min interval), sends Discord digest via `send_warning_digest()` in `notifier.py`. Resets on clean run or after digest sent.
+
+## Deployment
+- Committed as `2b6e59a` on CT 302 (local only — Gitea token expired, see related memory)
+- Venv at `/root/.claude/skills/server-diagnostics/.venv/` with `requests`, `pyyaml`
+- Files deployed via `scp` to `/root/.claude/skills/server-diagnostics/`
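
For readers of this note, the Docker status parsing and the warning accumulator described above could look roughly like the sketch below. It is not the code committed in `2b6e59a`. The names `uptime_seconds`, `classify`, and `record_check`, the issue labels, and the JSON layout (a flat map of server name to consecutive-warning count) are all assumptions made for illustration. Only the 5-minute restart window and the 6-check digest threshold come from the note itself.

```python
import json
import re
from pathlib import Path

# Approximate seconds per unit, matching the words Docker prints in its status column.
_UNIT_SECONDS = {
    "second": 1, "minute": 60, "hour": 3600, "day": 86400,
    "week": 604800, "month": 2592000, "year": 31536000,
}
_DURATION = re.compile(
    r"(?:about )?(an?|\d+) (second|minute|hour|day|week|month|year)s?$"
)

def uptime_seconds(status):
    """Approximate uptime in seconds for an 'Up ...' status, else None."""
    if not status.startswith("Up "):
        return None  # e.g. "Exited (0) 3 minutes ago"
    # Drop a trailing "(healthy)" / "(unhealthy)" suffix before matching.
    text = status[3:].split("(")[0].strip().lower()
    if text.startswith("less than a second"):
        return 0
    match = _DURATION.match(text)
    if match is None:
        return None
    count = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    return count * _UNIT_SECONDS[match.group(2)]

def classify(status, restart_window=300):
    """Hypothetical labels: 'unhealthy', 'recent_restart', or 'ok'."""
    if "(unhealthy)" in status:
        return "unhealthy"
    up = uptime_seconds(status)
    if up is not None and up < restart_window:
        return "recent_restart"
    return "ok"

DIGEST_THRESHOLD = 6  # 6 consecutive checks = 30 min at a 5-min interval

def record_check(state_path, server, has_warning):
    """Update the per-server counter; return True when a digest is due."""
    path = Path(state_path)
    state = json.loads(path.read_text()) if path.exists() else {}
    # A clean run resets the streak; a warning extends it by one.
    count = (state.get(server, 0) + 1) if has_warning else 0
    send_digest = count >= DIGEST_THRESHOLD
    # The counter also resets once a digest has been triggered.
    state[server] = 0 if send_digest else count
    path.write_text(json.dumps(state))
    return send_digest
```

In this sketch the caller would invoke the notifier only when `record_check(...)` returns `True`, which keeps the state file handling separate from the Discord delivery.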