store: CT 302 health_check.py: HTTP probes, Docker health detection, warning accumulator

Cal Corum 2026-02-20 00:35:19 -06:00
parent e2799e7159
commit 55d21c9595


@@ -0,0 +1,51 @@
---
id: 663f99a9-e31b-4a02-a678-9837345aebbb
type: solution
title: "CT 302 health_check.py: HTTP probes, Docker health detection, warning accumulator"
tags: [claude-runner, monitoring, health-check, homelab, docker, http-probes, discord]
importance: 0.7
confidence: 0.8
created: "2026-02-20T06:35:19.624994+00:00"
updated: "2026-02-20T06:35:19.624994+00:00"
---
# CT 302 Health Check Enhancements
## Context
Enhanced `health_check.py` on CT 302 (claude-runner, 10.10.0.148) with three new monitoring capabilities. Repo: `cal/claude-runner-monitoring` on Gitea.
## Capabilities Added
### 1. HTTP Health Probes
Eight endpoints across three servers, probed directly from CT 302 with the `requests` library (no SSH hop):
**arr-stack:**
- Sonarr: `:8989/ping` → `{"status":"OK"}`
- Radarr: `:7878/ping` → `{"status":"OK"}`
- Readarr: `:8787/ping` → `{"status":"OK"}`
- Lidarr: `:8686/ping` → `{"status":"OK"}`
- Jellyseerr: `:5055/api/v1/status` → 200
- SABnzbd: `:8080/api?mode=version` → 200
**Gitea:** `:3000/api/v1/version` → 200
**Uptime Kuma:** `:3001/api/entry-page` → 200
Note: Prowlarr is NOT deployed on arr-stack (omitted from probes).
Catches: status code mismatches, timeouts, connection errors.
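
A minimal sketch of how one such probe could be evaluated, not the actual code in `health_check.py`. The `Probe` dataclass, the `run_probe()` name, the injectable `get` parameter, the 5-second timeout, and the exact-match body comparison are all assumptions made for illustration; the hostname in the usage note is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Probe:
    """One HTTP endpoint to check (hypothetical structure)."""
    name: str
    url: str
    expected_status: int = 200
    expected_json: Optional[dict] = None  # e.g. {"status": "OK"} for /ping


def run_probe(probe: Probe,
              get: Optional[Callable[..., Any]] = None,
              timeout: float = 5.0) -> Optional[str]:
    """Return None when the probe passes, else a short failure description."""
    if get is None:
        # Imported lazily so the sketch can be exercised with a stub `get`.
        import requests
        get = requests.get
    try:
        resp = get(probe.url, timeout=timeout)
    except Exception as exc:
        # Timeouts and connection errors land here. A real version would
        # more likely catch requests.RequestException specifically.
        return f"{probe.name}: request failed ({type(exc).__name__})"
    if resp.status_code != probe.expected_status:
        return (f"{probe.name}: expected {probe.expected_status}, "
                f"got {resp.status_code}")
    if probe.expected_json is not None:
        try:
            body = resp.json()
        except ValueError:
            return f"{probe.name}: response is not valid JSON"
        if body != probe.expected_json:
            return f"{probe.name}: unexpected body {body!r}"
    return None
```

With a placeholder host, a Sonarr probe would be declared as `Probe("sonarr", "http://arr-stack.example:8989/ping", expected_json={"status": "OK"})`; any non-`None` return value is a warning to report.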
### 2. Docker Health/Restart Detection
Enhanced `docker ps` format string to include `{{.Status}}`. Detects:
- `(unhealthy)` containers — auto-remediable
- Uptime < 5 min: treated as a sign of a restart loop, NOT auto-remediable
Helper function `_uptime_seconds()` parses Docker status strings (e.g. "Up 3 minutes").
### 3. Warning Accumulator
A persistent `.warning_state.json` file tracks consecutive warning counts per server. After 6 consecutive checks with warnings (30 min at the 5-min check interval), a Discord digest is sent via `send_warning_digest()` in `notifier.py`. The count resets on a clean run or after a digest is sent.
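
One way the accumulator logic could look, as a sketch only. The `record_run()` name, the flat `{server: count}` JSON layout, and the `send_digest` callback (standing in for `send_warning_digest()`, whose real signature is not shown here) are assumptions. The sketch also interprets "clean run" per server: a server with no warnings in the current run loses its streak.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List

THRESHOLD = 6  # 6 consecutive checks = 30 min at a 5-min interval


def record_run(state_path: Path,
               warned: Iterable[str],
               send_digest: Callable[[List[str]], None],
               threshold: int = THRESHOLD) -> Dict[str, int]:
    """Update per-server streaks for one check run and return the new state."""
    try:
        previous = json.loads(state_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        previous = {}  # first run, or an unreadable state file

    # Only servers that warned this run keep a streak; clean servers
    # drop out of the dict, which resets their count to zero.
    counts = {server: previous.get(server, 0) + 1 for server in set(warned)}

    due = sorted(s for s, n in counts.items() if n >= threshold)
    if due:
        send_digest(due)
        for server in due:
            del counts[server]  # reset after the digest is sent

    state_path.write_text(json.dumps(counts))
    return counts
```

Each five-minute run would call `record_run(Path(".warning_state.json"), servers_with_warnings, send_digest)` once, after all checks finish. Passing the notifier in as a callback keeps the state handling testable without touching Discord.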
## Deployment
- Committed as `2b6e59a` on CT 302 (local only — Gitea token expired, see related memory)
- Venv at `/root/.claude/skills/server-diagnostics/.venv/` with `requests`, `pyyaml`
- Files deployed via `scp` to `/root/.claude/skills/server-diagnostics/`