store: CT 302 health_check.py: HTTP probes, Docker health detection, warning accumulator
parent e2799e7159
commit 55d21c9595
@@ -0,0 +1,51 @@
---
id: 663f99a9-e31b-4a02-a678-9837345aebbb
type: solution
title: "CT 302 health_check.py: HTTP probes, Docker health detection, warning accumulator"
tags: [claude-runner, monitoring, health-check, homelab, docker, http-probes, discord]
importance: 0.7
confidence: 0.8
created: "2026-02-20T06:35:19.624994+00:00"
updated: "2026-02-20T06:35:19.624994+00:00"
---
# CT 302 Health Check Enhancements

## Context

Enhanced `health_check.py` on CT 302 (claude-runner, 10.10.0.148) with three new monitoring capabilities. Repo: `cal/claude-runner-monitoring` on Gitea.
## Capabilities Added

### 1. HTTP Health Probes

8 endpoints across 3 servers, run directly from CT 302 via `requests` library (no SSH hop):

**arr-stack:**

- Sonarr: `:8989/ping` → `{"status":"OK"}`
- Radarr: `:7878/ping` → `{"status":"OK"}`
- Readarr: `:8787/ping` → `{"status":"OK"}`
- Lidarr: `:8686/ping` → `{"status":"OK"}`
- Jellyseerr: `:5055/api/v1/status` → 200
- SABnzbd: `:8080/api?mode=version` → 200

**Gitea:** `:3000/api/v1/version` → 200

**Uptime Kuma:** `:3001/api/entry-page` → 200

Note: Prowlarr is NOT deployed on arr-stack (omitted from probes).

Catches: status code mismatches, timeouts, connection errors.
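The probe logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in `health_check.py`: the `Probe` dataclass, `run_probe()`, the injectable `get` parameter, the 5-second timeout, and the `arr-stack.example` / `gitea.example` hosts are all assumptions made for the example. Only the ports, paths, and expected responses come from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Probe:
    """One HTTP health probe (hypothetical structure for this sketch)."""
    name: str
    url: str
    expect_status: int = 200
    expect_json: Optional[dict] = None  # exact JSON body to require, if any


def run_probe(probe: Probe, get: Optional[Callable] = None,
              timeout: float = 5.0) -> Optional[str]:
    """Return None when the probe passes, else a short failure message."""
    if get is None:
        import requests  # imported lazily so the sketch loads without it
        get = requests.get
    try:
        resp = get(probe.url, timeout=timeout)
    except OSError as exc:
        # requests.RequestException derives from IOError (OSError), so this
        # covers both timeouts and connection errors.
        return f"{probe.name}: request failed ({type(exc).__name__})"
    if resp.status_code != probe.expect_status:
        return (f"{probe.name}: HTTP {resp.status_code}, "
                f"expected {probe.expect_status}")
    if probe.expect_json is not None:
        try:
            body = resp.json()
        except ValueError:
            return f"{probe.name}: response is not JSON"
        if body != probe.expect_json:
            return f"{probe.name}: unexpected body {body!r}"
    return None


# Hostnames are placeholders; the note only records ports and paths.
PROBES = [
    Probe("Sonarr", "http://arr-stack.example:8989/ping",
          expect_json={"status": "OK"}),
    Probe("Gitea", "http://gitea.example:3000/api/v1/version"),
]
```

Passing `get` explicitly keeps the check testable without network access; when it is omitted, the sketch falls back to `requests.get`.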
### 2. Docker Health/Restart Detection

Enhanced `docker ps` format string to include `{{.Status}}`. Detects:

- `(unhealthy)` containers: auto-remediable
- Uptime < 5 min: indicates restart loop, NOT auto-remediable

Helper function `_uptime_seconds()` parses Docker status strings (e.g. "Up 3 minutes").
### 3. Warning Accumulator

A persistent `.warning_state.json` tracks consecutive warning counts per server. After 6 consecutive checks with warnings (30 min at the 5-min check interval), a Discord digest is sent via `send_warning_digest()` in `notifier.py`. The count resets on a clean run or after the digest is sent.
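The accumulator behaviour can be sketched as below. The state schema (a flat `{server: count}` mapping), the `load_state()` and `record_run()` helpers, and the shape of the `warnings` argument are assumptions for illustration. The sketch does not call `send_warning_digest()`, whose signature the note does not record; it only returns the servers for which a digest would be due.

```python
import json
from pathlib import Path

DIGEST_THRESHOLD = 6  # consecutive warning runs before a digest is due


def load_state(state_path: Path) -> dict:
    """Load {server: consecutive_warning_count}; empty if missing or corrupt."""
    try:
        return json.loads(state_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def record_run(state_path: Path, warnings: dict) -> list:
    """Update counts for one check run and return servers due for a digest.

    `warnings` maps server name -> list of warning strings for this run
    (hypothetical shape). Servers with no warnings are reset.
    """
    state = load_state(state_path)
    due = []
    for server in sorted(set(state) | set(warnings)):
        if not warnings.get(server):
            state.pop(server, None)  # clean run resets the streak
            continue
        state[server] = state.get(server, 0) + 1
        if state[server] >= DIGEST_THRESHOLD:
            due.append(server)
            state.pop(server)  # reset after the digest is due
    state_path.write_text(json.dumps(state, indent=2))
    return due
```

A caller would run `record_run()` once per check and hand each returned server to the notifier. Keeping the state in a small JSON file lets the streak survive between separate invocations of the script.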
## Deployment

- Committed as `2b6e59a` on CT 302 (local only: Gitea token expired, see related memory)
- Venv at `/root/.claude/skills/server-diagnostics/.venv/` with `requests`, `pyyaml`
- Files deployed via `scp` to `/root/.claude/skills/server-diagnostics/`