docs: add CT 302 SSH alias and git auth details to server-diagnostics

Documents the claude-runner SSH alias, HTTPS token auth method, and notes that SSH git remotes don't work from CT 302.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 09:04:33 -06:00
commit ed16fee9f7
parent b75a09e86e
1 changed file with 91 additions and 0 deletions
--- a/monitoring/server-diagnostics/CONTEXT.md
+++ b/monitoring/server-diagnostics/CONTEXT.md
@@ -0,0 +1,91 @@
+# Server Diagnostics — Deployment & Architecture
+
+## Overview
+
+Automated server health monitoring running on CT 302 (claude-runner, 10.10.0.148).
+Two-tier system: Python health checks handle 99% of issues autonomously; Claude
+is invoked only for complex failures the scripts can't resolve (exit code 2).
+
+## Architecture
+
+```
+┌──────────────────────┐     ┌──────────────────────────────────┐
+│  N8N (LXC 210)       │     │  CT 302 — claude-runner          │
+│  10.10.0.210         │     │  10.10.0.148                     │
+│                      │     │                                  │
+│  ┌─────────────────┐ │ SSH │  ┌───────────────────────────┐   │
+│  │ Cron: */15 min  │─┼─────┼─→│ health_check.py           │   │
+│  │                 │ │     │  │ (exit 0/1/2)              │   │
+│  │ Branch on exit: │ │     │  └───────────────────────────┘   │
+│  │  0 → stop       │ │     │                                  │
+│  │  1 → stop       │ │     │  ┌───────────────────────────┐   │
+│  │  2 → invoke     │─┼─────┼─→│ claude --print            │   │
+│  │     Claude      │ │     │  │ + client.py               │   │
+│  └─────────────────┘ │     │  └───────────────────────────┘   │
+│                      │     │                                  │
+│  ┌─────────────────┐ │     │  SSH keys:                       │
+│  │ Uptime Kuma     │ │     │  - homelab_rsa (→ target servers)│
+│  │ webhook trigger │ │     │  - n8n_runner_key (← N8N)        │
+│  └─────────────────┘ │     └────────────────┬─────────────────┘
+└──────────────────────┘                      │
+                                              │ SSH to target servers
+                                              ▼
+┌────────────────┐  ┌────────────────┐  ┌────────────────┐
+│ arr-stack      │  │ gitea          │  │ uptime-kuma    │
+│ 10.10.0.221    │  │ 10.10.0.225    │  │ 10.10.0.227    │
+│ Docker: sonarr │  │ systemd: gitea │  │ Docker: kuma   │
+│ radarr, etc.   │  │ Docker: runner │  │                │
+└────────────────┘  └────────────────┘  └────────────────┘
+```
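The branch in the N8N workflow is a small dispatch on the exit code of `health_check.py`. The sketch below is illustrative Python and is not the workflow itself, which lives in N8N. `Action` and `route` are hypothetical names, and the sketch assumes only the 0/1/2 contract shown in the diagram.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for what the workflow does after a run."""
    STOP = "stop"
    INVOKE_CLAUDE = "invoke_claude"


def route(exit_code: int) -> Action:
    """Map a health_check.py exit code to the next workflow step.

    0 = healthy, 1 = auto-remediated by Python: nothing more to do.
    2 = escalation: hand off to `claude --print` on CT 302.
    """
    if exit_code in (0, 1):
        return Action.STOP
    if exit_code == 2:
        return Action.INVOKE_CLAUDE
    # Outside the documented contract: fail loudly instead of
    # silently dropping what might be a real failure.
    raise ValueError(f"unexpected health_check.py exit code: {exit_code}")


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, route(code).value)
```

Raising an error on any other exit code is a choice made for this sketch. The diagram does not say how the real workflow handles unexpected codes.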
+
+## Cost Model
+
+- **Exit 0** (healthy): $0 — pure Python, no API call
+- **Exit 1** (auto-remediated): $0 — Python restarts container + Discord webhook
+- **Exit 2** (escalation): ~$0.10-0.15 — Claude Sonnet invoked via `claude --print`
+
+At 96 checks/day (every 15 min), typical cost is near $0 unless something
+actually breaks and can't be auto-fixed.
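Only exit-2 runs cost anything. Daily spend is therefore the number of escalations multiplied by the per-call cost. Below is a minimal sketch using the ~$0.10-0.15 estimate above. `daily_cost_range` is a hypothetical helper, and the figures are estimates rather than billing data.

```python
CHECKS_PER_DAY = 24 * 60 // 15  # one run every 15 minutes -> 96

# Per-escalation cost range from the cost model above (USD, estimate).
LOW, HIGH = 0.10, 0.15


def daily_cost_range(escalations: int) -> tuple[float, float]:
    """Return the (low, high) estimated USD cost for one day.

    Exit 0 and exit 1 runs are free, so only escalations (exit 2) count.
    """
    if not 0 <= escalations <= CHECKS_PER_DAY:
        raise ValueError(f"escalations must be between 0 and {CHECKS_PER_DAY}")
    return round(escalations * LOW, 2), round(escalations * HIGH, 2)


print(CHECKS_PER_DAY)        # 96
print(daily_cost_range(0))   # (0.0, 0.0)
print(daily_cost_range(96))  # (9.6, 14.4)
```

Even in the worst case, where all 96 runs escalate, cost stays under $15/day at these rates.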
+
+## Repository
+
+- **Gitea:** `cal/claude-runner-monitoring`
+- **Deployed to:** `/root/.claude` on CT 302
+- **SSH alias:** `claude-runner` (root@10.10.0.148, defined in `~/.ssh/config`)
+- **Update method:** `ssh claude-runner "cd /root/.claude && git pull"`
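The update step can also be scripted, for example from a deploy helper on the workstation. Below is a minimal sketch using Python's `subprocess`. It assumes the `claude-runner` alias from `~/.ssh/config` resolves on the machine running it. `update_command` and `update_runner` are hypothetical helper names.

```python
import shlex
import subprocess

SSH_ALIAS = "claude-runner"   # defined in ~/.ssh/config
DEPLOY_DIR = "/root/.claude"  # checkout location on CT 302


def update_command(alias: str = SSH_ALIAS, path: str = DEPLOY_DIR) -> list[str]:
    """Build argv for: ssh <alias> "cd <path> && git pull"."""
    # The remote command is one string parsed by the remote shell;
    # shlex.quote protects against spaces or metacharacters in `path`.
    remote = f"cd {shlex.quote(path)} && git pull"
    return ["ssh", alias, remote]


def update_runner() -> str:
    """Run the pull on CT 302 and return git's stdout (raises on failure)."""
    result = subprocess.run(
        update_command(), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `update_runner()` returns the output of `git pull`. Because of `check=True`, it raises `subprocess.CalledProcessError` if the SSH command or the pull fails.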
+
+### Git Auth on CT 302
+
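+CT 302 pushes to Gitea over HTTPS and authenticates with a token sent in an `Authorization` header, because Gitea rejects URLs with embedded credentials. The token is stored locally in `~/.claude/secrets/claude_runner_monitoring_gitea_token`. To configure it on CT 302, run the command below inside `/root/.claude`. Without `--global`, `git config` writes to that repository's own config.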
+
+```
+git config http.https://git.manticorum.com/.extraHeader 'Authorization: token <token>'
+```
+
+CT 302 does **not** have an SSH key registered with Gitea, so SSH git remotes won't work.
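The header setup can be sketched in Python as shown below. The sketch assumes it runs on CT 302 and that the token value has already been read from the secrets file above. `extraheader_args` and `configure_auth` are hypothetical helper names. The config key itself is Git's standard `http.<url>.extraHeader` setting.

```python
import subprocess

GITEA_URL = "https://git.manticorum.com/"


def extraheader_args(token: str, base_url: str = GITEA_URL) -> list[str]:
    """Build argv for: git config http.<url>.extraHeader 'Authorization: token <t>'.

    Git adds this header to HTTP(S) requests for URLs matching base_url,
    so the remote URL never has to contain credentials.
    """
    token = token.strip()  # token files usually end with a newline
    if not token:
        raise ValueError("empty Gitea token")
    return [
        "git", "config",
        f"http.{base_url}.extraHeader",
        f"Authorization: token {token}",
    ]


def configure_auth(repo_dir: str, token: str) -> None:
    """Write the header into the repo config at repo_dir (e.g. /root/.claude)."""
    # argv list with no shell, so the token is never parsed by a shell.
    subprocess.run(extraheader_args(token), cwd=repo_dir, check=True)
```

For example, running `configure_auth("/root/.claude", token)` on CT 302 is equivalent to the `git config` command above.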
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `CLAUDE.md` | Runner-specific instructions for Claude |
+| `settings.json` | Locked-down permissions (read-only + restart only) |
+| `skills/server-diagnostics/health_check.py` | Tier 1: automated health checks |
+| `skills/server-diagnostics/client.py` | Tier 2: Claude's diagnostic toolkit |
+| `skills/server-diagnostics/notifier.py` | Discord webhook notifications |
+| `skills/server-diagnostics/config.yaml` | Server inventory + security rules |
+| `skills/server-diagnostics/SKILL.md` | Skill reference |
+| `skills/server-diagnostics/CLAUDE.md` | Remediation methodology |
+
+## Adding a New Server
+
+1. Add entry to `config.yaml` under `servers:` with hostname, containers, etc.
+2. Ensure CT 302 can SSH: `ssh -i /root/.ssh/homelab_rsa root@<ip> hostname`
+3. Commit to Gitea, pull on CT 302
+4. Add Uptime Kuma monitors if desired
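Step 2 can be run against several hosts at once from CT 302. Below is a minimal sketch that assumes the `homelab_rsa` key path shown above. `ssh_check_cmd` and `can_ssh` are hypothetical helper names. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options; they make the check fail quickly instead of prompting or hanging.

```python
import subprocess

KEY = "/root/.ssh/homelab_rsa"


def ssh_check_cmd(ip: str, key: str = KEY, timeout: int = 5) -> list[str]:
    """argv for `ssh -i <key> root@<ip> hostname`, made non-interactive."""
    return [
        "ssh", "-i", key,
        "-o", "BatchMode=yes",              # never prompt for input
        "-o", f"ConnectTimeout={timeout}",  # give up quickly on dead hosts
        f"root@{ip}", "hostname",
    ]


def can_ssh(ip: str) -> bool:
    """True if this machine can log in to ip as root with the homelab key."""
    try:
        result = subprocess.run(
            ssh_check_cmd(ip), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh missing, or the whole call hung
    return result.returncode == 0
```

For example, `[ip for ip in ("10.10.0.221", "10.10.0.225", "10.10.0.227") if not can_ssh(ip)]` lists the targets from the diagram that CT 302 cannot reach. Because batch mode cannot answer a host-key prompt, a host that is not yet in `known_hosts` should also fail this check. Run the manual command once first to accept its key.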
+
+## Related
+
+- [monitoring/CONTEXT.md](../CONTEXT.md) — Overall monitoring architecture
+- [productivity/n8n/CONTEXT.md](../../productivity/n8n/CONTEXT.md) — N8N deployment
+- Uptime Kuma status page: https://status.manticorum.com