| id | type | title | tags | importance | confidence | created | updated | relations |
|---|---|---|---|---|---|---|---|---|
| 2268393f-b90f-4ae8-9995-4b942ed2b2f7 | procedure | Self-managing n8n server health monitor with sub-workflows | | 0.9 | 0.8 | 2026-02-20T04:31:58.029736+00:00 | 2026-03-01T15:58:58.516927+00:00 | |
Architecture
Master + sub-workflow pattern in n8n for server health monitoring via CT 302 (claude-runner at 10.10.0.148).
Master Workflow: "Server Health Monitor" (id: p7XmW23SgCs3hEkY)
- Schedule trigger (every 5 min)
- SSH to CT 302 → run `list_servers.sh` to get server keys from config.yaml as JSON array
- Code node: split array into items `[{server_key: "arr-stack"}, ...]`
- Execute Sub-workflow (mode: "each item") → calls "Server Health Check"
- Aggregate results (count healthy/remediated/escalated)
- If any escalations → Discord summary embed
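The split and aggregate steps above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual Code node: the function names, the `unknown` bucket, and the assumption that each sub-workflow result carries an `exit_code` field are all hypothetical.

```python
import json
from collections import Counter

# Labels follow the exit-code semantics described in this note.
EXIT_LABELS = {0: "healthy", 1: "remediated", 2: "escalated"}


def split_server_keys(ssh_stdout: str) -> list[dict]:
    """Turn the JSON array printed by list_servers.sh into one item per server."""
    keys = json.loads(ssh_stdout)
    return [{"server_key": key} for key in keys]


def aggregate(results: list[dict]) -> dict:
    """Count sub-workflow results by outcome and flag whether a summary is needed."""
    counts = Counter(EXIT_LABELS.get(r["exit_code"], "unknown") for r in results)
    summary = {label: counts.get(label, 0) for label in EXIT_LABELS.values()}
    summary["unknown"] = counts.get("unknown", 0)  # unexpected exit codes
    # The master only posts to Discord when at least one server escalated.
    summary["send_discord_summary"] = summary["escalated"] > 0
    return summary
```

For example, `split_server_keys('["arr-stack", "example-host"]')` yields two items (the second key is made up), and `aggregate` over their results decides whether the Discord summary embed is sent.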
Sub-workflow: "Server Health Check"
- Execute Workflow Trigger: receives `{ server_key }` input
- SSH to CT 302: `health_check.py --server {server_key}`
- Parse JSON output (status/exit_code/issues/escalations)
- If exit_code == 2 → SSH: `remediate.sh` with escalation data
- Return results to parent
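The parse-and-branch logic of the sub-workflow might look like the sketch below. The four field names come from this note; the types of `issues` and `escalations` (assumed to be lists), the example status strings, and both function names are assumptions for illustration.

```python
import json


def parse_health_output(stdout: str) -> dict:
    """Parse the JSON printed by health_check.py, defaulting optional fields."""
    data = json.loads(stdout)
    return {
        "status": data.get("status", "unknown"),
        # exit_code is required: a missing value must not be read as healthy.
        "exit_code": int(data["exit_code"]),
        "issues": data.get("issues", []),
        "escalations": data.get("escalations", []),
    }


def next_step(result: dict) -> str:
    """Return 'remediate' when Claude escalation is needed, else 'return'."""
    return "remediate" if result["exit_code"] == 2 else "return"
```

Only exit code 2 routes to `remediate.sh`; codes 0 and 1 go straight back to the parent workflow.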
Key Design Decisions
- Server list from config.yaml — single source of truth on CT 302. Adding a server = edit config.yaml + git pull. No n8n changes needed.
- Exit code semantics: 0=healthy, 1=auto-remediated (script already sent Discord), 2=needs Claude escalation
- Discord: Tier 1 alerts handled by notifier.py in health_check.py. Master only sends summary for escalations.
- SSH chain: n8n (10.10.0.210) → n8n_runner_key → CT 302 (10.10.0.148) → homelab_rsa → target servers
- SSH credential: "SSH Private Key account" (id: QkbHQ8JmYimUoTcM) — host 10.10.0.148, user root, n8n_runner_key (ed25519)
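The exit-code contract can be made explicit with a small enum. The sketch below is one way the checker could choose its exit code; it is not taken from `health_check.py`. The `remediated` flag on each issue and the `choose_exit` helper are hypothetical.

```python
from enum import IntEnum


class HealthExit(IntEnum):
    HEALTHY = 0           # nothing to do
    AUTO_REMEDIATED = 1   # script fixed it and already notified Discord
    NEEDS_ESCALATION = 2  # hand off to Claude via remediate.sh


def choose_exit(issues: list[dict]) -> HealthExit:
    """Pick the exit code from a list of issues.

    Each issue is assumed to carry a boolean 'remediated' flag (hypothetical).
    """
    if not issues:
        return HealthExit.HEALTHY
    if all(issue.get("remediated", False) for issue in issues):
        return HealthExit.AUTO_REMEDIATED
    return HealthExit.NEEDS_ESCALATION
```

A script using this would end with `sys.exit(choose_exit(issues))`, so n8n sees the integer value directly.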
Files on CT 302
- `/root/.claude/skills/server-diagnostics/config.yaml`: server inventory
- `/root/.claude/skills/server-diagnostics/health_check.py`: health checker (Python, exit codes 0/1/2)
- `/root/.claude/skills/server-diagnostics/remediate.sh`: Claude CLI headless wrapper for escalation
- `/root/.claude/skills/server-diagnostics/list_servers.sh`: extracts server keys as JSON from config.yaml (to be created)
- `/root/.claude/skills/server-diagnostics/client.py`: SSH diagnostic toolkit for Claude during escalation
- `/root/.claude/skills/server-diagnostics/notifier.py`: Discord webhook notifications
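Since `list_servers.sh` is still to be created, here is a rough Python sketch of the logic it needs. It assumes config.yaml has a top-level `servers` mapping keyed by server key; the real layout is not shown in this note, so adjust accordingly. Both function names are hypothetical, and `load_config` depends on the third-party PyYAML package.

```python
import json


def server_keys_json(config: dict) -> str:
    """Return the server keys from a parsed config as a JSON array string."""
    # Assumption: servers live under a top-level 'servers' mapping.
    servers = config.get("servers") or {}
    return json.dumps(list(servers))


def load_config(path: str) -> dict:
    """Load config.yaml; requires the third-party PyYAML package."""
    import yaml  # imported lazily so server_keys_json works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}
```

Printing `server_keys_json(load_config(path))` to stdout would give the master workflow the JSON array it splits into items.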