From f035fd93af2a13f6b331836c941ff90478a40e0c Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Thu, 19 Feb 2026 22:31:58 -0600
Subject: [PATCH] store: Self-managing n8n server health monitor with sub-workflows

---
 ...ealth-monitor-with-sub-workflows-226839.md | 48 +++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 graph/procedures/self-managing-n8n-server-health-monitor-with-sub-workflows-226839.md

diff --git a/graph/procedures/self-managing-n8n-server-health-monitor-with-sub-workflows-226839.md b/graph/procedures/self-managing-n8n-server-health-monitor-with-sub-workflows-226839.md
new file mode 100644
index 00000000000..5d4a861209b
--- /dev/null
+++ b/graph/procedures/self-managing-n8n-server-health-monitor-with-sub-workflows-226839.md
@@ -0,0 +1,48 @@
+---
+id: 2268393f-b90f-4ae8-9995-4b942ed2b2f7
+type: procedure
+title: "Self-managing n8n server health monitor with sub-workflows"
+tags: [n8n, homelab, monitoring, claude-runner, architecture, procedure]
+importance: 0.9
+confidence: 0.8
+created: "2026-02-20T04:31:58.029736+00:00"
+updated: "2026-02-20T04:31:58.029736+00:00"
+---
+
+## Architecture
+
+A master workflow in n8n fans out to a per-server sub-workflow; every health check runs over SSH on CT 302 (claude-runner at 10.10.0.148).
+
+### Master Workflow: "Server Health Monitor" (id: p7XmW23SgCs3hEkY)
+
+1. Schedule trigger (every 5 min)
+2. SSH to CT 302 → run `list_servers.sh` to get the server keys from config.yaml as a JSON array
+3. Code node: split the array into one item per server, e.g. `[{server_key: "arr-stack"}, ...]`
+4. Execute Sub-workflow (mode: "each item") → calls "Server Health Check" once per server
+5. Aggregate results (count healthy/remediated/escalated)
+6. If there are any escalations → send a Discord summary embed
+
+### Sub-workflow: "Server Health Check"
+
+1. Execute Workflow Trigger: receives `{ server_key }` as input
+2. SSH to CT 302: `health_check.py --server {server_key}`
+3. Parse the JSON output (status/exit_code/issues/escalations)
+4. If exit_code == 2 → SSH: run `remediate.sh` with the escalation data
+5. Return results to the parent (master) workflow
+
+### Key Design Decisions
+
+- **Server list from config.yaml:** single source of truth on CT 302. Adding a server only requires editing config.yaml and running git pull; no n8n changes are needed.
+- **Exit code semantics:** 0 = healthy, 1 = auto-remediated (the script has already sent the Discord alert), 2 = needs Claude escalation
+- **Discord:** Tier 1 alerts are sent by health_check.py via notifier.py. The master only sends a summary when there are escalations.
+- **SSH chain:** n8n (10.10.0.210) → n8n_runner_key → CT 302 (10.10.0.148) → homelab_rsa → target servers
+- **SSH credential:** "SSH Private Key account" (id: QkbHQ8JmYimUoTcM), host 10.10.0.148, user root, key n8n_runner_key (ed25519)
+
+### Files on CT 302
+
+- `/root/.claude/skills/server-diagnostics/config.yaml`: server inventory
+- `/root/.claude/skills/server-diagnostics/health_check.py`: health checker (Python, exit codes 0/1/2)
+- `/root/.claude/skills/server-diagnostics/remediate.sh`: Claude CLI headless wrapper for escalation
+- `/root/.claude/skills/server-diagnostics/list_servers.sh`: extracts the server keys from config.yaml as a JSON array (to be created)
+- `/root/.claude/skills/server-diagnostics/client.py`: SSH diagnostic toolkit used by Claude during escalation
+- `/root/.claude/skills/server-diagnostics/notifier.py`: Discord webhook notifications
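`list_servers.sh` is still marked as to be created. A minimal Python sketch of the same job follows, assuming (hypothetically) that config.yaml keeps its inventory under a top-level `servers:` mapping keyed by server name; the real schema may differ, and the names `server_keys_json` and `main` are illustrative. Only the file-reading entry point needs PyYAML.

```python
"""Hypothetical Python equivalent of list_servers.sh (not the real script).

Assumes config.yaml has a top-level `servers:` mapping whose keys are the
server keys, e.g.

    servers:
      arr-stack: {...}
"""
import json
from typing import Any, Mapping

# Path taken from the note's file list.
CONFIG_PATH = "/root/.claude/skills/server-diagnostics/config.yaml"


def server_keys_json(config: Mapping[str, Any]) -> str:
    """Return the server keys as a compact JSON array string."""
    servers = config.get("servers") or {}
    # Iterating a dict yields its keys in insertion (file) order.
    return json.dumps(list(servers), separators=(",", ":"))


def main(path: str = CONFIG_PATH) -> None:
    """Entry point a list_servers.sh wrapper could call (hypothetical)."""
    import yaml  # PyYAML; imported here so server_keys_json stays stdlib-only

    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    print(server_keys_json(config))
```

Emitting a single compact JSON array on stdout keeps the master's SSH step simple: the Code node only has to parse one line and split it into items.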
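The master workflow's fan-out (step 3) and aggregation (steps 5 and 6) reduce to a few lines of logic. This Python sketch mirrors them outside n8n, assuming each sub-workflow run returns a dict with `server_key` and `exit_code` using the 0/1/2 semantics from the design decisions; the function names are hypothetical, and inside n8n this logic would live in the Code node and the aggregation step.

```python
"""Sketch of the master's fan-out (step 3) and aggregation (steps 5-6)."""
import json
from collections import Counter
from typing import Any

# Exit-code semantics from the note: 0 healthy, 1 auto-remediated, 2 escalate.
STATUS_BY_EXIT_CODE = {0: "healthy", 1: "remediated", 2: "escalated"}


def split_servers(raw_json: str) -> list[dict[str, str]]:
    """Turn '["arr-stack", ...]' into [{"server_key": "arr-stack"}, ...]."""
    return [{"server_key": key} for key in json.loads(raw_json)]


def aggregate(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Count outcomes and decide whether the Discord summary should be sent."""
    counts = Counter(
        STATUS_BY_EXIT_CODE.get(r.get("exit_code"), "unknown") for r in results
    )
    escalated = [r["server_key"] for r in results if r.get("exit_code") == 2]
    return {
        "healthy": counts["healthy"],
        "remediated": counts["remediated"],
        "escalated": counts["escalated"],
        "unknown": counts["unknown"],  # unexpected exit codes, never counted as healthy
        "escalated_servers": escalated,
        "send_summary": bool(escalated),  # step 6 fires only on escalations
    }
```

Unexpected exit codes land in an `unknown` bucket instead of being silently folded into the healthy count, so a broken checker is still visible in the summary.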
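For master step 6, the Discord summary is a webhook POST whose JSON body carries an `embeds` array of objects with `title`, `description`, `color` and `fields`. This sketch assumes a summary dict with hypothetical keys `healthy`, `remediated` and `escalated_servers`; the title, colour and webhook URL are placeholders, and in n8n an HTTP Request or Discord node would normally do the sending.

```python
"""Sketch of the master's Discord summary embed (step 6)."""
import json
import urllib.request
from typing import Any

RED = 0xE74C3C  # arbitrary embed colour chosen for escalations


def build_summary_payload(summary: dict[str, Any]) -> dict[str, Any]:
    """Build a single-embed webhook payload from the aggregate counts."""
    servers = summary.get("escalated_servers", [])
    return {
        "embeds": [{
            "title": "Server Health Monitor: escalations",
            "description": f"{len(servers)} server(s) escalated to Claude.",
            "color": RED,  # Discord expects an integer colour
            "fields": [
                {"name": "Healthy", "value": str(summary.get("healthy", 0)), "inline": True},
                {"name": "Remediated", "value": str(summary.get("remediated", 0)), "inline": True},
                {"name": "Escalated", "value": ", ".join(servers) or "none", "inline": False},
            ],
        }]
    }


def send(webhook_url: str, payload: dict[str, Any]) -> int:
    """POST the payload and return the HTTP status (Discord replies 204 by default)."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        # An explicit User-Agent avoids relying on urllib's default one.
        headers={"Content-Type": "application/json", "User-Agent": "health-monitor-sketch"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```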
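The sub-workflow's parse-and-branch steps can be sketched in Python as they might run on CT 302. It assumes `health_check.py` prints a single JSON object with the fields listed in the note and is invoked with `python3`; passing the escalation data to `remediate.sh` on stdin is a hypothetical interface, since the note does not say how that data is handed over.

```python
"""Sketch of the sub-workflow's parse (step 3) and escalate (step 4) logic."""
import json
import subprocess
from typing import Any

SKILL_DIR = "/root/.claude/skills/server-diagnostics"
NEEDS_ESCALATION = 2  # exit code 2 = needs Claude escalation


def parse_health_output(stdout: str) -> dict[str, Any]:
    """Parse the checker's JSON and fill in defaults for missing fields."""
    data = json.loads(stdout)
    return {
        "status": data.get("status", "unknown"),
        "exit_code": int(data.get("exit_code", -1)),
        "issues": data.get("issues", []),
        "escalations": data.get("escalations", []),
    }


def needs_escalation(result: dict[str, Any]) -> bool:
    """Only exit code 2 goes to Claude; 0 and 1 are already handled."""
    return result["exit_code"] == NEEDS_ESCALATION


def run_check(server_key: str) -> dict[str, Any]:
    """Run the checker for one server and escalate if required."""
    proc = subprocess.run(
        ["python3", f"{SKILL_DIR}/health_check.py", "--server", server_key],
        capture_output=True, text=True, check=False,
    )
    result = parse_health_output(proc.stdout)
    result["server_key"] = server_key
    if needs_escalation(result):
        # Hypothetical interface: escalation data as JSON on remediate.sh's stdin.
        subprocess.run(
            [f"{SKILL_DIR}/remediate.sh"],
            input=json.dumps(result["escalations"]), text=True, check=False,
        )
    return result
```

Returning the parsed dict with `server_key` attached gives the master's aggregation step everything it needs without a second lookup.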
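The second hop of the SSH chain (CT 302 to a target server with homelab_rsa) comes down to a non-interactive `ssh` call. In this sketch the key path `/root/.ssh/homelab_rsa`, the `root` user and the timeout value are assumptions, and `ssh_argv`/`run_remote` are hypothetical helpers rather than part of `client.py`; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
"""Sketch of the second SSH hop (CT 302 -> target server)."""
import subprocess

HOMELAB_KEY = "/root/.ssh/homelab_rsa"  # assumed location of homelab_rsa


def ssh_argv(host: str, command: str, user: str = "root") -> list[str]:
    """Build a non-interactive ssh command line for one target server."""
    return [
        "ssh", "-i", HOMELAB_KEY,
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=10",  # keep a dead host from stalling the 5-minute cycle
        f"{user}@{host}", command,
    ]


def run_remote(host: str, command: str) -> subprocess.CompletedProcess:
    """Run one command on a target and capture its output."""
    return subprocess.run(ssh_argv(host, command), capture_output=True, text=True, check=False)
```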