Cal Corum 8e54424564 daily sync: 3 added, 4 modified, 0 deleted (3 edges)

2026-03-01 09:58:58 -06:00

id: 2268393f-b90f-4ae8-9995-4b942ed2b2f7
type: procedure
title: Self-managing n8n server health monitor with sub-workflows
tags: n8n, homelab, monitoring, claude-runner, architecture, procedure
importance: 0.9
confidence: 0.8
created: 2026-02-20T04:31:58.029736+00:00
updated: 2026-03-01T15:58:58.516927+00:00
relations:

target	type	direction	strength	edge_id
7fdc5ceb-4b8c-426d-8492-948d106f92bb	BUILDS_ON	incoming	0.9	a9109127-1691-4cea-a957-8d55320281d7
aab3d007-0cdf-4a4f-9b55-096ea4bdc168	RELATED_TO	incoming	0.85	effaafab-948b-488f-b388-4bd92f4ec6c2
06101183-a78b-4852-86eb-cae5557ace8c	BUILDS_ON	incoming	0.85	02352349-eb84-4c09-8f2b-2e2feafd4f9a
67898e52-470a-470e-b149-43fef0047ae9	RELATED_TO	incoming	0.81	f0735351-129b-4bf6-9919-e84eaffa9bcd

Architecture

Master + sub-workflow pattern in n8n: a master workflow monitors server health through CT 302 (claude-runner at 10.10.0.148) and hands each server to a sub-workflow. The master workflow runs these steps:

1. Schedule trigger (every 5 min)
2. SSH to CT 302 → run list_servers.sh to get server keys from config.yaml as JSON array
3. Code node: split array into items [{server_key: "arr-stack"}, ...]
4. Execute Sub-workflow (mode: "each item") → calls "Server Health Check"
5. Aggregate results (count healthy/remediated/escalated)
6. If any escalations → Discord summary embed
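
The split and aggregate steps above can be sketched in plain Python to show the data shapes involved. This is an illustration only, not the actual n8n Code node contents. The `status` field on each result, the helper names, and the second server key `media-01` are assumptions made for the example.

```python
import json
from collections import Counter


def split_into_items(ssh_stdout: str) -> list[dict]:
    """Turn the JSON array printed by list_servers.sh into one item per server."""
    keys = json.loads(ssh_stdout)
    return [{"server_key": key} for key in keys]


def aggregate(results: list[dict]) -> dict:
    """Count sub-workflow results by status and list the escalated servers."""
    counts = Counter(r["status"] for r in results)
    escalated = [r["server_key"] for r in results if r["status"] == "escalated"]
    return {
        "healthy": counts["healthy"],
        "remediated": counts["remediated"],
        "escalated": counts["escalated"],
        "escalated_servers": escalated,
        # The master only posts a Discord summary when something escalated.
        "send_summary": bool(escalated),
    }


# "media-01" is a made-up server key used only for this example.
items = split_into_items('["arr-stack", "media-01"]')

# Hypothetical results, shaped as the sub-workflow might return them.
results = [
    {"server_key": "arr-stack", "status": "healthy"},
    {"server_key": "media-01", "status": "escalated"},
]
summary = aggregate(results)
```

With these inputs `summary["send_summary"]` is true, so the Discord summary branch would run. If every result were healthy or remediated, it would be skipped.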

Server list from config.yaml — single source of truth on CT 302. Adding a server = edit config.yaml + git pull. No n8n changes needed.
Exit code semantics: 0=healthy, 1=auto-remediated (script already sent Discord), 2=needs Claude escalation
Discord: Tier 1 alerts are sent by notifier.py, called from health_check.py. The master workflow only sends a summary, and only for escalations.
SSH chain: n8n (10.10.0.210) → n8n_runner_key → CT 302 (10.10.0.148) → homelab_rsa → target servers
SSH credential: "SSH Private Key account" (id: QkbHQ8JmYimUoTcM) — host 10.10.0.148, user root, n8n_runner_key (ed25519)
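
A minimal sketch of how the exit code contract above might be interpreted on the receiving side. The enum and function names are hypothetical. Treating an unexpected code (for example, an SSH failure) as an escalation is an assumption of this sketch, not something the notes specify.

```python
from enum import IntEnum


class HealthExit(IntEnum):
    HEALTHY = 0      # nothing to do
    REMEDIATED = 1   # fixed automatically; the script already posted to Discord
    ESCALATE = 2     # needs Claude escalation


STATUS = {
    HealthExit.HEALTHY: "healthy",
    HealthExit.REMEDIATED: "remediated",
    HealthExit.ESCALATE: "escalated",
}


def classify(exit_code: int) -> dict:
    """Map an exit code to a status and say whether the master should report it."""
    try:
        code = HealthExit(exit_code)
    except ValueError:
        # Assumption: an unknown code is escalated rather than silently ignored.
        code = HealthExit.ESCALATE
    return {
        "status": STATUS[code],
        # Only escalations reach the master's Discord summary; code 1 was
        # already announced by the health check script itself.
        "master_notifies": code is HealthExit.ESCALATE,
    }
```

For example, `classify(1)` yields a `remediated` status with `master_notifies` false, which avoids a duplicate Discord message for a problem the script already reported.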

/root/.claude/skills/server-diagnostics/config.yaml — server inventory
/root/.claude/skills/server-diagnostics/health_check.py — health checker (Python, exit codes 0/1/2)
/root/.claude/skills/server-diagnostics/remediate.sh — Claude CLI headless wrapper for escalation
/root/.claude/skills/server-diagnostics/list_servers.sh — extracts server keys as JSON from config.yaml (to be created)
/root/.claude/skills/server-diagnostics/client.py — SSH diagnostic toolkit for Claude during escalation
/root/.claude/skills/server-diagnostics/notifier.py — Discord webhook notifications
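
Since list_servers.sh is still to be created, here is one hedged sketch of the extraction logic it needs, written in Python rather than shell. It assumes config.yaml keeps its inventory under a top-level `servers` mapping keyed by server name; the real schema is not described in these notes and may differ. The function names are hypothetical, and `main` relies on the third-party PyYAML package.

```python
import json


def server_keys_json(config: dict) -> str:
    """Return the server keys of an already-parsed config as a JSON array."""
    # Assumption: servers live under a top-level "servers" mapping.
    servers = config.get("servers") or {}
    return json.dumps(list(servers))


def main(path: str) -> None:
    """Load the YAML file at `path` and print its server keys as JSON."""
    # PyYAML is imported here so server_keys_json stays dependency-free.
    import yaml

    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    print(server_keys_json(config))


# Example with an inline config; "media-01" is a made-up key.
example = {"servers": {"arr-stack": {}, "media-01": {}}}
keys_json = server_keys_json(example)
```

Keeping the output a bare JSON array means the n8n side only has to parse it and wrap each key in an item, so the config file stays the only place where servers are listed.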