From 3b3487a43ffa323fdfcf7764d5afa5547cad5808 Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Sun, 1 Mar 2026 09:58:58 -0600
Subject: [PATCH] store: Architecture: N8N Server Health Monitor uses Master Loop + Sub-workflow pattern

---
 ...lth-monitor-uses-master-loop-sub-67898e.md | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 graph/decisions/architecture-n8n-server-health-monitor-uses-master-loop-sub-67898e.md

diff --git a/graph/decisions/architecture-n8n-server-health-monitor-uses-master-loop-sub-67898e.md b/graph/decisions/architecture-n8n-server-health-monitor-uses-master-loop-sub-67898e.md
new file mode 100644
index 00000000000..3413d46e22c
--- /dev/null
+++ b/graph/decisions/architecture-n8n-server-health-monitor-uses-master-loop-sub-67898e.md
@@ -0,0 +1,38 @@
+---
+id: 67898e52-470a-470e-b149-43fef0047ae9
+type: decision
+title: "Architecture: N8N Server Health Monitor uses Master Loop + Sub-workflow pattern"
+tags: [n8n, server-diagnostics, workflow-architecture, decision, claude-home]
+importance: 0.65
+confidence: 0.8
+created: "2026-03-01T15:58:58.161120+00:00"
+updated: "2026-03-01T15:58:58.161120+00:00"
+---
+
+# N8N Server Health Monitor Architecture
+
+## Workflow IDs
+- **Master Loop**: `p7XmW23SgCs3hEkY` (runs every 5 minutes)
+- **Sub-workflow**: `BhzYmWr6NcIDoioy` (per-server health check)
+
+## Flow
+1. Master Loop SSHes to CT 300 and fetches the server list
+2. Splits the list into items and calls the sub-workflow once per server
+3. Sub-workflow runs `health_check.py --server {key}` and parses the exit code:
+   - `0` = healthy
+   - `1` = remediated
+   - `2` = escalation → runs `remediate.sh`
+4. Master Loop aggregates all results
+5. Discord webhook fires **only if escalations are found** (`has_escalations=true`)
+
+## Rationale for Split Design
+- Sub-workflow isolates per-server logic cleanly: 10/10 successes, 0 errors
+- Master Loop handles orchestration and notification separately
+- Conditional Discord notification avoids alert spam on healthy runs
+
+## Notes
+- Sub-workflow `onError` updates on SSH nodes were applied without corruption (safe change)
+- The escalation-only Discord path was the source of the silent Master Loop failure bug (missing URL)
+
+## Tags
+n8n, server-diagnostics, workflow-architecture
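
The Flow steps in the note above (exit-code parsing, aggregation, and the escalation-only Discord notification) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual n8n workflow: `HealthResult`, `aggregate`, `notify_if_needed`, the `"unknown"` status, and the injected `send` callable are hypothetical names. Only the exit-code meanings and the `has_escalations` flag come from the note.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Exit-code meanings as listed in the Flow section of the note.
STATUS_BY_EXIT_CODE = {0: "healthy", 1: "remediated", 2: "escalation"}


@dataclass
class HealthResult:
    """One sub-workflow outcome for a single server (hypothetical shape)."""
    server: str
    exit_code: int

    @property
    def status(self) -> str:
        # Codes outside 0-2 are not described in the note; "unknown" is an assumption.
        return STATUS_BY_EXIT_CODE.get(self.exit_code, "unknown")


def aggregate(results: Iterable[HealthResult]) -> dict:
    """Collect per-server results and derive the has_escalations flag."""
    results = list(results)
    escalated = [r.server for r in results if r.status == "escalation"]
    return {
        "total": len(results),
        "escalated": escalated,
        "has_escalations": bool(escalated),
    }


def notify_if_needed(
    summary: dict,
    webhook_url: Optional[str],
    send: Callable[[str, str], None],
) -> bool:
    """Call send(webhook_url, message) only when escalations exist."""
    if not summary["has_escalations"]:
        return False  # healthy run: stay quiet to avoid alert spam
    if not webhook_url:
        # Assumption: fail loudly rather than silently when the URL is missing.
        raise ValueError("escalations found but no Discord webhook URL is configured")
    send(webhook_url, "Escalations: " + ", ".join(summary["escalated"]))
    return True
```

Raising on a missing webhook URL is one way to surface the kind of silent failure mentioned under Notes; it is an assumption made for this sketch, not a description of how the bug was actually fixed.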