Set up weekly Proxmox backup verification → Discord #27

Open
opened 2026-04-03 01:09:33 +00:00 by cal · 1 comment
Owner

Context

The infrastructure audit has no backup validation. A VM can be in a degraded state for 42 days without anyone checking whether its backups are still running. A weekly verification check is the cheapest possible backup safety net.

Implementation

Create a weekly check (cron or n8n workflow) that:

  • Reads Proxmox backup job status via API: pvesh get /nodes/proxmox/tasks --typefilter vzdump --limit 50
  • For each running VM/CT, confirms a successful backup exists within the last 7 days
  • Posts a Discord summary:
    • Green: "All 16 VMs/CTs backed up within 7 days"
    • Yellow: "VM 116 last backup: 12 days ago"
    • Red: "VM 109 has NO backups"
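
The classification step above can be sketched as follows. This is a minimal illustration and not the script from any PR: the function names and the `VMID [LAST_BACKUP_EPOCH]` input format are hypothetical, and it assumes something upstream has already reduced the `pvesh` task list to the newest successful backup time per running guest. That reduction is not shown here, partly because a backup job covering several guests may not report a per-guest ID in the task list.

```shell
#!/usr/bin/env bash
# Sketch only: function names and input format are hypothetical.

MAX_AGE_DAYS=7   # "within the last 7 days", per the issue

# classify_backup NOW_EPOCH [LAST_BACKUP_EPOCH]
# Prints green, yellow, or red. A missing or empty LAST_BACKUP_EPOCH
# means no successful backup was found for the guest.
classify_backup() {
  local now="$1" last="${2:-}"
  if [ -z "$last" ]; then
    echo red
  elif [ $(( now - last )) -le $(( MAX_AGE_DAYS * 86400 )) ]; then
    echo green
  else
    echo yellow
  fi
}

# summarize NOW_EPOCH
# Reads "VMID [LAST_BACKUP_EPOCH]" lines on stdin, one per running guest.
# Prints one line per stale or unprotected guest, or one all-clear line.
summarize() {
  local now="$1" vmid last total=0 problems=0
  while read -r vmid last; do
    [ -n "$vmid" ] || continue
    total=$(( total + 1 ))
    case "$(classify_backup "$now" "$last")" in
      yellow)
        echo "Yellow: VM $vmid last backup: $(( (now - last) / 86400 )) days ago"
        problems=$(( problems + 1 )) ;;
      red)
        echo "Red: VM $vmid has NO backups"
        problems=$(( problems + 1 )) ;;
    esac
  done
  if [ "$problems" -eq 0 ]; then
    echo "Green: All $total VMs/CTs backed up within $MAX_AGE_DAYS days"
  fi
}
```

A call might look like `summarize "$(date +%s)" < guests.txt`, where `guests.txt` is a hypothetical file holding one line per running guest. Comparing ages in seconds rather than whole days avoids treating a backup that is 7 days and 23 hours old as still within the window.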

Options

  1. n8n workflow — fits the existing monitoring pattern (n8n master/sub workflows)
  2. Cron script on claude-runner (CT 302) — simpler, independent of n8n
  3. Ansible playbook on LXC 304 — overkill for this

Recommend option 1 (n8n) for consistency with existing monitoring, with option 2 as fallback if n8n is down.

Bonus

  • Also add a CT 302 self-health check: a cron job on CT 302 that checks its own disk usage and posts to Discord if usage is > 80%. This closes the monitoring blind spot where the monitoring system can't monitor itself.
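
A rough sketch of the disk-usage part of that check, assuming POSIX `df -P` output (six columns, with capacity in column 5 and the mount point in column 6). The function name `over_threshold` is hypothetical, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Sketch only: over_threshold is a hypothetical helper name.

# over_threshold THRESHOLD
# Reads `df -P` output on stdin and prints "MOUNT USE%" for every
# filesystem whose usage is strictly above THRESHOLD percent.
over_threshold() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # "85%" -> "85"
    if (use + 0 > limit + 0) print $6, $5
  }'
}
```

Something like `df -P | over_threshold 80` would then list the offending filesystems, and empty output means nothing needs to be posted. The comparison is strict to match the "> 80%" wording here; changing `>` to `>=` gives the "≥80%" behaviour described in the follow-up comment.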

Labels

infra-audit, monitoring, operations

cal added the infra-audit, monitoring, operations labels 2026-04-03 01:10:21 +00:00
Claude added the ai-working label 2026-04-04 11:00:46 +00:00
Claude removed the ai-working label 2026-04-04 11:08:24 +00:00
Collaborator

PR opened: #48

Approach: Went with option 2 (cron scripts on CT 302) over n8n — keeps backup monitoring independent of n8n availability.

What's in the PR:

  • proxmox-backup-check.sh — SSHes to Proxmox, queries pvesh task history, classifies all running VMs/CTs as 🟢/🟡/🔴 by backup recency, posts a weekly Discord embed
  • ct302-self-health.sh — Checks CT 302's own disk usage, alerts Discord when any filesystem ≥80% (silent on healthy)
  • Updated monitoring/scripts/CONTEXT.md with install steps and cron entries

To deploy: copy both scripts to /root/scripts/ on CT 302, run with --dry-run to verify, then add to root crontab with your Discord webhook URL.
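
For readers who want a feel for what a `--dry-run` path can look like, here is a minimal sketch. It is not the code from the PR: the `DISCORD_WEBHOOK_URL` variable and the function names are assumptions, and it posts a plain `content` message rather than the embed the PR describes. The JSON escaping covers only backslashes, double quotes, and newlines; `jq` would be a more robust choice where it is installed.

```shell
#!/usr/bin/env bash
# Sketch only: names and the DISCORD_WEBHOOK_URL variable are hypothetical.

DRY_RUN=${DRY_RUN:-0}
for arg in "$@"; do
  case "$arg" in
    --dry-run) DRY_RUN=1 ;;
  esac
done

# json_escape STRING
# Escapes backslashes and double quotes, and turns line breaks into a
# literal \n. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# post_discord MESSAGE
# With DRY_RUN=1, prints the JSON payload instead of sending it.
post_discord() {
  local payload
  payload=$(printf '{"content":"%s"}' "$(json_escape "$1")")
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$payload"
    return 0
  fi
  # Fails with a message if the webhook URL has not been provided.
  : "${DISCORD_WEBHOOK_URL:?set DISCORD_WEBHOOK_URL}"
  curl -fsS -H 'Content-Type: application/json' \
    -d "$payload" "$DISCORD_WEBHOOK_URL"
}
```

Printing the payload in dry-run mode makes it possible to check the message text on CT 302 before any webhook is configured.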

PR link: https://git.manticorum.com/cal/claude-home/pulls/48
Claude added the ai-pr-opened label 2026-04-04 11:08:32 +00:00
Reference: cal/claude-home#27