feat: weekly Proxmox backup verification → Discord (#27) #48
Reference: cal/claude-home#48
Closes #27
Summary
- `proxmox-backup-check.sh`: SSHes to Proxmox, queries `pvesh` task history, classifies each running VM/CT as 🟢/🟡/🔴 by backup recency (7-day default), and posts a Discord embed. Designed for weekly cron on CT 302 (Monday 08:00 UTC).
- `ct302-self-health.sh`: Checks disk usage on CT 302 itself; exits silently when healthy and posts a Discord alert when any filesystem hits ≥80%. Closes the blind spot where the monitoring system can't monitor itself externally. Runs daily (07:00 UTC).
- `monitoring/scripts/CONTEXT.md`: updated with full operational docs, install instructions, and cron entry examples for both scripts.

Deploy checklist
Design decisions
- `proxmox-backup-check.sh` always posts to Discord (weekly status confirmation), not only on failure.
- `ct302-self-health.sh` posts only on threshold breach to avoid noise (use `--always-post` for weekly summaries if desired).
- `jq` and `curl` on CT 302; `pvesh` is invoked remotely via SSH.

AI Code Review
Files Reviewed
- `monitoring/scripts/proxmox-backup-check.sh` (added, 230 lines)
- `monitoring/scripts/ct302-self-health.sh` (added, 158 lines)
- `monitoring/scripts/CONTEXT.md` (modified)

Findings
Correctness
⚠️ `proxmox-backup-check.sh`: Task field name needs verification (line 119)
The task-matching filter uses `.vmid` to correlate vzdump tasks to guests. The Proxmox `GET /nodes/{node}/tasks` API returns items where the resource identifier is the `id` field (a string containing the VMID), not a top-level `vmid` field. If `.vmid` is `null` on every task object, every guest will have `$last_ts = 0` and show as 🔴 (no backup found), even when backups exist.
Verify during dry-run: After deploying, run the script with `--dry-run` and confirm that at least some guests appear as 🟢. If all guests show 🔴 unexpectedly, change `.vmid` to `.id` in the `select()` filter.
ℹ️ `proxmox-backup-check.sh`: Task limit may be insufficient for frequent backups (line 88)
`--limit 200` fetches up to 200 vzdump tasks. For a homelab with ~20 guests on daily backup schedules, 200 tasks ≈ 10 days of history, which covers the 7-day window. For larger setups or more frequent backup schedules, consider bumping to `--limit 500` or making the limit configurable.
ℹ️ Discord embed field value limit (both scripts)
Discord enforces a 1024-character limit on embed field values. If a large number of guests or filesystems are listed in a single field, the POST will return HTTP 400 and the webhook call will fail. Both scripts use `exit 1` on non-2xx, so the failure will be logged. For the current homelab scale this is not an issue, but it is worth noting if the guest count grows.

Security
- `--discord-webhook` CLI arg only. ✅
- `jq -n` with typed `--arg`/`--argjson` inputs: no string interpolation into JSON. ✅
- `SSH_OPTS` includes `BatchMode=yes` (no interactive prompts) and `StrictHostKeyChecking=accept-new` (safe for internal networks; won't auto-accept changed host keys). ✅
- `$PROXMOX_NODE` is interpolated into the remote SSH command string. In a cron context with a trusted default value this is fine; it would be a concern only if the env var were user-controlled. Acceptable for this deployment context.

Style & Conventions
- `set -uo pipefail`, `log()` timestamp function, `--dry-run` mode, env var overrides. ✅
- `$# -lt 2` checks) is solid.
- `ct302-self-health.sh`: The `df --output=source,size,used,avail,pcent,target` → `awk '{print $1,$5,$6}'` pipeline correctly produces `source pct% mount` triples. The downstream `$2`/`$3` parsing is correct. ✅

Suggestions
- The `format_guest` and `format_guest_with_age` functions in `proxmox-backup-check.sh` are nearly identical. They could be merged into a single function with an optional age flag, but the current duplication is minimal and clear.
- For `ct302-self-health.sh`, consider adding a brief `--help` case to the argument parser; it makes manual invocation more discoverable.

Verdict: COMMENT
Both scripts are well-structured and production-ready for a homelab context. The `.vmid` vs `.id` task field name is the one item that requires a real Proxmox test to confirm; the `--dry-run` verification step in the deploy checklist will catch it immediately. Everything else (security, argument handling, JSON construction, logging) is solid. Safe to merge; just run the dry-run on CT 302 before enabling the cron.

Automated review by Claude PR Reviewer
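One way to sidestep the `.vmid` vs `.id` question is a filter that accepts either field. The following is a minimal sketch, not the code from `proxmox-backup-check.sh`: the function name `last_backup_ts`, the sample data, and the choice to use `starttime` as the backup timestamp are all assumptions made for illustration, and task status is deliberately ignored.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Illustrative task list containing only the fields this sketch relies on.
# These values are made up; they are not real Proxmox output.
sample_tasks='[
  {"type":"vzdump","id":"101","starttime":1700000000,"status":"OK"},
  {"type":"vzdump","id":"101","starttime":1700000500,"status":"OK"},
  {"type":"vzdump","id":"102","starttime":1700000300,"status":"OK"}
]'

# last_backup_ts VMID TASKS_JSON   (hypothetical helper)
# Prints the newest starttime among vzdump tasks for VMID, or 0 if none match.
# The guest is matched on .id first, falling back to .vmid, compared as strings
# so that "101" and 101 are treated the same.
last_backup_ts() {
  local vmid="$1" tasks="$2"
  jq -r --arg vmid "$vmid" '
    [ .[]
      | select(.type == "vzdump")
      | select(((.id // .vmid // "") | tostring) == $vmid)
      | .starttime
    ] | max // 0
  ' <<<"$tasks"
}

last_backup_ts 101 "$sample_tasks"   # newest of the two tasks for guest 101
last_backup_ts 999 "$sample_tasks"   # no matching task, prints 0
```

Before relying on either field name, it may be simpler to inspect one real task object, for example by piping the node's task list through `jq '.[0] | keys'`, and adjust the filter to whatever keys actually appear.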
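On the embed field limit noted in the review, a guard like the one below could keep a long guest or filesystem list from triggering an HTTP 400. This is a hedged sketch only: `truncate_field` and `MAX_FIELD_CHARS` are hypothetical names, and neither script in this PR is known to contain such a helper.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Cap on a single embed field value, as cited in the review.
readonly MAX_FIELD_CHARS=1024

# truncate_field TEXT   (hypothetical helper)
# Prints TEXT unchanged when it fits. Otherwise cuts it so that the result,
# including a trailing "..." marker, is exactly MAX_FIELD_CHARS long.
# Note: ${#text} counts characters in the current locale; in a non-UTF-8
# locale it counts bytes, so emoji-heavy text may be cut earlier than needed.
truncate_field() {
  local text="$1" marker="..."
  if (( ${#text} <= MAX_FIELD_CHARS )); then
    printf '%s' "$text"
  else
    printf '%s%s' "${text:0:MAX_FIELD_CHARS-${#marker}}" "$marker"
  fi
}
```

The truncated string would then be passed to `jq` via `--arg` like any other field value. Splitting the list across several fields is an alternative if losing entries is not acceptable.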