homelab-audit.sh: Add backup recency and certificate checks #25
Reference: cal/claude-home#25
Context
SRE review identified two critical audit gaps: no backup validation and no TLS certificate expiry checks. These are predictable failure modes that the audit should surface.
New Checks to Add
1. Proxmox backup recency
    ssh proxmox "pvesh get /nodes/proxmox/tasks --typefilter vzdump --limit 50 --output-format json"

2. Certificate expiration

    echo | openssl s_client -connect "$ip:443" 2>/dev/null | openssl x509 -noout -enddate

3. OOM kill history (on each host)

    dmesg | grep -i "oom-kill"

4. Disk I/O check: compute a delta from /proc/diskstats, or use a simple vmstat 1 2, to detect I/O wait

    cat /proc/diskstats
    vmstat 1 2

Files

monitoring/scripts/homelab-audit.sh

Labels

infra-audit, script, monitoring

Implemented in PR #36.
Approach:
- check_backup_recency(): runs pvesh get /nodes/proxmox/tasks --typefilter vzdump locally and uses Python to parse the JSON and find the most recent successful backup per VM/CT. CRIT if a guest has never been backed up, WARN if its last backup is older than 7 days.
- check_cert_expiry(): called per host after SSH collection. Probes ports 443 and 8443 via openssl s_client and skips silently if there is no HTTPS listener. WARN at ≤14 days to expiry, CRIT at ≤7 days.
- io_wait_pct(): added to the remote COLLECTOR_SCRIPT using vmstat 1 2. Flagged WARN if I/O wait exceeds 20%.
- oom_events(): the existing check (journalctl kernel log, 7-day window) already covers OOM kill history, so no changes are needed.
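Because the approach already hands the pvesh JSON to Python, here is a minimal Python sketch of the parsing half of check_backup_recency(). Several details are assumptions and are not confirmed by the issue. Each vzdump task object is assumed to carry an id field with the guest VMID, a status of "OK" on success, and a starttime in Unix seconds. The names latest_successful_backups() and classify_backups() are hypothetical. The caller is also assumed to supply the list of guest IDs that should have backups. The task list alone cannot reveal a guest that was never backed up, which is the CRIT case.

```python
# Hypothetical sketch: classify backup recency from
# `pvesh get /nodes/proxmox/tasks --typefilter vzdump --output-format json`.
# Assumed task fields (not confirmed by the issue): "id" (guest VMID),
# "status" ("OK" on success) and "starttime" (Unix epoch seconds).
import json
import time

WARN_AGE_SECONDS = 7 * 24 * 3600  # WARN if the last success is older than 7 days


def latest_successful_backups(tasks_json):
    """Map guest ID -> start time of its most recent successful vzdump task."""
    latest = {}
    for task in json.loads(tasks_json):
        guest = str(task.get("id") or "")
        if not guest or task.get("status") != "OK":
            continue  # no single guest ID, or the task failed / is still running
        started = int(task.get("starttime", 0))
        if started > latest.get(guest, -1):
            latest[guest] = started
    return latest


def classify_backups(tasks_json, expected_guests, now=None):
    """Return {guest_id: "OK" | "WARN" | "CRIT"} for every expected guest."""
    now = time.time() if now is None else now
    latest = latest_successful_backups(tasks_json)
    result = {}
    for guest in map(str, expected_guests):
        if guest not in latest:
            result[guest] = "CRIT"  # no successful backup in the task list
        elif now - latest[guest] > WARN_AGE_SECONDS:
            result[guest] = "WARN"  # stale: older than 7 days
        else:
            result[guest] = "OK"
    return result


if __name__ == "__main__":
    sample = json.dumps([
        {"id": "101", "status": "OK", "starttime": 1_000_000},
        {"id": "102", "status": "job errors", "starttime": 1_000_000},
    ])
    print(classify_backups(sample, [101, 102], now=1_000_000 + 86400))
    # {'101': 'OK', '102': 'CRIT'}
```

One consequence to check against the real script: with --limit 50, a guest whose last successful backup falls outside the 50 most recent vzdump tasks would be reported CRIT rather than WARN.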
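The threshold logic in check_cert_expiry() can be sketched in Python as follows. The sketch takes the notAfter= line printed by openssl x509 -noout -enddate, converts it to days remaining, and applies the WARN ≤14 / CRIT ≤7 thresholds. The helper names are hypothetical. The date parsing uses the standard library's ssl.cert_time_to_seconds(), which reads the "Mon DD HH:MM:SS YYYY GMT" form as UTC. Raising ValueError when no notAfter= line is present, so that the caller can treat it as "no HTTPS listener, skip", is an assumed mapping of the issue's skip behaviour.

```python
# Hypothetical sketch of check_cert_expiry()'s thresholds. Input is the line
# printed by `openssl x509 -noout -enddate`, e.g. "notAfter=Jun  1 12:00:00 2025 GMT".
import ssl
import time

WARN_DAYS = 14  # WARN at <= 14 days remaining
CRIT_DAYS = 7   # CRIT at <= 7 days remaining


def days_until_expiry(enddate_line, now=None):
    """Days (possibly fractional or negative) until the certificate expires."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        # Assumed caller behaviour: treat this as "no HTTPS listener" and skip.
        raise ValueError("no notAfter= line in openssl output")
    expires = ssl.cert_time_to_seconds(line[len(prefix):])
    now = time.time() if now is None else now
    return (expires - now) / 86400


def classify_expiry(days_left):
    """Map days remaining to the audit severity (expired certs are CRIT)."""
    if days_left <= CRIT_DAYS:
        return "CRIT"
    if days_left <= WARN_DAYS:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    line = "notAfter=Jun  1 12:00:00 2025 GMT"
    expires = ssl.cert_time_to_seconds("Jun  1 12:00:00 2025 GMT")
    days = days_until_expiry(line, now=expires - 10 * 86400)
    print(days, classify_expiry(days))  # 10.0 WARN
```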
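io_wait_pct() runs on the remote host inside COLLECTOR_SCRIPT. The Python sketch below shows only one possible way to pull the I/O wait figure out of vmstat 1 2 output. It relies on two standard vmstat behaviours. First, the first data row is an average since boot and the second covers the 1-second interval, so the last row is the one to read. Second, the wa column is located by its header name rather than a fixed position, in case the column layout differs between procps versions. The function names and the sample output are illustrative.

```python
# Hypothetical sketch: extract the CPU I/O-wait percentage ("wa") from
# `vmstat 1 2` output and apply the issue's WARN threshold (> 20%).
WARN_IOWAIT_PCT = 20


def io_wait_pct(vmstat_output):
    """Return the "wa" value from the last (1-second interval) sample."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    header = next((row for row in rows if "wa" in row), None)
    if header is None:
        raise ValueError("no vmstat header with a 'wa' column")
    # The first data row averages since boot; the last row is the live interval.
    return int(rows[-1][header.index("wa")])


def classify_io_wait(pct):
    """WARN when I/O wait exceeds the threshold, otherwise OK."""
    return "WARN" if pct > WARN_IOWAIT_PCT else "OK"


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456   7890 456789    0    0     5    10  100  200  2  1 96  1  0
 0  2      0 123400   7890 456789    0    0     0  4096  150  250  3  1 70 26  0
"""

if __name__ == "__main__":
    pct = io_wait_pct(SAMPLE)
    print(pct, classify_io_wait(pct))  # 26 WARN
```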