Closes#25
- check_backup_recency(): queries pvesh vzdump task history; flags VMs
with no backup (CRIT) or no backup in 7 days (WARN)
- check_cert_expiry(): probes ports 443/8443 per host via openssl;
flags certs expiring ≤14 days (WARN) or ≤7 days (CRIT)
- io_wait_pct() in COLLECTOR_SCRIPT: uses vmstat 1 2 to sample I/O
wait; flagged as WARN when > 20%
- OOM kill history was already collected via journalctl; no changes needed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move mem_limit/memswap_limit to deploy.resources.limits.memory so the
constraint is actually enforced under Compose v3. Add END clause to
swap_mb() so hosts without a Swap line report 0 instead of empty output.
Fix test script header comment accuracy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend homelab-audit.sh collector with zombie_parents(), swap_mb(), and
oom_events() functions so the audit identifies which process spawns zombies,
flags high swap usage, and reports recent OOM kills. Add init: true to both
Tdarr docker-compose services so tini reaps orphaned ffmpeg children, and
cap tdarr-node at 28g RAM / 30g total to prevent unbounded memory use.
Closes#30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The awk program was double-quoted inside the single-quoted
COLLECTOR_SCRIPT, causing $1/$2/$3 to be expanded by the remote
shell as empty positional parameters instead of awk field references.
This made the D-state process filter silently match nothing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validate --output-dir has a following argument before accessing $2
(prevents unbound variable crash under set -u)
- Add ZOMBIE_WARN config variable (default: 1) and use it in the zombie
check instead of hardcoding 0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#23
- Fix STUCK_PROC_CPU_WARN not reaching remote collector: COLLECTOR_SCRIPT
heredoc stays single-quoted; threshold is passed as $1 to the remote
bash session so it is evaluated correctly on the collecting host
- Fix LXC IP discovery for static-IP containers: lxc-info result now falls
back to parsing pct config when lxc-info returns empty
- Fix SSH failures silently dropped: stderr redirected to
$REPORT_DIR/ssh-failures.log; SSH_FAILURE entries counted and printed
in the summary
- Add explicit comment explaining why -e is omitted from set options
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>