homelab-audit.sh: Fix variable interpolation and collector bugs #23
Reference: cal/claude-home#23
Context
SRE review of `monitoring/scripts/homelab-audit.sh` identified several bugs in the collector script and variable handling.

Bugs to Fix
1. STUCK_PROC_CPU_WARN not reaching the remote collector
The `COLLECTOR_SCRIPT` heredoc is single-quoted (`COLLECTOR_SCRIPT='...'`), so `$STUCK_PROC_CPU_WARN` is never interpolated. The collector hardcodes `10` instead of using the configurable threshold.

Fix options:
- … `$` for remote variables
- `echo "$COLLECTOR_SCRIPT" | sed "s/THRESHOLD_PLACEHOLDER/$STUCK_PROC_CPU_WARN/"`

2. LXC IP discovery unreliable for static-IP containers
`lxc-info -n $ctid -iH` only works for containers using Proxmox-managed bridges with DHCP. Containers with static IPs set inside the container (not via Proxmox config) return no IP and are silently skipped.

Fix: Fall back to parsing `pct config $ctid | grep "ip="` for containers where `lxc-info` returns empty.

3. SSH failures silently dropped
`2>/dev/null` on `ssh_cmd` suppresses all errors, including host key changes and connection failures. A re-provisioned host silently disappears from the audit.

Fix: Log SSH failures to `$REPORT_DIR/ssh-failures.log` and include a count in the summary.

4. set -uo pipefail comment
Add explicit comment: `# -e omitted intentionally — unreachable hosts should not abort the full audit`

Files
`monitoring/scripts/homelab-audit.sh`

Labels
`infra-audit`, `script`

PR opened: #34
Created `monitoring/scripts/homelab-audit.sh` with all four fixes:
- … `$1` to the remote `bash -s` session
- `get_lxc_ip()` tries `lxc-info` first, then `pct config … | grep -oP 'ip=…'` for static-IP containers
- … `$REPORT_DIR/ssh-failures.log`; failure count shown in summary
- `# -e omitted intentionally — unreachable hosts should not abort the full audit`
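
For readers without access to the PR, the first three fixes could look roughly like the sketch below. This is not the code from #34. The function bodies are assumptions; `run_collector`, `extract_static_ip` and `count_ssh_failures` are hypothetical names, the collector body is a placeholder, and logging only on exit status 255 (the status `ssh` itself uses for connection-level errors) is a design guess rather than something the issue specifies.

```bash
#!/usr/bin/env bash
# Sketch only: illustrates the three fixes, not the real homelab-audit.sh.

STUCK_PROC_CPU_WARN="${STUCK_PROC_CPU_WARN:-10}"
REPORT_DIR="${REPORT_DIR:-$(mktemp -d)}"

# Fix 1: the script stays single-quoted (nothing expands locally);
# the threshold reaches the remote shell as positional parameter $1.
COLLECTOR_SCRIPT='
threshold="$1"
echo "cpu_warn_threshold=$threshold"   # placeholder for the real collector logic
'

# Hypothetical helper. With a host argument it runs over SSH; without one it
# runs locally, which makes the argument passing easy to try out.
run_collector() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$COLLECTOR_SCRIPT" | ssh "$1" bash -s -- "$STUCK_PROC_CPU_WARN"
    else
        printf '%s\n' "$COLLECTOR_SCRIPT" | bash -s -- "$STUCK_PROC_CPU_WARN"
    fi
}

# Fix 2: read `pct config` output on stdin and print the first static IPv4
# address without its CIDR suffix. Prints nothing for ip=dhcp. Needs GNU grep.
extract_static_ip() {
    grep -oP 'ip=\K[0-9]+(\.[0-9]+){3}' | head -n1
}

# Try lxc-info first, then fall back to the Proxmox container config.
get_lxc_ip() {
    local ctid="$1" ip
    ip="$(lxc-info -n "$ctid" -iH 2>/dev/null | head -n1)"
    if [ -z "$ip" ]; then
        ip="$(pct config "$ctid" 2>/dev/null | extract_static_ip)"
    fi
    [ -n "$ip" ] && printf '%s\n' "$ip"
}

# Fix 3: capture stderr instead of discarding it, and record one log line per
# SSH-level failure (exit status 255) so the host does not vanish silently.
ssh_cmd() {
    local host="$1" err rc
    shift
    err="$(mktemp)"
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" "$@" 2>"$err"
    rc=$?
    if [ "$rc" -eq 255 ]; then
        printf '%s host=%s %s\n' "$(date -u +%FT%TZ)" "$host" \
            "$(tr '\n' ' ' <"$err")" >>"$REPORT_DIR/ssh-failures.log"
    fi
    rm -f "$err"
    return "$rc"
}

# Number of logged SSH failures, for the summary line.
count_ssh_failures() {
    if [ -f "$REPORT_DIR/ssh-failures.log" ]; then
        wc -l <"$REPORT_DIR/ssh-failures.log" | tr -d ' '
    else
        echo 0
    fi
}
```

Passing the threshold as an argument keeps the collector text free of local expansion, so remote variables need no escaping and no `sed` placeholder is required. Calling `run_collector` with no host is a quick local check that the value arrives as `$1`.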