fix: resolve variable interpolation and collector bugs in homelab-audit.sh #33
No reviewers
Labels
No Label
ai-changes-requested
ai-failed
ai-pr-opened
ai-reviewed
ai-reviewing
ai-working
infra-audit
monitoring
operations
proxmox
script
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cal/claude-home#33
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "bug/23-fix-audit-variable-interpolation"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
STUCK_PROC_CPU_WARNas env var to remote bash sessions so the configurable threshold actually reaches the collector (was hardcoded10)pct configfallback for LXC IP discovery whenlxc-inforeturns empty (static-IP containers)$REPORT_DIR/ssh-failures.loginstead of/dev/null, with a count in the audit summaryset -eTest plan
bash -nsyntax check passesFixes #23
🤖 Generated with Claude Code
AI Code Review
Files Reviewed
monitoring/scripts/homelab-audit.sh(added — 483 lines)development/ace-step-local-network.md(added)development/subagent-write-permission-blocked.md(added)gaming/release-2026.4.02.md,gaming/troubleshooting.md, and 10 other.mdfiles (added — docs-only, no issues)The markdown files are documentation updates with no issues. All findings below are specific to
homelab-audit.sh.Findings
Correctness
Bug:
--outputwith missing argument causes unbound variable errorLine 46-47: When
--outputis the last argument with no filename following,$2is unbound. Withset -uo pipefailactive, the script crashes with a nondescript unbound variable error instead of a helpful usage message. Suggested fix:--json/OUTPUT_FORMATis declared and parsed but never usedLines 22 and 41-44:
OUTPUT_FORMATis set to"json"when--jsonis passed, butparse_and_reportand all output functions always produce text. The--jsonflag silently has no effect. Either implement JSON output or remove the flag from the argument parser and the usage comment in the header.ZOMBIE_WARNthreshold is defined but the comparison ignores itLine 35 defines
ZOMBIE_WARN=1. Line 317 compareszombie_countagainst hardcoded0instead of$ZOMBIE_WARN. If you want a configurable threshold:If the intent is to always warn on any zombie, remove
ZOMBIE_WARNfrom the config block to avoid confusion.now=$(date +%s)is dead code in the collector scriptLine 151 of
COLLECTOR_SCRIPT:nowis assigned but never referenced anywhere. This is leftover from an earlier version of the stuck-process check that usednow - start_timebefore switching toetimes. Should be removed.wc -lmay under-report multi-line SSH errorsLine 438: SSH connection errors can span multiple lines per failure event, so
wc -lcan over-count. The summary message"$ssh_fail_count error(s)"is potentially misleading. Simplest fix: just note that failures occurred and point to the log file, rather than showing a line count as an error count.Security
StrictHostKeyChecking=accept-newin both SSH helpersLines 63 and 70: This mode auto-accepts any new host key on first connection (TOFU). In a stable homelab with pre-configured SSH config aliases this is acceptable, but noted for completeness. If
known_hostsis pre-populated for all homelab hosts,StrictHostKeyChecking=yeswould be more secure.STUCK_PROC_CPU_WARNinjected into remote env — value is unquoted in the SSH command stringLines 171 and 370:
"STUCK_PROC_CPU_WARN=$STUCK_PROC_CPU_WARN bash -s"— the value is unquoted. Since this value is always an integer set at the top of the script, there is no real injection risk here. Quoting defensively ("STUCK_PROC_CPU_WARN='$STUCK_PROC_CPU_WARN' bash -s") costs nothing.Style & Conventions
Core fixes for issue #23 are correctly implemented:
STUCK_PROC_CPU_WARN=$STUCK_PROC_CPU_WARN bash -scorrectly injects the threshold into the remote bash environment, fixing the hardcoded-10bug. The${STUCK_PROC_CPU_WARN:-10}fallback inside the single-quoted heredoc then reads that injected env var. This is the correct pattern.pct configfallback for LXC IPs at line 419 correctly handles static-IP containers wherelxc-inforeturns empty. Thegrep -oP 'ip=\K[0-9.]+'correctly extracts the IP portion from a CIDR string likeip=10.10.0.5/24.$REPORT_DIR/ssh-failures.logand are surfaced in the summary. Correct.# -e omitted intentionallycomment at line 15 is a good addition.local -nrequires bash 4.3+ (line 382) — Fine for all Linux homelab hosts, worth noting if portability ever matters.grep -oP(PCRE) at line 419 — GNU grep only, fine on Linux.Suggestions
--help/-hflag that prints the usage line from the header comment.generate_summaryrecommendation list includes "Adjust monitoring thresholds to use per-core load metrics" (item 7), but the script already does per-core load analysis. That recommendation is stale.Verdict: REQUEST_CHANGES (for informational purposes — Gitea prevented a formal review since this is a self-review)
Three correctness issues should be fixed before merge:
--output FILEunbound variable crash when the filename argument is omitted--jsonflag silently has no effect — implement it or remove it from the interfaceZOMBIE_WARNis defined in the config block but the comparison uses hardcoded0, making the threshold non-functionalThe dead
nowvariable is a minor cleanup. Security observations are informational and are not blocking for a homelab script.Automated review by Claude PR Reviewer
AI Code Review
Files Reviewed
monitoring/scripts/homelab-audit.sh(added, 483 lines)Findings
Correctness
All four fixes described in the PR body are correctly implemented:
STUCK_PROC_CPU_WARNpropagation — TheCOLLECTOR_SCRIPTheredoc is correctly single-quoted, so the local variable doesn't expand into it. The threshold is passed as an environment variable when invoking remote bash sessions (STUCK_PROC_CPU_WARN=$STUCK_PROC_CPU_WARN bash -s), and the collector uses${STUCK_PROC_CPU_WARN:-10}to consume it with a sensible default. Correct.LXC IP discovery fallback —
lxc-infois tried first; if empty, falls back topct config ... | grep -oP 'ip=\K[0-9.]+'. The regex correctly stops at/so CIDR notation is handled. Correct.SSH failure logging — Both
ssh_cmdandssh_stdinredirect stderr to$REPORT_DIR/ssh-failures.log.generate_summarycounts lines in that file and emits a warning. Correct.set -ecomment — The explanation (# -e omitted intentionally — unreachable hosts should not abort the full audit) is clear and justified for this use case. Correct.One logic note:
ssh_fail_count=$(wc -l <"$REPORT_DIR/ssh-failures.log")counts lines, not events. A single SSH connection failure can produce multiple error lines, so the count may overstate actual failures. The label "error(s)" in the output is reasonable but slightly imprecise.Security
STUCK_PROC_CPU_WARNis a script-level integer constant, so passing it asSTUCK_PROC_CPU_WARN=$STUCK_PROC_CPU_WARN bash -sis safe (no injection risk).$ctidand$vmidsourced frompct list/qm liston a trusted Proxmox host — acceptable for homelab scope.StrictHostKeyChecking=accept-newis a reasonable tradeoff for automation on a private homelab.No security issues found.
Style & Conventions
bash-CONTEXT.mdrecommends#!/bin/bashas shebang; the script uses#!/usr/bin/env bash. Functionally equivalent, minor divergence.bash-CONTEXT.mdrecommendsset -euo pipefail; the script usesset -uo pipefailwith the-eomission explicitly justified in a comment. This is the correct call for a multi-host audit script.COLLECTOR_SCRIPT), parsing (parse_and_report), and presentation.Suggestions
ssh:orPermission deniedpatterns), or write one sentinel line per host in thecollect_guestfailure branch.python3dependency on Proxmox host: The VM IP discovery usespython3 -cfor JSON parsing. This works on Proxmox (Debian-based), but it's an undocumented dependency. A comment noting it would help.AUDIT_START/AUDIT_ENDmarkers are emitted by the collector but never validated byparse_and_report. If a remote script partially fails, the parse would silently proceed on incomplete data. Not a bug for the current scope, but worth a TODO comment.Verdict: COMMENT
The code is correct and well-written — the four bug fixes all land as described. However, the PR is currently not mergeable (
mergeable: falsefrom Gitea) due to a conflict with PR #34, which also createsmonitoring/scripts/homelab-audit.shas a fix for the same issue #23. One of these PRs should be closed before proceeding. Once the conflict is resolved, this code is ready to merge.Automated review by Claude PR Reviewer
Pull request closed