fix: homelab-audit.sh variable interpolation and collector bugs (#23) #34

Claude · 2026-04-03T01:34:13Z

Claude commented

2026-04-03 01:34:13 +00:00

Closes #23

Changes

Creates monitoring/scripts/homelab-audit.sh with all four bugs from the SRE review addressed:

1. STUCK_PROC_CPU_WARN threshold now reaches the remote collector

COLLECTOR_SCRIPT heredoc is kept single-quoted (no local variable leakage). Instead, STUCK_PROC_CPU_WARN is passed as $1 when invoking the remote bash session:

echo "$COLLECTOR_SCRIPT" | ssh_cmd "$addr" bash -s -- "$STUCK_PROC_CPU_WARN"

The collector receives it as STUCK_PROC_CPU_WARN="${1:-10}".

2. LXC IP discovery with static-IP fallback

get_lxc_ip() first tries lxc-info -n "$ctid" -iH. If that returns empty, it falls back to:

pct config "$ctid" | grep -oP '(?<=ip=)[^/,]+'

3. SSH failures logged, not silently dropped

ssh_cmd() redirects stderr to $REPORT_DIR/ssh-failures.log instead of /dev/null. SSH failures are counted and printed in the audit summary. A SSH_FAILURE sentinel line is written per failed host.

4. Explicit set options comment

# -e omitted intentionally — unreachable hosts should not abort the full audit
set -uo pipefail

Files Changed

monitoring/scripts/homelab-audit.sh (new file, 268 lines)

Closes #23 ## Changes Creates `monitoring/scripts/homelab-audit.sh` with all four bugs from the SRE review addressed: ### 1. STUCK_PROC_CPU_WARN threshold now reaches the remote collector `COLLECTOR_SCRIPT` heredoc is kept single-quoted (no local variable leakage). Instead, `STUCK_PROC_CPU_WARN` is passed as `$1` when invoking the remote bash session: ```bash echo "$COLLECTOR_SCRIPT" | ssh_cmd "$addr" bash -s -- "$STUCK_PROC_CPU_WARN" ``` The collector receives it as `STUCK_PROC_CPU_WARN="${1:-10}"`. ### 2. LXC IP discovery with static-IP fallback `get_lxc_ip()` first tries `lxc-info -n "$ctid" -iH`. If that returns empty, it falls back to: ```bash pct config "$ctid" | grep -oP '(?<=ip=)[^/,]+' ``` ### 3. SSH failures logged, not silently dropped `ssh_cmd()` redirects stderr to `$REPORT_DIR/ssh-failures.log` instead of `/dev/null`. SSH failures are counted and printed in the audit summary. A `SSH_FAILURE` sentinel line is written per failed host. ### 4. Explicit set options comment ```bash # -e omitted intentionally — unreachable hosts should not abort the full audit set -uo pipefail ``` ## Files Changed - `monitoring/scripts/homelab-audit.sh` (new file, 268 lines)

Claude added 16 commits 2026-04-03 01:34:14 +00:00

docs: sync KB — open-packs-checkin-crash.md 57a2618d44

docs: sync KB — codex-agents-marketplace.md 55e7f2604d

docs: sync KB — subagent-write-permission-blocked.md,release-2026.3.28.md 21dc20b9ab

docs: sync KB — 2026-03-30.md b057c5b4a6

docs: sync KB — release-2026.3.31.md afa179c6b7

docs: sync KB — release-2026.3.31.md b97d5ede2e

docs: sync KB — release-2026.3.31-2.md bdd500bcdf

docs: sync KB — troubleshooting.md 02bda7bdd5

docs: sync KB — troubleshooting.md dcefc0f3d1

docs: sync KB — release-2026.3.31.md,release-2026.4.01.md 0f59e77cf2

docs: sync KB — troubleshooting-gunicorn-worker-timeouts.md 43421e2cf1

docs: sync KB — database-release-2026.4.1.md 0c44dec1f0

docs: sync KB — release-2026.4.02.md b43cc36ff1

docs: sync KB — ace-step-local-network.md e3c1f20abf

docs: sync KB — troubleshooting.md 5ee48af40f

fix: add homelab-audit.sh with variable interpolation and collector fixes (#23 )

Auto-merge docs-only PRs / auto-merge-docs (pull_request) Successful in 3s

Details

312434aa95

Closes #23

- Fix STUCK_PROC_CPU_WARN not reaching remote collector: COLLECTOR_SCRIPT
  heredoc stays single-quoted; threshold is passed as $1 to the remote
  bash session so it is evaluated correctly on the collecting host
- Fix LXC IP discovery for static-IP containers: lxc-info result now falls
  back to parsing pct config when lxc-info returns empty
- Fix SSH failures silently dropped: stderr redirected to
  $REPORT_DIR/ssh-failures.log; SSH_FAILURE entries counted and printed
  in the summary
- Add explicit comment explaining why -e is omitted from set options

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Claude referenced this pull request

2026-04-03 01:34:25 +00:00

homelab-audit.sh: Fix variable interpolation and collector bugs #23

cal added 1 commit 2026-04-03 01:41:46 +00:00

fix: guard --output-dir arg and use configurable ZOMBIE_WARN threshold

Auto-merge docs-only PRs / auto-merge-docs (pull_request) Successful in 3s

Details

c0b164b2f7

- Validate --output-dir has a following argument before accessing $2
  (prevents unbound variable crash under set -u)
- Add ZOMBIE_WARN config variable (default: 1) and use it in the zombie
  check instead of hardcoding 0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cal requested changes 2026-04-03 01:43:11 +00:00

Dismissed

cal left a comment

AI Code Review

Files Reviewed

monitoring/scripts/homelab-audit.sh (added, 268 lines)

Second-Round Fixes Confirmed

Both additions from the first review round are correctly implemented:

--output-dir guard: the $# -lt 2 check correctly detects a missing argument before shift 2 would underrun the argument count.
ZOMBIE_WARN=1 default and ((zombies >= ZOMBIE_WARN)) comparison: correct substitution of the hardcoded > 0 — the configurable threshold is now respected.

Findings

Correctness

Bug: awk field references inside single-quoted heredoc use unquoted $ — they will not reach the remote shell correctly.

Inside COLLECTOR_SCRIPT (single-quoted), the stuck_procs() function contains:

awk -v t="$STUCK_PROC_CPU_WARN" "$1 ~ /^D/ && $2+0 >= t+0 {print $3}"

Because the heredoc is single-quoted, the literal text $1, $2, $3 is transmitted to the remote shell verbatim, which is the desired behaviour for the awk positional fields. However the awk program string itself is passed as a double-quoted shell argument on the remote side, meaning the remote shell will try to expand $1, $2, $3 as shell positional parameters before awk sees them — and since the remote bash invocation has only $1 = STUCK_PROC_CPU_WARN value, $2 and $3 will expand to empty strings, making the awk filter always produce no output.

The fix is to single-quote the awk program on the remote side:

awk -v t="$STUCK_PROC_CPU_WARN" '$1 ~ /^D/ && $2+0 >= t+0 {print $3}'

But because COLLECTOR_SCRIPT is itself single-quoted, inner single quotes cannot be used directly. The standard workaround is '\'' (end single quote, literal ', resume single quote) or using $'...' quoting for the outer variable. This is a functional bug: stuck_procs() will always return empty output.

Minor: host_count increment uses || true to suppress errexit, but set -e is already omitted.

((host_count++)) || true — the || true guard is necessary when set -e is active because ((expr)) exits with code 1 when the expression evaluates to 0 (i.e. on the first increment from 0). Since set -e is intentionally omitted here, the guard is harmless but slightly misleading. Not a bug, just worth noting.

Minor: generate_summary sorts $FINDINGS_FILE directly — if the file is large, this is fine, but sort is called without -o so it reads from the file while potentially still being written to by a concurrent parse_and_report call. In practice main() calls these sequentially so there is no race, but the ordering assumption is implicit.

Security

SSH_OPTS uses StrictHostKeyChecking=accept-new, which silently accepts new host keys but will refuse changed ones. This is a reasonable posture for a homelab audit tool against a known-stable inventory, but it is worth documenting the trade-off (an attacker who can intercept a first-time connection could MITM it). No change required — just an observation.
No hardcoded credentials. SSH_USER defaults to root, which is documented.
The python3 snippet inside collect_inventory uses json.load(sys.stdin) on data from qm guest cmd — this is trusted Proxmox tooling output, not user-supplied input, so injection risk is negligible.
REPORT_DIR is created with mkdir -p without a chmod; it inherits the umask. SSH failure logs and findings may contain hostnames and IPs. For a root-run audit script this is acceptable, but worth knowing.

Style & Conventions

Shebang is #!/usr/bin/env bash �� correct per convention.
set -uo pipefail with an explanatory comment for the omitted -e — good practice, matches the bash-CONTEXT.md guidance.
All variables are quoted throughout. Named constants (DISK_WARN, DISK_CRIT, etc.) are grouped at the top.
shellcheck disable=SC2086 is used narrowly and correctly for the intentionally unquoted $SSH_OPTS.
The COLLECTOR_SCRIPT single-quoting strategy and rationale are clearly commented.
parse_and_report uses IFS= read -r line correctly.

Summary of Required Fix

The awk program string in stuck_procs() inside COLLECTOR_SCRIPT uses double-quoted shell argument syntax on the remote, causing $1/$2/$3 to be expanded as shell positionals instead of awk field references. stuck_procs() will always produce empty output. The awk program needs to be single-quoted on the remote side; this requires escaping within the outer single-quoted variable (e.g., '\'' quoting or restructuring).

Verdict: REQUEST_CHANGES

One functional bug blocks approval: the awk field references in stuck_procs() are subject to remote shell expansion, causing the D-state process detector to silently produce no output regardless of actual stuck processes. This directly undermines the fix described in PR item #1 (the STUCK_PROC_CPU_WARN threshold). All other aspects of the implementation are solid.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `monitoring/scripts/homelab-audit.sh` (added, 268 lines) ### Second-Round Fixes Confirmed Both additions from the first review round are correctly implemented: - `--output-dir` guard: the `$# -lt 2` check correctly detects a missing argument before `shift 2` would underrun the argument count. - `ZOMBIE_WARN=1` default and `((zombies >= ZOMBIE_WARN))` comparison: correct substitution of the hardcoded `> 0` — the configurable threshold is now respected. ### Findings #### Correctness **Bug: `awk` field references inside single-quoted heredoc use unquoted `$` — they will not reach the remote shell correctly.** Inside `COLLECTOR_SCRIPT` (single-quoted), the `stuck_procs()` function contains: ```bash awk -v t="$STUCK_PROC_CPU_WARN" "$1 ~ /^D/ && $2+0 >= t+0 {print $3}" ``` Because the heredoc is single-quoted, the literal text `$1`, `$2`, `$3` is transmitted to the remote shell verbatim, which is the desired behaviour for the awk positional fields. However the awk program string itself is passed as a double-quoted shell argument on the remote side, meaning the remote shell will try to expand `$1`, `$2`, `$3` as shell positional parameters *before* awk sees them — and since the remote bash invocation has only `$1` = `STUCK_PROC_CPU_WARN value`, `$2` and `$3` will expand to empty strings, making the awk filter always produce no output. The fix is to single-quote the awk program on the remote side: ```bash awk -v t="$STUCK_PROC_CPU_WARN" '$1 ~ /^D/ && $2+0 >= t+0 {print $3}' ``` But because `COLLECTOR_SCRIPT` is itself single-quoted, inner single quotes cannot be used directly. The standard workaround is `'\''` (end single quote, literal `'`, resume single quote) or using `$'...'` quoting for the outer variable. This is a functional bug: `stuck_procs()` will always return empty output. **Minor: `host_count` increment uses `|| true` to suppress errexit, but `set -e` is already omitted.** `((host_count++)) || true` — the `|| true` guard is necessary when `set -e` is active because `((expr))` exits with code 1 when the expression evaluates to 0 (i.e. on the first increment from 0). Since `set -e` is intentionally omitted here, the guard is harmless but slightly misleading. Not a bug, just worth noting. **Minor: `generate_summary` sorts `$FINDINGS_FILE` directly — if the file is large, this is fine, but `sort` is called without `-o` so it reads from the file while potentially still being written to by a concurrent `parse_and_report` call.** In practice `main()` calls these sequentially so there is no race, but the ordering assumption is implicit. #### Security - `SSH_OPTS` uses `StrictHostKeyChecking=accept-new`, which silently accepts new host keys but will refuse changed ones. This is a reasonable posture for a homelab audit tool against a known-stable inventory, but it is worth documenting the trade-off (an attacker who can intercept a first-time connection could MITM it). No change required — just an observation. - No hardcoded credentials. `SSH_USER` defaults to `root`, which is documented. - The `python3` snippet inside `collect_inventory` uses `json.load(sys.stdin)` on data from `qm guest cmd` — this is trusted Proxmox tooling output, not user-supplied input, so injection risk is negligible. - `REPORT_DIR` is created with `mkdir -p` without a `chmod`; it inherits the umask. SSH failure logs and findings may contain hostnames and IPs. For a root-run audit script this is acceptable, but worth knowing. #### Style & Conventions - Shebang is `#!/usr/bin/env bash` �� correct per convention. - `set -uo pipefail` with an explanatory comment for the omitted `-e` — good practice, matches the bash-CONTEXT.md guidance. - All variables are quoted throughout. Named constants (`DISK_WARN`, `DISK_CRIT`, etc.) are grouped at the top. - `shellcheck disable=SC2086` is used narrowly and correctly for the intentionally unquoted `$SSH_OPTS`. - The `COLLECTOR_SCRIPT` single-quoting strategy and rationale are clearly commented. - `parse_and_report` uses `IFS= read -r line` correctly. ### Summary of Required Fix The `awk` program string in `stuck_procs()` inside `COLLECTOR_SCRIPT` uses double-quoted shell argument syntax on the remote, causing `$1`/`$2`/`$3` to be expanded as shell positionals instead of awk field references. `stuck_procs()` will always produce empty output. The awk program needs to be single-quoted on the remote side; this requires escaping within the outer single-quoted variable (e.g., `'\''` quoting or restructuring). ### Verdict: REQUEST_CHANGES One functional bug blocks approval: the `awk` field references in `stuck_procs()` are subject to remote shell expansion, causing the D-state process detector to silently produce no output regardless of actual stuck processes. This directly undermines the fix described in PR item #1 (the `STUCK_PROC_CPU_WARN` threshold). All other aspects of the implementation are solid. --- *Automated review by Claude PR Reviewer*

monitoring/scripts/homelab-audit.sh Outdated

						
				@ -0,0 +71,4 @@

				disk_usage() {

				  df --output=pcent,target -x tmpfs -x devtmpfs 2>/dev/null | tail -n +2 | \

				    while read -r pct mnt; do echo "${pct%%%} $mnt"; done

				}

cal commented

2026-04-03 01:43:11 +00:00

The awk program is double-quoted on the remote side. When bash -s -- "$STUCK_PROC_CPU_WARN" runs on the remote host, the remote shell will expand $1, $2, $3 as its own positional parameters before passing the string to awk. Since only $1 has a value (the CPU threshold), $2 and $3 expand to empty string — the awk stat/pcpu/comm field references vanish and the filter never matches anything.

Fix: single-quote the awk program on the remote side. Since COLLECTOR_SCRIPT is already outer-single-quoted, use '\'' to embed a literal single quote:

stuck_procs() {
  ps -eo stat=,pcpu=,comm= | \
    awk -v t="$STUCK_PROC_CPU_WARN" '\''$1 ~ /^D/ && $2+0 >= t+0 {print $3}'\'' | \
    paste -sd,
}

The awk program is double-quoted on the remote side. When `bash -s -- "$STUCK_PROC_CPU_WARN"` runs on the remote host, the remote shell will expand `$1`, `$2`, `$3` as its own positional parameters before passing the string to awk. Since only `$1` has a value (the CPU threshold), `$2` and `$3` expand to empty string — the awk stat/pcpu/comm field references vanish and the filter never matches anything. Fix: single-quote the awk program on the remote side. Since `COLLECTOR_SCRIPT` is already outer-single-quoted, use `'\''` to embed a literal single quote: ```bash stuck_procs() { ps -eo stat=,pcpu=,comm= | \ awk -v t="$STUCK_PROC_CPU_WARN" '\''$1 ~ /^D/ && $2+0 >= t+0 {print $3}'\'' | \ paste -sd, } ```

cal added 1 commit 2026-04-03 01:43:52 +00:00

fix: single-quote awk program in stuck_procs() collector

Auto-merge docs-only PRs / auto-merge-docs (pull_request) Successful in 2s

Details

e15f88975f

The awk program was double-quoted inside the single-quoted
COLLECTOR_SCRIPT, causing $1/$2/$3 to be expanded by the remote
shell as empty positional parameters instead of awk field references.
This made the D-state process filter silently match nothing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Claude added the

ai-reviewing

label 2026-04-03 01:45:08 +00:00

cal approved these changes 2026-04-03 01:46:33 +00:00

cal left a comment

AI Code Review (Round 3)

Files Reviewed

monitoring/scripts/homelab-audit.sh (added, 273 lines)

Verification of Round 1-2-3 Fixes

Fix 1 — `--output-dir` guard (Round 1)

Correct. Lines 35-38 check $# -lt 2 immediately after matching --output-dir, before any shift. Calling homelab-audit.sh --output-dir with no argument correctly prints the error and exits 1.

Fix 2 — `ZOMBIE_WARN` configurable (Round 2)

Correct per the design intent. ZOMBIE_WARN=1 on line 30 is a named constant, consistent with DISK_WARN=80, DISK_CRIT=90, LOAD_WARN=2.0, and MEM_WARN=85 — none of which use the env-override pattern. The three variables documented in the header (STUCK_PROC_CPU_WARN, REPORT_DIR, SSH_USER) are the only runtime env overrides. Using a named constant is the right call here; it avoids cluttering the env-override interface with thresholds that rarely need per-run tuning.

Fix 3 — `awk` single-quoting in `stuck_procs()` (Round 3)

Correct. The '\'' escape sequence inside the single-quoted COLLECTOR_SCRIPT string produces a literal single quote in the stored script text. The remote shell therefore receives:

awk -v t="$STUCK_PROC_CPU_WARN" '$1 ~ /^D/ && $2+0 >= t+0 {print $3}'

The awk program is single-quoted, so awk field references $1, $2, $3 are never interpolated by the remote shell. The -v t= binding correctly passes STUCK_PROC_CPU_WARN from the remote variable (set from $1 at the top of the collector script). This is the right fix.

Findings

Correctness

One minor issue remains: $SSH_FAILURES_LOG is never truncated at startup, but $FINDINGS_FILE is (line 261: >"$FINDINGS_FILE"). If --output-dir points to a reused directory, stale SSH failure sentinel lines from a prior run will accumulate in ssh-failures.log. The summary's ssh_failure_count will therefore overcount on reruns. The default REPORT_DIR uses a timestamp so this only affects the --output-dir path, and the impact is limited to the summary counter being wrong. This is not a blocker, but worth noting.

The awk programs in cpu_load() and mem_pct() use double-quoted programs with \$N (e.g., "{print \$2}"). Inside the single-quoted COLLECTOR_SCRIPT, these literal characters reach the remote bash verbatim. On the remote, the double-quoted awk program has \$2 which bash reduces to a literal $2 before passing it to awk. This is correct behavior.

The awk "BEGIN{exit !($load > $LOAD_WARN)}" on line 175 interpolates $load (which comes from remote host output) directly into an awk program. A compromised host could inject awk code via a crafted CPU_LOAD= line. For a homelab auditing trusted hosts this is an acceptable risk, but worth noting for anyone adapting this script to untrusted networks.

Security

No secrets, hardcoded credentials, or data exposure issues. SSH_OPTS uses BatchMode=yes (prevents password prompts), StrictHostKeyChecking=accept-new (first-connect trust, no TOFU bypass), and ConnectTimeout=10 (avoids hangs). SSH stderr goes to a log file rather than being silently swallowed — this was the Round 2 fix and it's implemented correctly.

Style and Conventions

Matches project bash conventions: set -uo pipefail, functions with local variables, consistent >>"$FINDINGS_FILE" append patterns, ((expr)) || true for arithmetic in non--e context, and thorough section comments. The # shellcheck disable=SC2086 on line 101 is justified by the intentional word-splitting of $SSH_OPTS.

Suggestions

Consider adding >"$SSH_FAILURES_LOG" alongside >"$FINDINGS_FILE" in main() to ensure reruns with --output-dir produce a clean log. One line fix.
The usage comment at the top documents [--output-dir DIR] but does not mention that ZOMBIE_WARN, DISK_WARN, DISK_CRIT, LOAD_WARN, and MEM_WARN can be changed by editing the constants block. A short comment above that block (e.g., # Thresholds — edit to adjust sensitivity) would make it discoverable.

Verdict: APPROVED

All three fixes from prior review rounds are present and correct. The remaining items (stale ssh-failures.log on reuse, awk injection from trusted-host output) are low-severity and the latter is an accepted risk for a homelab context. The script is ready to merge.

Automated review by Claude PR Reviewer

## AI Code Review (Round 3) ### Files Reviewed - `monitoring/scripts/homelab-audit.sh` (added, 273 lines) --- ### Verification of Round 1-2-3 Fixes #### Fix 1 — `--output-dir` guard (Round 1) Correct. Lines 35-38 check `$# -lt 2` immediately after matching `--output-dir`, before any shift. Calling `homelab-audit.sh --output-dir` with no argument correctly prints the error and exits 1. #### Fix 2 — `ZOMBIE_WARN` configurable (Round 2) Correct per the design intent. `ZOMBIE_WARN=1` on line 30 is a named constant, consistent with `DISK_WARN=80`, `DISK_CRIT=90`, `LOAD_WARN=2.0`, and `MEM_WARN=85` — none of which use the env-override pattern. The three variables documented in the header (`STUCK_PROC_CPU_WARN`, `REPORT_DIR`, `SSH_USER`) are the only runtime env overrides. Using a named constant is the right call here; it avoids cluttering the env-override interface with thresholds that rarely need per-run tuning. #### Fix 3 — `awk` single-quoting in `stuck_procs()` (Round 3) Correct. The `'\''` escape sequence inside the single-quoted `COLLECTOR_SCRIPT` string produces a literal single quote in the stored script text. The remote shell therefore receives: ``` awk -v t="$STUCK_PROC_CPU_WARN" '$1 ~ /^D/ && $2+0 >= t+0 {print $3}' ``` The awk program is single-quoted, so awk field references `$1`, `$2`, `$3` are never interpolated by the remote shell. The `-v t=` binding correctly passes `STUCK_PROC_CPU_WARN` from the remote variable (set from `$1` at the top of the collector script). This is the right fix. --- ### Findings #### Correctness One minor issue remains: `$SSH_FAILURES_LOG` is never truncated at startup, but `$FINDINGS_FILE` is (line 261: `>"$FINDINGS_FILE"`). If `--output-dir` points to a reused directory, stale SSH failure sentinel lines from a prior run will accumulate in `ssh-failures.log`. The summary's `ssh_failure_count` will therefore overcount on reruns. The default REPORT_DIR uses a timestamp so this only affects the `--output-dir` path, and the impact is limited to the summary counter being wrong. This is not a blocker, but worth noting. The awk programs in `cpu_load()` and `mem_pct()` use double-quoted programs with `\$N` (e.g., `"{print \$2}"`). Inside the single-quoted `COLLECTOR_SCRIPT`, these literal characters reach the remote bash verbatim. On the remote, the double-quoted awk program has `\$2` which bash reduces to a literal `$2` before passing it to awk. This is correct behavior. The `awk "BEGIN{exit !($load > $LOAD_WARN)}"` on line 175 interpolates `$load` (which comes from remote host output) directly into an awk program. A compromised host could inject awk code via a crafted `CPU_LOAD=` line. For a homelab auditing trusted hosts this is an acceptable risk, but worth noting for anyone adapting this script to untrusted networks. #### Security No secrets, hardcoded credentials, or data exposure issues. `SSH_OPTS` uses `BatchMode=yes` (prevents password prompts), `StrictHostKeyChecking=accept-new` (first-connect trust, no TOFU bypass), and `ConnectTimeout=10` (avoids hangs). SSH stderr goes to a log file rather than being silently swallowed — this was the Round 2 fix and it's implemented correctly. #### Style and Conventions Matches project bash conventions: `set -uo pipefail`, functions with `local` variables, consistent `>>"$FINDINGS_FILE"` append patterns, `((expr)) || true` for arithmetic in non-`-e` context, and thorough section comments. The `# shellcheck disable=SC2086` on line 101 is justified by the intentional word-splitting of `$SSH_OPTS`. #### Suggestions - Consider adding `>"$SSH_FAILURES_LOG"` alongside `>"$FINDINGS_FILE"` in `main()` to ensure reruns with `--output-dir` produce a clean log. One line fix. - The usage comment at the top documents `[--output-dir DIR]` but does not mention that `ZOMBIE_WARN`, `DISK_WARN`, `DISK_CRIT`, `LOAD_WARN`, and `MEM_WARN` can be changed by editing the constants block. A short comment above that block (e.g., `# Thresholds — edit to adjust sensitivity`) would make it discoverable. --- ### Verdict: APPROVED All three fixes from prior review rounds are present and correct. The remaining items (stale ssh-failures.log on reuse, awk injection from trusted-host output) are low-severity and the latter is an accepted risk for a homelab context. The script is ready to merge. --- *Automated review by Claude PR Reviewer*

Claude reviewed 2026-04-03 01:47:53 +00:00

Claude left a comment

AI Code Review

Files Reviewed

monitoring/scripts/homelab-audit.sh (added, 268 lines)

⚠️ Process Concern: Duplicate PR / Merge Conflict

This PR is currently marked not mergeable and a competing PR #33 addresses the same issue (#23) with virtually identical changes. One of these two PRs needs to be closed before the other can land. The code review below covers the code itself.

Findings

Correctness

All four claimed bug fixes are implemented correctly:

STUCK_PROC_CPU_WARN threshold propagation — COLLECTOR_SCRIPT is single-quoted (no local interpolation) and the threshold is passed as positional arg $1 via bash -s -- "$STUCK_PROC_CPU_WARN". The collector receives it as STUCK_PROC_CPU_WARN="${1:-10}". Correct.
LXC IP fallback — get_lxc_ip() tries lxc-info -iH first, then falls back to pct config | grep -oP '(?<=ip=)[^/,]+'. The lookbehind/character-class regex correctly strips the subnet mask. Correct.
SSH failure logging — ssh_cmd() redirects stderr to $SSH_FAILURES_LOG. parse_and_report() writes a SSH_FAILURE sentinel on non-zero exit and a human-readable line to $FINDINGS_FILE. generate_summary() counts SSH_FAILURE lines and lists them. Correct.
set -e omission comment — clearly documents the intentional deviation at the top of the script. Correct.

The floating-point load comparison (awk "BEGIN{exit !($load > $LOAD_WARN)}") and empty-guard patterns ([[ -n "$load" ]], [[ -n "$mem" ]]) are both correct. ((host_count++)) || true correctly avoids an exit on zero-increment without -e.

Security

No hardcoded credentials. SSH_USER defaults to root, expected for a Proxmox admin tool.
STUCK_PROC_CPU_WARN is passed as a positional arg and consumed via awk's -v t= — awk variables don't execute shell code, so no injection path.
All variables are quoted throughout. # shellcheck disable=SC2086 on $SSH_OPTS is the standard accepted pattern for intentional word-splitting of options.
No secrets or PII in output paths.

Style & Conventions

Shebang: #!/usr/bin/env bash used; project convention (development/bash-CONTEXT.md) says #!/bin/bash. Minor and portable.
set -euo pipefail: Intentional omission of -e is documented inline. Justified deviation.
echo "$COLLECTOR_SCRIPT": printf '%s\n' is more robust than echo for content starting with -n. The fixed literal here has no such tokens, so safe in practice. Low priority.
python3 dependency: collect_inventory() calls python3 -c "..." on the Proxmox host for VM IP parsing. Not listed in the script header's dependency notes.

Suggestions

Consider documenting python3 as a dependency in the script header, or replacing with grep/awk to avoid the assumption.
pct list and qm list loops use positional read fields — Proxmox names with spaces would silently break field assignment (not a real risk in practice, but worth noting).

Verdict: APPROVED (code quality)

The code is correct, secure, and all four stated bugs are properly fixed. The implementation is clean with good function decomposition and consistent bash practices. Resolve the duplicate PR #33 and the merge conflict before landing.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `monitoring/scripts/homelab-audit.sh` (added, 268 lines) --- ### ⚠️ Process Concern: Duplicate PR / Merge Conflict This PR is currently marked **not mergeable** and a competing PR #33 addresses the same issue (#23) with virtually identical changes. One of these two PRs needs to be closed before the other can land. The code review below covers the code itself. --- ### Findings #### Correctness All four claimed bug fixes are implemented correctly: 1. **STUCK_PROC_CPU_WARN threshold propagation** — `COLLECTOR_SCRIPT` is single-quoted (no local interpolation) and the threshold is passed as positional arg `$1` via `bash -s -- "$STUCK_PROC_CPU_WARN"`. The collector receives it as `STUCK_PROC_CPU_WARN="${1:-10}"`. Correct. 2. **LXC IP fallback** — `get_lxc_ip()` tries `lxc-info -iH` first, then falls back to `pct config | grep -oP '(?<=ip=)[^/,]+'`. The lookbehind/character-class regex correctly strips the subnet mask. Correct. 3. **SSH failure logging** — `ssh_cmd()` redirects stderr to `$SSH_FAILURES_LOG`. `parse_and_report()` writes a `SSH_FAILURE` sentinel on non-zero exit and a human-readable line to `$FINDINGS_FILE`. `generate_summary()` counts `SSH_FAILURE` lines and lists them. Correct. 4. **`set -e` omission comment** — clearly documents the intentional deviation at the top of the script. Correct. The floating-point load comparison (`awk "BEGIN{exit !($load > $LOAD_WARN)}"`) and empty-guard patterns (`[[ -n "$load" ]]`, `[[ -n "$mem" ]]`) are both correct. `((host_count++)) || true` correctly avoids an exit on zero-increment without `-e`. #### Security - No hardcoded credentials. `SSH_USER` defaults to `root`, expected for a Proxmox admin tool. - `STUCK_PROC_CPU_WARN` is passed as a positional arg and consumed via awk's `-v t=` — awk variables don't execute shell code, so no injection path. - All variables are quoted throughout. `# shellcheck disable=SC2086` on `$SSH_OPTS` is the standard accepted pattern for intentional word-splitting of options. - No secrets or PII in output paths. #### Style & Conventions - **Shebang**: `#!/usr/bin/env bash` used; project convention (`development/bash-CONTEXT.md`) says `#!/bin/bash`. Minor and portable. - **`set -euo pipefail`**: Intentional omission of `-e` is documented inline. Justified deviation. - **`echo "$COLLECTOR_SCRIPT"`**: `printf '%s\n'` is more robust than `echo` for content starting with `-n`. The fixed literal here has no such tokens, so safe in practice. Low priority. - **python3 dependency**: `collect_inventory()` calls `python3 -c "..."` on the Proxmox host for VM IP parsing. Not listed in the script header's dependency notes. #### Suggestions - Consider documenting `python3` as a dependency in the script header, or replacing with `grep`/`awk` to avoid the assumption. - `pct list` and `qm list` loops use positional `read` fields — Proxmox names with spaces would silently break field assignment (not a real risk in practice, but worth noting). --- ### Verdict: APPROVED (code quality) The code is correct, secure, and all four stated bugs are properly fixed. The implementation is clean with good function decomposition and consistent bash practices. **Resolve the duplicate PR #33 and the merge conflict before landing.** --- *Automated review by Claude PR Reviewer*

Claude added

ai-reviewed

and removed

ai-reviewing

labels 2026-04-03 01:47:58 +00:00

cal force-pushed issue/23-homelab-audit-sh-fix-variable-interpolation-and-co from e15f88975f to 1ed911e61b

2026-04-03 01:49:05 +00:00

Compare

cal merged commit 1ed911e61b into main

2026-04-03 01:49:12 +00:00

cal deleted branch issue/23-homelab-audit-sh-fix-variable-interpolation-and-co

2026-04-03 01:49:12 +00:00