Extend homelab-audit.sh collector with zombie_parents(), swap_mb(), and
oom_events() functions so the audit identifies which process spawns zombies,
flags high swap usage, and reports recent OOM kills. Add init: true to both
Tdarr docker-compose services so tini reaps orphaned ffmpeg children, and
cap tdarr-node at 28g RAM / 30g total to prevent unbounded memory use.
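
The collector function bodies are not reproduced in this message. As a rough sketch of what two of them might compute, assuming swap is read from /proc/meminfo and zombies are found from `ps` state codes (the names `swap_used_mb` and `count_zombie_parents` are hypothetical stand-ins, not the functions in the script):

```shell
# Hypothetical sketches; the real collector functions may differ.

# Print used swap in MiB, read from a meminfo-format file.
# Defaults to /proc/meminfo; the file argument exists only for testing.
swap_used_mb() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END { printf "%d\n", (total - free) / 1024 }' "${1:-/proc/meminfo}"
}

# Read "PPID STAT" pairs on stdin and print "<zombie count> <parent PID>"
# for each parent that owns at least one zombie, busiest parent first.
count_zombie_parents() {
  awk '$2 ~ /^Z/ { n[$1]++ }
       END { for (p in n) print n[p], p }' | sort -rn
}
```

On a live host the second function could be fed with `ps -eo ppid=,stat= | count_zombie_parents`.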
Closes #30
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The awk program was double-quoted inside the single-quoted
COLLECTOR_SCRIPT, causing $1/$2/$3 to be expanded by the remote
shell as empty positional parameters instead of awk field references.
This made the D-state process filter silently match nothing.
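
The quoting problem can be reproduced in isolation. The sketch below is an illustration, not the script's actual awk program: the sample rows and the `run_script` helper are hypothetical, and `bash -c` stands in for the remote shell that SSH would start. In this sketch the expansion yields an awk syntax error; with stderr discarded, the only visible symptom is empty output.

```shell
# Broken: the awk program is double-quoted inside the single-quoted
# script, so the shell that runs the script expands $1 and $2 (both
# empty) and awk receives the invalid program:  == "D" { print  }
BROKEN_SCRIPT='printf "%s\n" "101 D rsync" "102 S sshd" "103 D ffmpeg" |
  awk "$2 == \"D\" { print $1 }"'

# Fixed: a quoted heredoc delimiter keeps the body literal, so the awk
# program can be single-quoted and $1/$2 stay awk field references.
FIXED_SCRIPT=$(cat <<'EOF'
printf "%s\n" "101 D rsync" "102 S sshd" "103 D ffmpeg" |
  awk '$2 == "D" { print $1 }'
EOF
)

# Run a script the way a remote shell would, with no positional
# parameters set and stderr discarded.
run_script() { bash -c "$1" 2>/dev/null; }
```

Calling `run_script "$BROKEN_SCRIPT"` prints nothing, while `run_script "$FIXED_SCRIPT"` prints the PIDs of the two D-state rows.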
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Validate --output-dir has a following argument before accessing $2
  (prevents unbound variable crash under set -u)
- Add ZOMBIE_WARN config variable (default: 1) and use it in the zombie
  check instead of hardcoding 0
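
A minimal sketch of the argument check, assuming a conventional `while`/`case` parser (the function name `parse_args`, the default path, and the error text are hypothetical):

```shell
REPORT_DIR=/tmp/homelab-audit   # hypothetical default

# Parse options; under set -u, touching $2 when only one argument is
# left would abort the script, so the count is checked first.
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --output-dir)
        if [ $# -lt 2 ]; then
          echo "error: --output-dir requires an argument" >&2
          return 2
        fi
        REPORT_DIR=$2
        shift 2
        ;;
      *)
        echo "error: unknown option: $1" >&2
        return 2
        ;;
    esac
  done
}
```

With the guard in place, `parse_args --output-dir` returns 2 with a readable message instead of dying on an unbound `$2`.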
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes #23
- Fix STUCK_PROC_CPU_WARN not reaching remote collector: COLLECTOR_SCRIPT
  heredoc stays single-quoted; threshold is passed as $1 to the remote
  bash session so it is evaluated correctly on the collecting host
- Fix LXC IP discovery for static-IP containers: discovery now falls
  back to parsing pct config when lxc-info returns empty
- Fix SSH failures silently dropped: stderr redirected to
  $REPORT_DIR/ssh-failures.log; SSH_FAILURE entries counted and printed
  in the summary
- Add explicit comment explaining why -e is omitted from set options
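
The threshold hand-off in the first item can be sketched as follows. This is an illustration under assumptions: the default value, the sample rows, the comparison against a CPU percentage, and the `run_collector` helper are all hypothetical, and a local `bash -s` stands in for the remote session.

```shell
STUCK_PROC_CPU_WARN=${STUCK_PROC_CPU_WARN:-50}   # hypothetical default

# Quoted delimiter: nothing in the body is expanded locally, so the
# script only sees the threshold through its first positional parameter.
COLLECTOR_SCRIPT=$(cat <<'EOF'
threshold=$1
# Sample "PID %CPU" rows stand in for real ps output.
printf '%s\n' '101 80.0' '102 3.5' '103 50.0' |
  awk -v t="$threshold" '$2 + 0 >= t + 0 { print $1 }'
EOF
)

# bash -s reads the script from stdin; arguments after -- become $1, $2, ...
# Over SSH the same idea might look like:
#   ssh "$host" "bash -s -- $STUCK_PROC_CPU_WARN"
run_collector() {
  printf '%s\n' "$COLLECTOR_SCRIPT" | bash -s -- "$1"
}
```

Running `run_collector "$STUCK_PROC_CPU_WARN"` prints the PIDs at or above the threshold. Keeping the heredoc quoted avoids mixing local and remote expansion in the same script body.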
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.
Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)
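
One way to confirm that a doc carries all five fields is sketched below, assuming the frontmatter is a `---`-delimited block of `key: value` lines at the top of the file (the function name `missing_frontmatter_keys` is hypothetical and not part of this change):

```shell
# Print each required frontmatter key that is absent from the file.
# Prints nothing when title, description, type, domain, and tags all exist.
missing_frontmatter_keys() {
  awk '
    NR == 1 && $0 != "---" { exit }    # no frontmatter block at all
    NR > 1 && $0 == "---"  { exit }    # closing delimiter reached
    NR > 1 { split($0, kv, ":"); seen[kv[1]] = 1 }
    END {
      n = split("title description type domain tags", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }' "$1"
}
```

A loop such as `for f in docs/*.md; do missing_frontmatter_keys "$f"; done` would list gaps per file; the `docs/` path is only an example.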
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update monitoring CONTEXT.md with 6-server inventory table, per-server
  SSH user support, and pre-escalation Discord notification docs
- Remove tdarr local monitoring scripts (decommissioned per prior decision)
- Update Proxmox upgrade plan with Phase 1 completion and Phase 2 prep
- Update vm-management CONTEXT.md with current PVE 8 state
- CLAUDE.md: auto-run /save-memories at 25% context instead of asking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add nvidia_update_checker.py for weekly driver update monitoring with
Discord alerts. Add scripts CONTEXT.md and update README.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add jellyfin_gpu_monitor.py to detect lost GPU access
- Send Discord alerts when GPU access fails
- Auto-restart the container to restore GPU binding
- Run every 5 minutes via cron on ubuntu-manticore
- Document FFmpeg exit code 187 (NVENC failure) in troubleshooting
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add tdarr_file_monitor.py for API-based monitoring
- Add cron wrapper script for scheduled execution
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Created complete gaming detection and priority system
- Added gaming schedule configuration and enforcement
- Implemented Steam library monitoring with auto-detection
- Built comprehensive game process detection for multiple platforms
- Added gaming-aware Tdarr worker management with priority controls
- Created emergency gaming mode for immediate worker shutdown
- Integrated Discord notifications for gaming state changes
- Replaced old bash monitoring with enhanced Python monitoring system
- Added persistent state management and memory tracking
- Implemented configurable gaming time windows and schedules
- Updated .gitignore to exclude logs directories
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>