claude-home

Author	SHA1	Message	Date
Cal Corum	1a3785f01a	feat: dynamic summary, --hosts filter, and --json output (#24) Closes #24 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 20:08:07 +00:00
Cal Corum	193ae68f96	docs: document per-core load threshold policy for server health monitoring (#22) Closes #22 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-03 13:35:23 -05:00
Cal Corum	ae5da035f6	feat: add backup recency, cert expiry, OOM, and I/O wait checks (#25) Closes #25 - check_backup_recency(): queries pvesh vzdump task history; flags VMs with no backup (CRIT) or no backup in 7 days (WARN) - check_cert_expiry(): probes ports 443/8443 per host via openssl; flags certs expiring ≤14 days (WARN) or ≤7 days (CRIT) - io_wait_pct() in COLLECTOR_SCRIPT: uses vmstat 1 2 to sample I/O wait; flagged as WARN when > 20% - OOM kill history was already collected via journalctl; no changes needed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-02 21:06:44 -05:00
Cal Corum	e58c5b8cc1	fix: address PR review - move memory limits to deploy block, handle swap-less hosts Move mem_limit/memswap_limit to deploy.resources.limits.memory so the constraint is actually enforced under Compose v3. Add END clause to swap_mb() so hosts without a Swap line report 0 instead of empty output. Fix test script header comment accuracy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:05:12 -05:00
Cal Corum	f28dfeb4bf	feat: add zombie parent, swap, and OOM metrics to audit; harden Tdarr containers Extend homelab-audit.sh collector with zombie_parents(), swap_mb(), and oom_events() functions so the audit identifies which process spawns zombies, flags high swap usage, and reports recent OOM kills. Add init: true to both Tdarr docker-compose services so tini reaps orphaned ffmpeg children, and cap tdarr-node at 28g RAM / 30g total to prevent unbounded memory use. Closes #30 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 21:02:05 -05:00
Cal Corum	1ed911e61b	fix: single-quote awk program in stuck_procs() collector The awk program was double-quoted inside the single-quoted COLLECTOR_SCRIPT, causing $1/$2/$3 to be expanded by the remote shell as empty positional parameters instead of awk field references. This made the D-state process filter silently match nothing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 20:48:56 -05:00
Cal Corum	7c801f6c3b	fix: guard --output-dir arg and use configurable ZOMBIE_WARN threshold - Validate --output-dir has a following argument before accessing $2 (prevents unbound variable crash under set -u) - Add ZOMBIE_WARN config variable (default: 1) and use it in the zombie check instead of hardcoding 0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 20:48:56 -05:00
Cal Corum	9a39abd64c	fix: add homelab-audit.sh with variable interpolation and collector fixes (#23) Closes #23 - Fix STUCK_PROC_CPU_WARN not reaching remote collector: COLLECTOR_SCRIPT heredoc stays single-quoted; threshold is passed as $1 to the remote bash session so it is evaluated correctly on the collecting host - Fix LXC IP discovery for static-IP containers: lxc-info result now falls back to parsing pct config when lxc-info returns empty - Fix SSH failures silently dropped: stderr redirected to $REPORT_DIR/ssh-failures.log; SSH_FAILURE entries counted and printed in the summary - Add explicit comment explaining why -e is omitted from set options Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-02 20:48:56 -05:00
Cal Corum	fcecde0de4	docs: decommission cognitive memory references from KB Removed cognitive-memory MCP, timers, and symlink system references. Replaced with kb-search MCP and /save-doc skill workflow. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 23:02:56 -05:00
Cal Corum	4b7eca8a46	docs: add YAML frontmatter to all 151 markdown files Adds title, description, type, domain, and tags frontmatter to every doc for improved KB semantic search. The description field is prepended to every search chunk, and domain/type/tags enable filtered queries. Type values: context, guide, runbook, reference, troubleshooting Domain values match directory structure (networking, docker, etc.) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 09:00:44 -05:00
Cal Corum	28abde7c9f	chore: add recovered CT 302 configs, archive tdarr scripts, clean up repo - Add recovered LXC 300/302 server-diagnostics configs as reference (headless Claude permission patterns, health check client) - Archive decommissioned tdarr monitoring scripts - Gitignore rpg-art/ directory - Delete stray temp files and swarm-test/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 00:41:41 -06:00
Cal Corum	5ff94a9d20	docs: remove decommissioned MCP Gateway (CT 303) from monitoring inventory Migrated MCP servers back to local stdio config, shut down LXC 303. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-25 12:39:53 -06:00
Cal Corum	df553e5142	docs: add AI infrastructure LXCs (301-303) to monitoring server inventory Groups Claude Discord Coordinator, Claude Runner, and MCP Gateway under a shared section. Documents new CT 303 MCP Gateway with n8n and Gitea MCP server configuration details. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:58:19 -06:00
Cal Corum	28851a9012	docs: add pihole1, sba-bots, foundry to monitoring server inventory Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 14:15:43 -06:00
Cal Corum	3737c7dda5	docs: expand monitoring coverage, update Proxmox upgrade plan, remove decommissioned tdarr scripts - Update monitoring CONTEXT.md with 6-server inventory table, per-server SSH user support, and pre-escalation Discord notification docs - Remove tdarr local monitoring scripts (decommissioned per prior decision) - Update Proxmox upgrade plan with Phase 1 completion and Phase 2 prep - Update vm-management CONTEXT.md with current PVE 8 state - CLAUDE.md: auto-run /save-memories at 25% context instead of asking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 11:08:48 -06:00
Cal Corum	f20e221090	docs: update monitoring CONTEXT.md with expanded server inventory Add server table with all 6 monitored hosts, per-server SSH user docs, updated workflow server list, and pre-escalation Discord notification documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 11:05:40 -06:00
Cal Corum	ed16fee9f7	docs: add CT 302 SSH alias and git auth details to server-diagnostics Documents the claude-runner SSH alias, HTTPS token auth method, and notes that SSH git remotes don't work from CT 302. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 09:04:33 -06:00
Cal Corum	3b2e031f45	Update monitoring docs with Uptime Kuma monitors and Discord alerts Document all 20 active monitors with targets and tags, Discord notification configuration, and API access details for programmatic management via uptime-kuma-api. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 23:02:49 -06:00
Cal Corum	d0dbe86fba	Add NVIDIA update checker and monitoring scripts documentation Add nvidia_update_checker.py for weekly driver update monitoring with Discord alerts. Add scripts CONTEXT.md and update README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 22:21:00 -06:00
Cal Corum	a35891b565	Add Uptime Kuma service monitoring on LXC 227 Deploy Uptime Kuma for centralized service uptime monitoring at https://status.manticorum.com. Proxmox LXC 227 (10.10.0.227) running Ubuntu 22.04 with Docker. Updated monitoring documentation, CLAUDE.md context loading rules, and server-configs host inventory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 22:18:51 -06:00
Cal Corum	3112b3d6fe	CLAUDE: Add Jellyfin GPU health monitor with auto-restart - Created jellyfin_gpu_monitor.py for detecting lost GPU access - Sends Discord alerts when GPU access fails - Auto-restarts container to restore GPU binding - Runs every 5 minutes via cron on ubuntu-manticore - Documents FFmpeg exit code 187 (NVENC failure) in troubleshooting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-28 22:57:04 -06:00
Cal Corum	0ecac96703	CLAUDE: Add Tdarr file monitoring scripts - Add tdarr_file_monitor.py for API-based monitoring - Add cron wrapper script for scheduled execution 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-07 00:48:10 -06:00
Cal Corum	edc78c2dd6	CLAUDE: Add comprehensive gaming-aware Tdarr management system - Created complete gaming detection and priority system - Added gaming schedule configuration and enforcement - Implemented Steam library monitoring with auto-detection - Built comprehensive game process detection for multiple platforms - Added gaming-aware Tdarr worker management with priority controls - Created emergency gaming mode for immediate worker shutdown - Integrated Discord notifications for gaming state changes - Replaced old bash monitoring with enhanced Python monitoring system - Added persistent state management and memory tracking - Implemented configurable gaming time windows and schedules - Updated .gitignore to exclude logs directories 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-14 15:17:52 -05:00
Cal Corum	10c9e0d854	CLAUDE: Migrate to technology-first documentation architecture Complete restructure from patterns/examples/reference to technology-focused directories: • Created technology-specific directories with comprehensive documentation: - /tdarr/ - Transcoding automation with gaming-aware scheduling - /docker/ - Container management with GPU acceleration patterns - /vm-management/ - Virtual machine automation and cloud-init - /networking/ - SSH infrastructure, reverse proxy, and security - /monitoring/ - System health checks and Discord notifications - /databases/ - Database patterns and troubleshooting - /development/ - Programming language patterns (bash, nodejs, python, vuejs) • Enhanced CLAUDE.md with intelligent context loading: - Technology-first loading rules for automatic context provision - Troubleshooting keyword triggers for emergency scenarios - Documentation maintenance protocols with automated reminders - Context window management for optimal documentation updates • Preserved valuable content from .claude/tmp/: - SSH security improvements and server inventory - Tdarr CIFS troubleshooting and Docker iptables solutions - Operational scripts with proper technology classification • Benefits achieved: - Self-contained technology directories with complete context - Automatic loading of relevant documentation based on keywords - Emergency-ready troubleshooting with comprehensive guides - Scalable structure for future technology additions - Eliminated context bloat through targeted loading 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-08-12 23:20:15 -05:00

24 Commits
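Commit ae5da035f6 states concrete WARN/CRIT rules for three of the new audit checks: a VM with no backup is CRIT and one with no backup in 7 days is WARN, a certificate expiring in ≤14 days is WARN and in ≤7 days is CRIT, and I/O wait above 20% (sampled with `vmstat 1 2`) is WARN. The sketch below restates only those thresholds in Python so the boundaries are easy to check. It is not the homelab-audit.sh implementation, which is bash; the function names (`backup_status`, `cert_status`, `io_wait_status`, `parse_vmstat_wa`) are hypothetical, and treating "no backup in 7 days" as "older than exactly 7 days" is an assumption.

```python
"""Hypothetical restatement of the thresholds listed in commit ae5da035f6.

Assumption: the real checks live in homelab-audit.sh (bash); only the
numeric thresholds below come from the commit message.
"""
from datetime import datetime, timedelta, timezone
from typing import Optional

BACKUP_WARN_AGE = timedelta(days=7)  # "no backup in 7 days" -> WARN (strictly older, assumed)
CERT_WARN_DAYS = 14                  # expiring in <= 14 days -> WARN
CERT_CRIT_DAYS = 7                   # expiring in <= 7 days -> CRIT
IO_WAIT_WARN_PCT = 20.0              # I/O wait > 20% -> WARN


def backup_status(last_backup: Optional[datetime], now: datetime) -> str:
    """Classify a VM by its most recent vzdump time (None means never backed up)."""
    if last_backup is None:
        return "CRIT"
    if now - last_backup > BACKUP_WARN_AGE:
        return "WARN"
    return "OK"


def cert_status(days_left: int) -> str:
    """Classify a certificate by whole days remaining before expiry."""
    if days_left <= CERT_CRIT_DAYS:
        return "CRIT"
    if days_left <= CERT_WARN_DAYS:
        return "WARN"
    return "OK"


def io_wait_status(wa_pct: float) -> str:
    """Flag I/O wait strictly above the 20% threshold."""
    return "WARN" if wa_pct > IO_WAIT_WARN_PCT else "OK"


def parse_vmstat_wa(output: str) -> float:
    """Return the 'wa' column from the last sample of `vmstat 1 2` output.

    vmstat's first data row reports averages since boot, so the last row
    (the 1-second interval) is the one that reflects current I/O wait.
    The column is located by header name, so extra columns do not break it.
    """
    rows = [line.split() for line in output.strip().splitlines()]
    header = next(cols for cols in rows if "wa" in cols)
    return float(rows[-1][header.index("wa")])
```

On a host with procps installed, `parse_vmstat_wa(subprocess.run(["vmstat", "1", "2"], capture_output=True, text=True).stdout)` would feed it a live sample; the result can then go straight into `io_wait_status`.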