Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture

Complete restructure from patterns/examples/reference to technology-focused directories:

• Created technology-specific directories with comprehensive documentation:
  - /tdarr/ - Transcoding automation with gaming-aware scheduling
  - /docker/ - Container management with GPU acceleration patterns
  - /vm-management/ - Virtual machine automation and cloud-init
  - /networking/ - SSH infrastructure, reverse proxy, and security
  - /monitoring/ - System health checks and Discord notifications
  - /databases/ - Database patterns and troubleshooting
  - /development/ - Programming language patterns (bash, nodejs, python, vuejs)

• Enhanced CLAUDE.md with intelligent context loading:
  - Technology-first loading rules for automatic context provision
  - Troubleshooting keyword triggers for emergency scenarios
  - Documentation maintenance protocols with automated reminders
  - Context window management for optimal documentation updates

• Preserved valuable content from .claude/tmp/:
  - SSH security improvements and server inventory
  - Tdarr CIFS troubleshooting and Docker iptables solutions
  - Operational scripts with proper technology classification

• Benefits achieved:
  - Self-contained technology directories with complete context
  - Automatic loading of relevant documentation based on keywords
  - Emergency-ready troubleshooting with comprehensive guides
  - Scalable structure for future technology additions
  - Eliminated context bloat through targeted loading

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-12 23:20:15 -05:00


Scripts Directory

This directory contains operational scripts and utilities for home lab management and automation.

Directory Structure

scripts/
├── README.md                    # This documentation
├── tdarr_monitor.py            # Enhanced Tdarr monitoring with Discord alerts
├── tdarr/                      # Tdarr automation and scheduling
├── monitoring/                 # System monitoring and alerting
└── <future>/                   # Other organized automation subsystems

Scripts Overview

`tdarr_monitor.py` - Enhanced Tdarr Monitoring

Description: Comprehensive Tdarr monitoring script with stuck job detection and Discord notifications.

Features:

📊 Complete Tdarr system monitoring (server, nodes, queue, libraries)
🧠 Short-term memory for stuck job detection
🚨 Discord notifications with rich embeds
💾 Persistent state management
⚙️ Configurable thresholds and alerts
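The stuck job detection listed above can be pictured as comparing two consecutive worker snapshots. The following is a minimal sketch of that idea, not the script's actual implementation: `WorkerSnapshot`, its fields, and `find_stuck_workers` are hypothetical names, and it assumes a job counts as stuck when its file and progress percentage stay unchanged for at least the threshold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class WorkerSnapshot:
    """Hypothetical record of one worker's state (not the script's real class)."""
    worker_id: str
    file: str
    percentage: float
    first_seen: datetime  # when this file/percentage pair was first observed


def find_stuck_workers(
    previous: Dict[str, WorkerSnapshot],
    current: Dict[str, WorkerSnapshot],
    now: datetime,
    threshold_minutes: int = 30,
) -> List[WorkerSnapshot]:
    """Return workers whose file and progress have not changed for the threshold."""
    stuck = []
    for worker_id, snap in current.items():
        old = previous.get(worker_id)
        if old and old.file == snap.file and old.percentage == snap.percentage:
            # No progress since the last run: keep the original timestamp.
            snap.first_seen = old.first_seen
            if now - snap.first_seen >= timedelta(minutes=threshold_minutes):
                stuck.append(snap)
    return stuck
```

A worker that reports a new file or a different percentage keeps its fresh `first_seen` value, so its timer effectively restarts.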

Quick Start:

# Basic monitoring
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check all

# Enable stuck job detection with 15-minute threshold
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 \
    --check nodes --detect-stuck --stuck-threshold 15

# Full monitoring with Discord alerts (uses default webhook)
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 \
    --check all --detect-stuck --discord-alerts

# Test Discord integration (uses default webhook)
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --discord-test

CLI Options:

--server              Tdarr server URL (required)
--check               Type of check: all, status, queue, nodes, libraries, stats, health
--timeout             Request timeout in seconds (default: 30)
--output              Output format: json, pretty (default: pretty)
--verbose             Enable verbose logging
--detect-stuck        Enable stuck job detection
--stuck-threshold     Minutes before job considered stuck (default: 30)
--memory-file         Path to memory state file (default: .claude/tmp/tdarr_memory.pkl)
--clear-memory        Clear memory state and exit
--discord-webhook     Discord webhook URL for notifications (default: the preconfigured webhook)
--discord-alerts      Enable Discord alerts for stuck jobs
--discord-test        Send test Discord message and exit
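For orientation, the option list above maps naturally onto an `argparse` parser. This is an illustrative sketch only: the flag names and documented defaults come from the list, while `build_parser`, the default for `--check`, and the `None` placeholder for the webhook are assumptions, since the script's real parser is not shown here.

```python
import argparse

CHECK_TYPES = ["all", "status", "queue", "nodes", "libraries", "stats", "health"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Tdarr monitor CLI (illustrative sketch)")
    parser.add_argument("--server", required=True, help="Tdarr server URL")
    # Assumption: the option list does not state a default for --check.
    parser.add_argument("--check", choices=CHECK_TYPES, default="all", help="Type of check")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    parser.add_argument("--output", choices=["json", "pretty"], default="pretty")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    parser.add_argument("--detect-stuck", action="store_true")
    parser.add_argument("--stuck-threshold", type=int, default=30,
                        help="Minutes before a job is considered stuck")
    parser.add_argument("--memory-file", default=".claude/tmp/tdarr_memory.pkl")
    parser.add_argument("--clear-memory", action="store_true")
    # Assumption: None stands in for the script's configured default webhook.
    parser.add_argument("--discord-webhook", default=None)
    parser.add_argument("--discord-alerts", action="store_true")
    parser.add_argument("--discord-test", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```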

Memory Management:

Persistent State: Worker snapshots saved to .claude/tmp/tdarr_memory.pkl
Automatic Cleanup: Removes tracking for disappeared workers
Error Recovery: Graceful handling of corrupted memory files
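The three behaviours above (persistent state, cleanup of disappeared workers, recovery from a corrupted file) could look roughly like the sketch below. It assumes the state is a plain dict keyed by worker ID; `load_memory`, `save_memory`, and `prune_missing` are hypothetical helper names, not functions taken from the script.

```python
import os
import pickle
from pathlib import Path
from typing import Any, Dict, Iterable


def load_memory(path: Path) -> Dict[str, Any]:
    """Load saved worker snapshots, falling back to empty state on any problem."""
    try:
        with path.open("rb") as fh:
            data = pickle.load(fh)
    except FileNotFoundError:
        return {}
    except Exception:
        # Corrupt pickles can raise many exception types (UnpicklingError,
        # EOFError, AttributeError, ...), so the sketch treats them all alike.
        return {}
    return data if isinstance(data, dict) else {}


def save_memory(path: Path, state: Dict[str, Any]) -> None:
    """Write state to a temporary file, then atomically swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(state, fh)
    os.replace(tmp, path)


def prune_missing(state: Dict[str, Any], active_ids: Iterable[str]) -> Dict[str, Any]:
    """Drop tracking entries for workers that are no longer reported."""
    active = set(active_ids)
    return {wid: snap for wid, snap in state.items() if wid in active}
```

Because `pickle` can execute code while loading, a memory file should only ever be read from a location the monitoring user controls.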

Discord Features:

Two Message Types: Simple content messages and rich embeds
Stuck Job Alerts: Detailed embed notifications with file info, progress, duration
System Status: Health summaries with node details and color-coded status
Customizable: Colors, fields, titles, descriptions fully configurable
Error Handling: Graceful failures without breaking monitoring
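As a rough illustration of the stuck job alert described above, the sketch below builds a Discord webhook payload with one rich embed and sends it without letting a failure interrupt monitoring. The function names, embed title, wording, and colour are assumptions; the `content`, `embeds`, `title`, `description`, `color`, and `fields` keys follow Discord's webhook JSON format.

```python
from typing import Any, Dict


def build_stuck_job_payload(node: str, file: str, percentage: float,
                            minutes: int) -> Dict[str, Any]:
    """Build a webhook payload with one embed describing a stuck job."""
    embed = {
        "title": "Stuck Tdarr job detected",  # hypothetical wording
        "description": f"No progress on `{file}` for {minutes} minutes.",
        "color": 0xE74C3C,  # red, sent as an integer
        "fields": [
            {"name": "Node", "value": node, "inline": True},
            {"name": "Progress", "value": f"{percentage:.1f}%", "inline": True},
            {"name": "Duration", "value": f"{minutes} min", "inline": True},
        ],
    }
    return {"content": "Tdarr alert", "embeds": [embed]}


def send_payload(webhook_url: str, payload: Dict[str, Any], timeout: int = 30) -> bool:
    """POST the payload; return False instead of raising if Discord is unreachable."""
    import requests  # imported lazily so payload building works without it

    try:
        response = requests.post(webhook_url, json=payload, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException:
        return False
    return True
```

Returning a boolean rather than raising keeps a Discord outage from aborting the rest of a monitoring run.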

Integration Examples:

Cron Job for Regular Monitoring:

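# Check every 15 minutes, alert on stuck jobs over 30 minutes (the default threshold).
# A crontab entry must stay on one line; cron does not support backslash continuation.
*/15 * * * * cd /path/to/claude-home && python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check nodes --detect-stuck --discord-alerts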

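Systemd Service and Timer:

A timer must live in its own unit file, separate from the service it triggers. The file names below are examples.

# /etc/systemd/system/tdarr-monitor.service
[Unit]
Description=Tdarr Monitor
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/claude-home/scripts/tdarr_monitor.py \
    --server http://10.10.0.43:8265 --check all --detect-stuck --discord-alerts
WorkingDirectory=/path/to/claude-home
User=your-user

# /etc/systemd/system/tdarr-monitor.timer
[Unit]
Description=Run Tdarr Monitor every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target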

API Data Classes: The script uses strongly-typed dataclasses for all API responses:

ServerStatus - Server health and version info
NodeStatus - Node details with stuck job tracking
QueueStatus - Transcoding queue statistics
LibraryStatus - Library scan progress
StatisticsStatus - Overall system statistics
HealthStatus - Comprehensive health check results
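To show what strongly-typed response classes of this kind might look like, here is a minimal sketch of two of them. Only the class names come from the list above; every field name and the `to_json` helper are assumptions made for illustration, and the script's real definitions may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ServerStatus:
    """Server health and version info (field names are hypothetical)."""
    timestamp: str
    server_url: str
    status: str
    version: Optional[str] = None
    error: Optional[str] = None


@dataclass
class NodeStatus:
    """Node details with stuck job tracking (field names are hypothetical)."""
    timestamp: str
    nodes: List[Dict[str, Any]] = field(default_factory=list)
    stuck_jobs: List[Dict[str, Any]] = field(default_factory=list)
    error: Optional[str] = None


def to_json(result: Any) -> str:
    """Serialize any of the dataclasses, e.g. for a JSON output mode."""
    return json.dumps(asdict(result), indent=2)
```

Using `field(default_factory=list)` gives each instance its own list instead of one shared mutable default.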

Error Handling:

Network timeouts and connection errors
API endpoint failures
JSON parsing errors
Discord webhook failures
Memory state corruption
Missing dependencies

Dependencies:

requests - HTTP client for API calls (the only third-party package)
pickle - State serialization (standard library)
Everything else comes from the Python standard library

Development Guidelines

Adding New Scripts

Location: Place scripts in appropriate subdirectories by function
Documentation: Include comprehensive docstrings and usage examples
Error Handling: Implement robust error handling and logging
Configuration: Use CLI arguments and/or config files for flexibility
Testing: Include test functionality where applicable

Naming Conventions

Use descriptive names: tdarr_monitor.py, not monitor.py
Use underscores for Python scripts: system_health.py
Use hyphens for shell scripts: backup-system.sh

Directory Organization

Create subdirectories for related functionality:

scripts/
├── monitoring/          # System monitoring scripts
├── backup/             # Backup and restore utilities  
├── network/            # Network management tools
├── containers/         # Docker/Podman management
└── maintenance/        # System maintenance tasks

Future Enhancements

Planned Features

Email Notifications: SMTP integration for email alerts
Prometheus Metrics: Export metrics for Grafana dashboards
Webhook Actions: Trigger external actions on stuck jobs
Multi-Server Support: Monitor multiple Tdarr instances
Configuration Files: YAML/JSON config file support

Contributing

Follow existing code style and patterns
Add comprehensive documentation
Include error handling and logging
Test thoroughly before committing
Update this README with new scripts
