# Scripts Directory

This directory contains operational scripts and utilities for home lab management and automation.

## Directory Structure

```
scripts/
├── README.md           # This documentation
├── tdarr_monitor.py    # Enhanced Tdarr monitoring with Discord alerts
├── tdarr/              # Tdarr automation and scheduling
├── monitoring/         # System monitoring and alerting
└── <future>/           # Other organized automation subsystems
```

## Scripts Overview

### `tdarr_monitor.py` - Enhanced Tdarr Monitoring

**Description**: Comprehensive Tdarr monitoring script with stuck job detection and Discord notifications.

**Features**:
- 📊 Complete Tdarr system monitoring (server, nodes, queue, libraries)
- 🧠 Short-term memory for stuck job detection
- 🚨 Discord notifications with rich embeds
- 💾 Persistent state management
- ⚙️ Configurable thresholds and alerts

**Quick Start**:
```bash
# Basic monitoring
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check all

# Enable stuck job detection with 15-minute threshold
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 \
  --check nodes --detect-stuck --stuck-threshold 15

# Full monitoring with Discord alerts (uses default webhook)
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 \
  --check all --detect-stuck --discord-alerts

# Test Discord integration (uses default webhook)
python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --discord-test
```

**CLI Options**:
```
--server            Tdarr server URL (required)
--check             Type of check: all, status, queue, nodes, libraries, stats, health
--timeout           Request timeout in seconds (default: 30)
--output            Output format: json, pretty (default: pretty)
--verbose           Enable verbose logging
--detect-stuck      Enable stuck job detection
--stuck-threshold   Minutes before job considered stuck (default: 30)
--memory-file       Path to memory state file (default: .claude/tmp/tdarr_memory.pkl)
--clear-memory      Clear memory state and exit
--discord-webhook   Discord webhook URL for notifications (default: configured)
--discord-alerts    Enable Discord alerts for stuck jobs
--discord-test      Send test Discord message and exit
```

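The options above map directly onto Python's `argparse`. The sketch below shows one way such a parser could be declared; it is an illustration rather than the script's actual source. The `build_parser` name is hypothetical, and because no default is documented for `--check` and the built-in webhook is not shown, neither is filled in here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real source)."""
    p = argparse.ArgumentParser(prog="tdarr_monitor.py")
    p.add_argument("--server", required=True, help="Tdarr server URL")
    # No default is documented for --check, so none is assumed here.
    p.add_argument("--check",
                   choices=["all", "status", "queue", "nodes",
                            "libraries", "stats", "health"])
    p.add_argument("--timeout", type=int, default=30,
                   help="Request timeout in seconds")
    p.add_argument("--output", choices=["json", "pretty"], default="pretty")
    p.add_argument("--verbose", action="store_true")
    p.add_argument("--detect-stuck", action="store_true")
    p.add_argument("--stuck-threshold", type=int, default=30,
                   help="Minutes before a job is considered stuck")
    p.add_argument("--memory-file", default=".claude/tmp/tdarr_memory.pkl")
    p.add_argument("--clear-memory", action="store_true")
    # The real script ships a configured default webhook; left unset in this sketch.
    p.add_argument("--discord-webhook", default=None)
    p.add_argument("--discord-alerts", action="store_true")
    p.add_argument("--discord-test", action="store_true")
    return p


# Parse the second Quick Start command line.
args = build_parser().parse_args(
    ["--server", "http://10.10.0.43:8265", "--check", "nodes",
     "--detect-stuck", "--stuck-threshold", "15"]
)
print(args.check, args.detect_stuck, args.stuck_threshold)  # nodes True 15
```

`argparse` converts dashes to underscores, so `--stuck-threshold` is read back as `args.stuck_threshold`.
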
**Memory Management**:
- **Persistent State**: Worker snapshots saved to `.claude/tmp/tdarr_memory.pkl`
- **Automatic Cleanup**: Removes tracking for disappeared workers
- **Error Recovery**: Graceful handling of corrupted memory files

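A minimal sketch of how the state handling described above could work, assuming a job counts as stuck when a worker reports the same file and percentage for at least the threshold duration. The function names and the snapshot layout (`file`, `percentage`, `first_seen`) are assumptions for illustration, not the script's real internals.

```python
import pickle
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical snapshot shape: {"file": str, "percentage": float, "first_seen": float}
Memory = Dict[str, Dict[str, Any]]


def load_memory(path: Path) -> Memory:
    """Load saved snapshots; start fresh if the file is missing or corrupted."""
    try:
        with path.open("rb") as fh:
            data = pickle.load(fh)
    except Exception:  # unpickling can raise many exception types on bad input
        return {}
    return data if isinstance(data, dict) else {}


def save_memory(path: Path, memory: Memory) -> None:
    """Persist snapshots, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(memory, fh)


def update_and_find_stuck(memory: Memory,
                          workers: Dict[str, Tuple[str, float]],
                          threshold_minutes: int,
                          now: Optional[float] = None) -> List[str]:
    """Update snapshots in place and return the ids of workers that look stuck.

    `workers` maps worker id -> (file, percentage) as currently reported.
    """
    now = time.time() if now is None else now
    stuck = []
    for worker_id, (file, pct) in workers.items():
        prev = memory.get(worker_id)
        if prev and prev["file"] == file and prev["percentage"] == pct:
            # No progress since first_seen; stuck once the threshold is reached.
            if now - prev["first_seen"] >= threshold_minutes * 60:
                stuck.append(worker_id)
        else:
            # New worker, new file, or progress made: restart the clock.
            memory[worker_id] = {"file": file, "percentage": pct, "first_seen": now}
    # Automatic cleanup: forget workers that are no longer reported.
    for worker_id in list(memory):
        if worker_id not in workers:
            del memory[worker_id]
    return stuck
```

Because state lives on disk between runs, a short-lived invocation (for example from cron) can still notice that a worker has not moved since the previous run.
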
**Discord Features**:
- **Two Message Types**: Simple content messages and rich embeds
- **Stuck Job Alerts**: Detailed embed notifications with file info, progress, duration
- **System Status**: Health summaries with node details and color-coded status
- **Customizable**: Colors, fields, titles, descriptions fully configurable
- **Error Handling**: Graceful failures without breaking monitoring

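To make the two message types concrete, here is a sketch of the JSON payloads a Discord webhook accepts: a plain `content` message and an `embeds` entry with a title, description, integer color, and fields. The helper names, wording, and color value are assumptions; the script's own builders may differ.

```python
import json
from typing import Any, Dict

COLOR_RED = 0xE74C3C  # hypothetical choice; Discord expects the color as an integer


def simple_message(text: str) -> Dict[str, Any]:
    """Payload for a plain content message."""
    return {"content": text}


def stuck_job_embed(node: str, file: str, percentage: float, minutes: int) -> Dict[str, Any]:
    """Payload for a rich embed describing one stuck job (illustrative layout)."""
    return {
        "embeds": [{
            "title": "Stuck Tdarr job detected",
            "description": f"Worker on `{node}` has made no progress.",
            "color": COLOR_RED,
            "fields": [
                {"name": "File", "value": file, "inline": False},
                {"name": "Progress", "value": f"{percentage:.1f}%", "inline": True},
                {"name": "Duration", "value": f"{minutes} min", "inline": True},
            ],
        }]
    }


payload = stuck_job_embed("node1", "movie.mkv", 42.5, 45)
print(json.dumps(payload, indent=2))
```

Either dict can be sent with `requests.post(webhook_url, json=payload, timeout=30)`; wrapping that call in a `try`/`except` keeps a webhook failure from interrupting monitoring.
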
**Integration Examples**:

*Cron Job for Regular Monitoring*:
```bash
# Check every 15 minutes; alert on jobs stuck longer than the default 30 minutes.
# Keep the entry on one line: crontab does not support backslash line continuations.
*/15 * * * * cd /path/to/claude-home && python3 scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check nodes --detect-stuck --discord-alerts
```

*Systemd Service and Timer*:
```ini
# tdarr-monitor.service (example file name)
[Unit]
Description=Tdarr Monitor
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /path/to/claude-home/scripts/tdarr_monitor.py \
    --server http://10.10.0.43:8265 --check all --detect-stuck --discord-alerts
WorkingDirectory=/path/to/claude-home
User=your-user

# tdarr-monitor.timer (separate file; a [Timer] section only takes effect in a .timer unit)
[Unit]
Description=Run Tdarr Monitor every 15 minutes

[Timer]
OnCalendar=*:0/15
Persistent=true

[Install]
WantedBy=timers.target
```

**API Data Classes**:
The script uses strongly-typed dataclasses for all API responses:
- `ServerStatus` - Server health and version info
- `NodeStatus` - Node details with stuck job tracking
- `QueueStatus` - Transcoding queue statistics
- `LibraryStatus` - Library scan progress
- `StatisticsStatus` - Overall system statistics
- `HealthStatus` - Comprehensive health check results

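As a rough illustration of the dataclass approach, the sketch below models a node check result. Only the class name `NodeStatus` comes from the list above; every field, the `StuckJob` helper, and the `healthy` property are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StuckJob:
    """Hypothetical helper describing one stuck job."""
    file: str
    minutes_stuck: int


@dataclass
class NodeStatus:
    """Illustrative shape only; the real class may carry different fields."""
    timestamp: str
    total_nodes: int
    online_nodes: int
    stuck_jobs: List[StuckJob] = field(default_factory=list)
    error: Optional[str] = None

    @property
    def healthy(self) -> bool:
        # Healthy means the check succeeded and no stuck jobs were found.
        return self.error is None and not self.stuck_jobs


status = NodeStatus("2024-01-01T00:00:00", total_nodes=2, online_nodes=2)
status.stuck_jobs.append(StuckJob("movie.mkv", 45))
print(json.dumps(asdict(status), indent=2))
```

`asdict` recurses into nested dataclasses, which makes a JSON output mode straightforward; a pretty mode can format the same objects field by field.
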
**Error Handling**:
- Network timeouts and connection errors
- API endpoint failures
- JSON parsing errors
- Discord webhook failures
- Memory state corruption
- Missing dependencies

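The first three cases above can be handled in one place around the HTTP call. The script itself uses `requests`; the sketch below swaps in `urllib` from the standard library so it runs without extra installs, and the `fetch_json` name and `(data, error)` return convention are assumptions. The same structure carries over to `requests.exceptions`.

```python
import io
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Optional, Tuple


def fetch_json(url: str, timeout: float = 30,
               opener: Callable[..., Any] = urllib.request.urlopen
               ) -> Tuple[Optional[Any], Optional[str]]:
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        with opener(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")), None
    except urllib.error.HTTPError as exc:   # API endpoint answered with 4xx/5xx
        return None, f"HTTP {exc.code} from {url}"
    except OSError as exc:                  # timeouts, refused connections, DNS failures
        return None, f"network error: {exc}"
    except ValueError as exc:               # malformed JSON or undecodable body
        return None, f"invalid response: {exc}"


# Simulate a server that returns HTML instead of JSON (no network needed).
def fake_html(url: str, timeout: float) -> io.BytesIO:
    return io.BytesIO(b"<html>oops</html>")


data, error = fetch_json("http://10.10.0.43:8265", opener=fake_html)
print(data, "|", error)
```

Returning an error string instead of raising lets the caller record the failure in its status object and continue with the remaining checks.
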
**Dependencies**:
- `requests` - HTTP client for API calls (the only third-party package)
- `pickle` - State serialization (standard library)
- Everything else comes from the Python standard library

---

## Development Guidelines

### Adding New Scripts

1. **Location**: Place scripts in appropriate subdirectories by function
2. **Documentation**: Include comprehensive docstrings and usage examples
3. **Error Handling**: Implement robust error handling and logging
4. **Configuration**: Use CLI arguments and/or config files for flexibility
5. **Testing**: Include test functionality where applicable

### Naming Conventions

- Use descriptive names: `tdarr_monitor.py` not `monitor.py`
- Use underscores for Python scripts: `system_health.py`
- Use hyphens for shell scripts: `backup-system.sh`

### Directory Organization

Create subdirectories for related functionality:
```
scripts/
├── monitoring/     # System monitoring scripts
├── backup/         # Backup and restore utilities
├── network/        # Network management tools
├── containers/     # Docker/Podman management
└── maintenance/    # System maintenance tasks
```

---

## Future Enhancements

### Planned Features
- **Email Notifications**: SMTP integration for email alerts
- **Prometheus Metrics**: Export metrics for Grafana dashboards
- **Webhook Actions**: Trigger external actions on stuck jobs
- **Multi-Server Support**: Monitor multiple Tdarr instances
- **Configuration Files**: YAML/JSON config file support

### Contributing
1. Follow existing code style and patterns
2. Add comprehensive documentation
3. Include error handling and logging
4. Test thoroughly before committing
5. Update this README with new scripts