claude-home/scripts/monitoring/setup-discord-monitoring.md
Cal Corum 715354da7d CLAUDE: Add comprehensive documentation for Tdarr monitoring and NAS configuration
2025-08-10 10:39:55 -05:00


Tdarr Discord Monitoring Setup Guide

Overview

This guide sets up automated Discord notifications for Tdarr worker timeouts, stalls, and completions using a custom log monitoring script.

Prerequisites

  • Discord server where you want notifications
  • Administrative access to create webhooks
  • Tdarr server accessible via SSH
  • Podman/Docker access to Tdarr node

Setup Steps

1. Create Discord Webhook

  1. Go to your Discord server → Server Settings → Integrations → Webhooks
  2. Click Create Webhook
  3. Name it "Tdarr Monitor" and select the channel for notifications
  4. Copy the Webhook URL (keep this secure!)

2. Configure Monitoring Script

Edit the script to add your webhook:

nano /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Update these lines:

DISCORD_WEBHOOK="https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN"
SERVER_HOST="tdarr"  # Your SSH alias for Tdarr server
NODE_CONTAINER="tdarr-node-gpu-unmapped"  # Your node container name

3. Make Script Executable

chmod +x /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

4. Test the Script

/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

You should see a "monitoring started" message in your Discord channel.

5. Setup Automated Monitoring (Choose One)

Option A: Cron Job (Simple)

# Edit crontab
crontab -e

# Add this line to check every 5 minutes
*/5 * * * * /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh >/dev/null 2>&1

Option B: Systemd Service (Advanced)

Create a systemd service for more reliable monitoring:

sudo nano /etc/systemd/system/tdarr-monitor.service

Content:

[Unit]
Description=Tdarr Timeout Monitor
After=network.target

[Service]
Type=oneshot
User=cal
ExecStart=/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Create timer:

sudo nano /etc/systemd/system/tdarr-monitor.timer

Content:

[Unit]
Description=Run Tdarr Monitor every 5 minutes
Requires=tdarr-monitor.service

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable tdarr-monitor.timer
sudo systemctl start tdarr-monitor.timer

Notification Examples

Worker Timeout Alert

🎬 Tdarr Monitoring Alert
⚠️ 4 file(s) timed out in staging:

TV/Survivor/Season 48/Survivor (2000) - S48E04...
TV/Survivor/Season 48/Survivor (2000) - S48E11...
TV/Survivor/Season 26/Survivor (2000) - S26E05...

Files were removed from staging and will retry.

Worker Stall Alert

🎬 Tdarr Monitoring Alert
🔴 2 worker stall(s) detected:

Worker eager-eyas
Worker oblong-owl

Workers were cancelled and will restart.

Success Notification (Optional)

🎬 Tdarr Monitoring Alert  
✅ 3 transcode(s) completed successfully in the last check period.
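
The payload code inside the monitoring script is not shown in this guide. As a rough sketch of how a color-coded, timestamped alert like the ones above could be assembled, the helpers below build a Discord webhook body with a single embed. The function names `json_escape` and `build_embed` are hypothetical, and the escaping only covers backslashes, double quotes, and newlines.

```bash
# Hypothetical helpers; the real script's function names may differ.

# Escape backslashes, double quotes, and newlines so the text can be
# placed inside a JSON string. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Print a webhook payload containing one embed.
#   $1 = title, $2 = description, $3 = color as a decimal RGB integer
build_embed() {
  local title="$1" description="$2" color="$3"
  local timestamp
  # Discord expects an ISO 8601 timestamp; use UTC.
  timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  printf '{"embeds":[{"title":"%s","description":"%s","color":%d,"timestamp":"%s"}]}\n' \
    "$(json_escape "$title")" "$(json_escape "$description")" "$color" "$timestamp"
}
```

For example, `build_embed "Tdarr Monitoring Alert" "2 worker stall(s) detected" 15158332` prints a payload with a red embed (15158332 is 0xE74C3C in decimal), which could then be passed to `curl -d`.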

Monitoring Features

  • Server Limbo Timeouts - Files stuck in staging longer than the timeout period
  • Node Worker Stalls - Workers that hang during transcoding
  • Success Notifications - Optional completion alerts
  • Smart Timing - Only checks every 60+ seconds to avoid spam
  • Rich Discord Embeds - Color-coded messages with timestamps
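
The "Smart Timing" behavior can be pictured as a timestamp guard at the top of the script. The sketch below is an assumption about how such a guard might look, not the script's actual code; `should_run` and the stamp-file location are hypothetical names.

```bash
# Hypothetical throttle guard.
# Return 0 if at least min_interval seconds have passed since the last
# recorded run (and record the current time), otherwise return 1.
#   $1 = stamp file path, $2 = minimum interval in seconds (default 60)
should_run() {
  local stamp_file="$1" min_interval="${2:-60}"
  local now last
  now=$(date +%s)
  last=0
  if [ -f "$stamp_file" ]; then
    last=$(cat "$stamp_file")
  fi
  # Treat an empty or non-numeric stamp as "never ran".
  case "$last" in
    ''|*[!0-9]*) last=0 ;;
  esac
  if [ $((now - last)) -lt "$min_interval" ]; then
    return 1
  fi
  printf '%s\n' "$now" > "$stamp_file"
  return 0
}
```

A script could call `should_run /tmp/tdarr-monitor/last_check 60 || exit 0` before doing any work, so that overlapping cron and manual runs do not send duplicate alerts. The `last_check` file name is illustrative.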

Customization Options

Disable Success Messages

Edit the script and comment out this line:

# check_completions  # Comment out to disable success notifications

Change Check Frequency

For cron job, modify the timing:

# Check every 10 minutes instead of 5
*/10 * * * * /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh >/dev/null 2>&1

For systemd timer, update OnCalendar:

# Check every 10 minutes (systemd does not allow trailing comments on a setting line)
OnCalendar=*:0/10

Add More Monitoring

You can extend the script to monitor:

  • Disk space on cache directory
  • Network connectivity to TrueNAS
  • GPU utilization during transcoding
  • Queue depth and processing rates
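
As one illustration of such an extension, a disk space check could look roughly like the sketch below. The function names and the default 90% threshold are assumptions, and the parsing relies on the POSIX `df -P` output format (it would misread filesystem names that contain spaces).

```bash
# Hypothetical disk space check for the cache directory.

# Print the used-space percentage (integer, no % sign) of the filesystem
# that holds the given path.
disk_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the threshold, 1 when below,
# and 2 when the usage could not be determined.
#   $1 = path, $2 = threshold percent (default 90)
disk_over_threshold() {
  local path="$1" threshold="${2:-90}"
  local used
  used=$(disk_usage_percent "$path")
  case "$used" in
    ''|*[!0-9]*) return 2 ;;
  esac
  [ "$used" -ge "$threshold" ]
}
```

On the Tdarr server this might be used as `disk_over_threshold /path/to/tdarr/cache 90 && echo "cache nearly full"`, with the placeholder path replaced by the real cache directory.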

Troubleshooting

No Notifications Received

  1. Check that the webhook URL is correct and reachable from the machine running the script
  2. Test webhook manually:
    curl -H "Content-Type: application/json" -X POST -d '{"content":"Test message"}' "YOUR_WEBHOOK_URL"
    
  3. Check script logs: /tmp/tdarr-monitor/
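
The manual curl test prints nothing on success, which can be confusing. A small wrapper that reports the HTTP status makes failures easier to read. This is a sketch with hypothetical function names; Discord normally answers 204 No Content for an accepted webhook message, while 401 or 404 usually points to a deleted or mistyped webhook and 429 to rate limiting.

```bash
# Hypothetical webhook check helpers.

# Post a test message and print the HTTP status code that came back.
#   $1 = webhook URL
webhook_status() {
  curl -sS -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' \
    -X POST -d '{"content":"Test message"}' \
    "$1"
}

# Return 0 when the webhook accepted the message (any 2xx status),
# otherwise print the status to stderr and return 1.
webhook_ok() {
  local code
  code=$(webhook_status "$1") || return 1
  case "$code" in
    2??) return 0 ;;
    *) printf 'Webhook returned HTTP %s\n' "$code" >&2; return 1 ;;
  esac
}
```

Running `webhook_ok "YOUR_WEBHOOK_URL" && echo "webhook works"` sends one real test message to the channel.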

False Positives

  • Adjust the timing logic in the script
  • Filter out specific log patterns that aren't actual errors
  • Tune the timeout thresholds

Missing SSH Access

  • Ensure SSH key authentication is set up for the tdarr server
  • Test: ssh tdarr "echo 'SSH working'"
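
An interactive test can succeed by prompting for a password or passphrase, which cron and systemd cannot answer. The sketch below, with a hypothetical function name, repeats the test the way an unattended run would behave: `BatchMode=yes` makes ssh fail instead of prompting.

```bash
# Hypothetical non-interactive SSH check.
# Return 0 if key-based SSH to the host works without any prompt.
#   $1 = SSH host or alias (for example the "tdarr" alias used above)
ssh_noninteractive_ok() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

If `ssh_noninteractive_ok tdarr` fails while the plain `ssh tdarr` test works, the key is probably passphrase-protected or not loaded for the user that runs the job.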

Security Notes

  • Keep your Discord webhook URL private
  • Consider using environment variables for sensitive data
  • Restrict file permissions on the script (chmod 750)
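
One way to keep the webhook URL out of the script itself is to read it from the environment, with an owner-only env file as a fallback. This is a minimal sketch under assumptions: `load_webhook` and the default file path are hypothetical, and the monitoring script would need to call it instead of hard-coding `DISCORD_WEBHOOK`.

```bash
# Hypothetical loader for the webhook URL.
# Use DISCORD_WEBHOOK from the environment if set; otherwise source it
# from an env file. Return 1 if it is still unset afterwards.
#   $1 = env file path (default is an illustrative location)
load_webhook() {
  local env_file="${1:-$HOME/.config/tdarr-monitor.env}"
  if [ -z "${DISCORD_WEBHOOK:-}" ] && [ -r "$env_file" ]; then
    # The file is expected to contain: DISCORD_WEBHOOK="https://..."
    . "$env_file"
  fi
  if [ -z "${DISCORD_WEBHOOK:-}" ]; then
    echo "DISCORD_WEBHOOK is not set" >&2
    return 1
  fi
}
```

The env file should be readable only by its owner (`chmod 600`). With the systemd option, the same file could instead be referenced through `EnvironmentFile=` in the `[Service]` section, provided it uses plain `KEY=value` lines.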

This monitoring solution provides near-real-time alerts for Tdarr issues (within one check interval) without requiring external monitoring infrastructure.