claude-home/monitoring/scripts/setup-discord-monitoring.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00


Tdarr Discord Monitoring Setup Guide

Overview

This guide sets up automated Discord notifications for Tdarr worker timeouts, stalls, and completions using a custom log monitoring script.

Prerequisites

  • Discord server where you want notifications
  • Administrative access to create webhooks
  • Tdarr server accessible via SSH
  • Podman/Docker access to Tdarr node

Setup Steps

1. Create Discord Webhook

  1. Go to your Discord server → Server Settings → Integrations → Webhooks
  2. Click Create Webhook
  3. Name it "Tdarr Monitor" and select the channel for notifications
  4. Copy the Webhook URL (keep this secure!)

2. Configure Monitoring Script

Edit the script to add your webhook:

nano /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Update these lines:

DISCORD_WEBHOOK="https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN"
SERVER_HOST="tdarr"  # Your SSH alias for Tdarr server
NODE_CONTAINER="tdarr-node-gpu-unmapped"  # Your node container name
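
Before the first run it can help to confirm that the three settings above were actually filled in. The following is a minimal sketch, assuming the variable names shown above; `validate_config` is a hypothetical helper and is not part of the original script.

```shell
# Sketch only: validate_config is a hypothetical helper, not part of the
# original script. It assumes the three variables shown above.
validate_config() {
    # Each setting must be non-empty.
    if [ -z "${DISCORD_WEBHOOK:-}" ]; then
        echo "DISCORD_WEBHOOK is not set" >&2
        return 1
    fi
    if [ -z "${SERVER_HOST:-}" ]; then
        echo "SERVER_HOST is not set" >&2
        return 1
    fi
    if [ -z "${NODE_CONTAINER:-}" ]; then
        echo "NODE_CONTAINER is not set" >&2
        return 1
    fi

    case "$DISCORD_WEBHOOK" in
        # Reject the template placeholders if they were left in place.
        *YOUR_WEBHOOK_ID*|*YOUR_WEBHOOK_TOKEN*)
            echo "DISCORD_WEBHOOK still contains placeholder text" >&2
            return 1
            ;;
        # Assumption: the URL uses the discord.com form shown above.
        https://discord.com/api/webhooks/*)
            return 0
            ;;
        *)
            echo "DISCORD_WEBHOOK does not look like a Discord webhook URL" >&2
            return 1
            ;;
    esac
}
```

Calling `validate_config || exit 1` near the top of the script would stop a misconfigured run before it contacts the server.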

3. Make Script Executable

chmod +x /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

4. Test the Script

/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

You should see a "monitoring started" message in your Discord channel.

5. Setup Automated Monitoring (Choose One)

Option A: Cron Job (Simple)

# Edit crontab
crontab -e

# Add this line to check every 5 minutes
*/5 * * * * /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh >/dev/null 2>&1

Option B: Systemd Service (Advanced)

Create a systemd service for more reliable monitoring:

sudo nano /etc/systemd/system/tdarr-monitor.service

Content:

[Unit]
Description=Tdarr Timeout Monitor
After=network.target

[Service]
Type=oneshot
User=cal
ExecStart=/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Create timer:

sudo nano /etc/systemd/system/tdarr-monitor.timer

Content:

[Unit]
Description=Run Tdarr Monitor every 5 minutes
Requires=tdarr-monitor.service

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target

Enable and start:

sudo systemctl daemon-reload
sudo systemctl enable tdarr-monitor.timer
sudo systemctl start tdarr-monitor.timer

Notification Examples

Worker Timeout Alert

🎬 Tdarr Monitoring Alert
⚠️ 4 file(s) timed out in staging:

TV/Survivor/Season 48/Survivor (2000) - S48E04...
TV/Survivor/Season 48/Survivor (2000) - S48E11...
TV/Survivor/Season 26/Survivor (2000) - S26E05...

Files were removed from staging and will retry.

Worker Stall Alert

🎬 Tdarr Monitoring Alert
🔴 2 worker stall(s) detected:

Worker eager-eyas
Worker oblong-owl

Workers were cancelled and will restart.

Success Notification (Optional)

🎬 Tdarr Monitoring Alert  
✅ 3 transcode(s) completed successfully in the last check period.
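
Alerts like the ones above could be sent as Discord embeds, which accept a title, a description, an integer color, and an ISO 8601 timestamp. The sketch below builds such a payload in bash without external JSON tools. The function names and color constants are assumptions for illustration; the original script may build its messages differently.

```shell
# Requires bash (uses ${var//pattern/replacement} and $'\n').
# Hypothetical helpers; not taken from the original script.

# Embed colors are decimal integers (hex value shown in the comment).
COLOR_RED=15158332     # 0xE74C3C, e.g. for worker stalls
COLOR_ORANGE=15105570  # 0xE67E22, e.g. for staging timeouts
COLOR_GREEN=3066993    # 0x2ECC71, e.g. for completions

# Escape backslashes, double quotes, and newlines for a JSON string.
# Other control characters (tabs, etc.) are not handled in this sketch.
json_escape() {
    local s=$1
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    printf '%s' "$s"
}

# Usage: build_embed_payload TITLE DESCRIPTION COLOR [TIMESTAMP]
# TIMESTAMP defaults to the current UTC time.
build_embed_payload() {
    local title description color timestamp
    title=$(json_escape "$1")
    description=$(json_escape "$2")
    color=$3
    timestamp=${4:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
    printf '{"embeds":[{"title":"%s","description":"%s","color":%d,"timestamp":"%s"}]}\n' \
        "$title" "$description" "$color" "$timestamp"
}

# Example (not executed here):
#   curl -H "Content-Type: application/json" -X POST \
#        -d "$(build_embed_payload 'Tdarr Monitoring Alert' 'Test' "$COLOR_GREEN")" \
#        "$DISCORD_WEBHOOK"
```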

Monitoring Features

  • Server Limbo Timeouts - Files stuck in staging > timeout period
  • Node Worker Stalls - Workers that hang during transcoding
  • Success Notifications - Optional completion alerts
  • Smart Timing - Only checks every 60+ seconds to avoid spam
  • Rich Discord Embeds - Color-coded messages with timestamps
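
The "Smart Timing" behavior can be approximated with a timestamp file: record when the last check ran and skip the run if too little time has passed. This is a sketch under that assumption; the function name and the state-file location are hypothetical, and the original script may implement its timing differently.

```shell
# Sketch only: should_run_check and the state file are assumptions.
# Returns 0 (and records the current time) when at least MIN_INTERVAL
# seconds have passed since the last recorded run; returns 1 otherwise.
should_run_check() {
    local state_file min_interval now last
    state_file="$1"
    min_interval="${2:-60}"
    now=$(date +%s)
    last=0
    if [ -r "$state_file" ]; then
        last=$(cat "$state_file")
    fi
    # Treat an empty or non-numeric value as "never ran".
    case "$last" in
        ''|*[!0-9]*) last=0 ;;
    esac
    if [ $((now - last)) -lt "$min_interval" ]; then
        return 1
    fi
    printf '%s\n' "$now" > "$state_file"
    return 0
}
```

A script could call `should_run_check /tmp/tdarr-monitor/last-check 60 || exit 0` at startup, so that overlapping or rapid invocations exit quietly instead of sending duplicate alerts.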

Customization Options

Disable Success Messages

Edit the script and comment out this line:

# check_completions  # Comment out to disable success notifications

Change Check Frequency

For cron job, modify the timing:

*/10 * * * *  # Check every 10 minutes instead of 5

For systemd timer, update OnCalendar:

OnCalendar=*:0/10  # Check every 10 minutes

Add More Monitoring

You can extend the script to monitor:

  • Disk space on cache directory
  • Network connectivity to TrueNAS
  • GPU utilization during transcoding
  • Queue depth and processing rates
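
As one example of such an extension, a disk space check can be built on `df -P`. The sketch below assumes the cache directory is reachable from the machine running the check (for a remote cache, the same `df` command could be run over SSH). The function names and the default threshold are hypothetical.

```shell
# Sketch only: function names and the 90% default are assumptions.

# Print the used-space percentage (digits only) for the filesystem
# that holds PATH. df -P guarantees one line per filesystem, with the
# "Capacity" column in field 5.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print an alert line and return 0 when usage is at or above THRESHOLD.
# Returns 1 when usage is below it, 2 when usage could not be read.
disk_alert_message() {
    local path threshold used
    path="$1"
    threshold="${2:-90}"
    used=$(disk_usage_percent "$path")
    case "$used" in
        ''|*[!0-9]*) return 2 ;;
    esac
    if [ "$used" -ge "$threshold" ]; then
        printf 'Disk usage on %s is %s%% (threshold %s%%)\n' "$path" "$used" "$threshold"
        return 0
    fi
    return 1
}
```

The printed line could then be passed to whatever function the script already uses to post to Discord.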

Troubleshooting

No Notifications Received

  1. Check webhook URL is correct and accessible
  2. Test webhook manually:
    curl -H "Content-Type: application/json" -X POST -d '{"content":"Test message"}' "YOUR_WEBHOOK_URL"
    
  3. Check script logs: /tmp/tdarr-monitor/
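
When the manual test in step 2 produces no message, the HTTP status code usually narrows down the cause. The sketch below wraps the same curl request so that it prints only the status code, then maps common codes to a hint. The function names are hypothetical, and the status-code interpretations reflect typical Discord webhook behavior rather than anything stated in the original script.

```shell
# Sketch only: both function names are assumptions.

# Send the same test message as the manual curl command and print only
# the HTTP status code ("000" means curl got no HTTP response at all).
post_test_message() {
    curl -s -o /dev/null -w '%{http_code}' \
        -H "Content-Type: application/json" \
        -X POST -d '{"content":"Test message"}' "$1"
}

# Translate a status code into a troubleshooting hint.
describe_webhook_status() {
    case "$1" in
        200|204) echo "OK: Discord accepted the message" ;;
        401|404) echo "Webhook URL is invalid or the webhook was deleted" ;;
        429)     echo "Rate limited: wait and retry" ;;
        000)     echo "No HTTP response: check DNS, firewall, or connectivity" ;;
        *)       echo "Unexpected status: $1" ;;
    esac
}

# Example (not executed here):
#   describe_webhook_status "$(post_test_message "$DISCORD_WEBHOOK")"
```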

False Positives

  • Adjust the timing logic in the script
  • Filter out specific log patterns that aren't actual errors
  • Tune the timeout thresholds
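
Filtering out known-harmless log lines can be as simple as an inverted `grep` placed before the alert logic. This is a sketch, assuming log lines arrive on standard input; the function name and the example patterns are made up for illustration and do not come from real Tdarr logs.

```shell
# Sketch only: filter_false_positives and the example patterns are
# assumptions, not taken from the original script or real Tdarr logs.

# Read log lines on stdin and drop any line matching the extended
# regular expression in $1. "|| true" keeps the exit status at 0 when
# every line is filtered out (grep returns 1 in that case).
filter_false_positives() {
    grep -v -E "$1" || true
}

# Example:
#   some_log_command | filter_false_positives 'heartbeat|keepalive'
```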

Missing SSH Access

  • Ensure SSH key authentication is set up for the tdarr server
  • Test: ssh tdarr "echo 'SSH working'"
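
An interactive `ssh` test can succeed by prompting for a password, which a cron job or systemd timer cannot answer. The sketch below runs the test in batch mode so that it fails the same way an unattended run would. `check_ssh` is a hypothetical helper; `tdarr` is the SSH alias used earlier in this guide.

```shell
# Sketch only: check_ssh is a hypothetical helper.
# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 gives up after five seconds if the host is unreachable.
check_ssh() {
    local host
    host="${1:-tdarr}"
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "SSH to $host works without a password prompt"
    else
        echo "SSH to $host failed; check keys and ~/.ssh/config" >&2
        return 1
    fi
}
```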

Security Notes

  • Keep your Discord webhook URL private
  • Consider using environment variables for sensitive data
  • Restrict file permissions on the script (chmod 750)
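
One way to follow the environment-variable suggestion is to keep the webhook URL in a separate file readable only by its owner and load it at startup. This is a sketch under that assumption; the function name and the env-file location are hypothetical, and the original script stores the URL inline.

```shell
# Sketch only: load_monitor_env and the env-file path are assumptions.
# The env file is expected to contain a line such as:
#   DISCORD_WEBHOOK="https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN"
# and should be protected with: chmod 600 /path/to/tdarr-monitor.env
load_monitor_env() {
    local env_file
    env_file="$1"
    if [ ! -r "$env_file" ]; then
        echo "Cannot read $env_file" >&2
        return 1
    fi
    # Source the file so its variables are set in the current shell.
    . "$env_file"
    if [ -z "${DISCORD_WEBHOOK:-}" ]; then
        echo "DISCORD_WEBHOOK is not set in $env_file" >&2
        return 1
    fi
}
```

With this approach the script itself no longer contains the secret, so it can be committed or shared without exposing the webhook.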

This monitoring solution provides near-real-time alerts for Tdarr issues (within one check interval) without requiring external monitoring infrastructure.