claude-home/tdarr/troubleshooting.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00


Tdarr Troubleshooting Guide

forEach Error Resolution

Problem: TypeError: Cannot read properties of undefined (reading 'forEach')

Symptoms: Scanning phase fails at "Tagging video res" step, preventing all transcodes

Root Cause: Custom plugin mounts override community plugins with incompatible versions

Solution: Clean Plugin Installation

  1. Remove custom plugin mounts from docker-compose.yml
  2. Force plugin regeneration:
    ssh tdarr "docker restart tdarr"
    podman restart tdarr-node-gpu
    
  3. Verify clean plugins: Check for null-safety fixes (streams || []).forEach()

Plugin Safety Patterns

// ❌ Unsafe - causes forEach errors
args.variables.ffmpegCommand.streams.forEach()

// ✅ Safe - null-safe forEach
(args.variables.ffmpegCommand.streams || []).forEach()
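
To find plugins that still use the unsafe form, a recursive fixed-string search is one option. This is a minimal sketch, assuming plugins are `.js` files under a directory you pass in. The function name `find_unsafe_foreach` is hypothetical and not part of Tdarr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list plugin lines that call .streams.forEach( directly.
# The guarded form "(x.streams || []).forEach(" does not contain the literal
# text ".streams.forEach(", so a fixed-string search separates the two.
find_unsafe_foreach() {
  local plugin_dir=$1
  grep -rnF --include='*.js' '.streams.forEach(' "$plugin_dir"
}
```

Calling `find_unsafe_foreach /path/to/plugins` prints `file:line:match` for each hit and exits with status 1 when nothing is found.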

Staging Section Timeout Issues

Problem: Files removed from staging after 300 seconds

Symptoms:

  • .tmp files stuck in work directories
  • ENOTEMPTY errors during cleanup
  • Subsequent jobs blocked

Solution: Automated Monitoring System

Monitor Script: /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Automatic Actions:

  • Detects staging timeouts every 20 minutes
  • Removes stuck work directories
  • Sends Discord notifications
  • Logs all cleanup activities

Manual Cleanup Commands

# Check staging section
ssh tdarr "docker logs tdarr | tail -50"

# Find stuck work directories
find /mnt/NV2/tdarr-cache -name "tdarr-workDir*" -type d

# Force cleanup stuck directory
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir-[ID]
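
Before force-removing anything, it may help to list only the work directories that have been idle for a while. The sketch below assumes GNU or BSD `find` (for `-mmin`). The function name and the age threshold are hypothetical, and this is not the logic of the monitor script itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print work directories whose modification time is older
# than the given number of minutes. It only lists candidates; nothing is deleted.
list_stale_workdirs() {
  local cache_dir=$1 max_age_min=$2
  find "$cache_dir" -type d -name 'tdarr-workDir*' -mmin "+$max_age_min"
}
```

For example, `list_stale_workdirs /mnt/NV2/tdarr-cache 30` prints directories untouched for more than 30 minutes. A directory's modification time only changes when entries are added or removed inside it, so check that no job is still active before deleting anything it reports.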

System Stability Issues

Problem: Kernel crashes during intensive transcoding

Root Cause: CIFS network issues during large file streaming (mapped nodes)

Solution: Convert to Unmapped Node Architecture

  1. Enable unmapped nodes in server Options
  2. Update node configuration:
    # Add to container environment
    -e nodeType=unmapped
    -e unmappedNodeCache=/cache
    
    # Use local cache volume
    -v "/mnt/NV2/tdarr-cache:/cache"
    
    # Remove media volume (no longer needed)
    
  3. Benefits: Eliminates CIFS streaming, prevents kernel crashes

Container Resource Limits

# Prevent memory exhaustion
deploy:
  resources:
    limits:
      memory: 8G
      cpus: '6'

Gaming Detection Issues

Problem: Tdarr doesn't stop during gaming

Check gaming detection:

# Test current gaming detection
./tdarr-schedule-manager.sh test

# View scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Verify GPU usage detection
nvidia-smi

Gaming Process Detection

Monitored Processes:

  • Steam, Lutris, Heroic Games Launcher
  • Wine, Bottles (Windows compatibility)
  • GameMode, MangoHUD (utilities)
  • GPU usage >15% (configurable threshold)
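
The checks in this list can be approximated by hand when the scheduler's verdict looks wrong. The following is a rough sketch, not the scheduler's actual logic: both function names are hypothetical, and the process names you pass in are your own assumption about what is running.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a gaming check; the real scheduler logic is not shown here.

# Succeeds when utilization (integer percent) is strictly above the threshold.
# The threshold defaults to 15, matching the value quoted in the list above.
gpu_over_threshold() {
  local util=$1 threshold=${2:-15}
  [ "$util" -gt "$threshold" ]
}

# Succeeds when any of the named processes is running (exact name match).
any_process_running() {
  local name
  for name in "$@"; do
    pgrep -x "$name" >/dev/null && return 0
  done
  return 1
}
```

A current utilization figure can be read with `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -n1` and passed to `gpu_over_threshold`. `any_process_running steam lutris` covers the process side, assuming those are the process names on your system.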

Configuration Adjustments

# Edit gaming detection threshold
./tdarr-schedule-manager.sh edit

# Apply preset configurations
./tdarr-schedule-manager.sh preset gaming-only  # No time limits
./tdarr-schedule-manager.sh preset night-only   # 10PM-7AM only

Network and Access Issues

Server Connection Problems

Server Access Commands:

# SSH to Tdarr server
ssh tdarr

# Check server status
ssh tdarr "docker ps | grep tdarr"

# View server logs
ssh tdarr "docker logs tdarr"

# Access server container
ssh tdarr "docker exec -it tdarr /bin/bash"

Node Registration Issues

# Check node logs
podman logs tdarr-node-gpu

# Verify node registration
# Look for "Node registered" in server logs
ssh tdarr "docker logs tdarr | grep -i node"

# Test node connectivity
curl http://10.10.0.43:8265/api/v2/status

Performance Issues

Slow Transcoding Performance

Diagnosis:

  1. Check cache location: Should be local NVMe, not network
  2. Verify unmapped mode: nodeType=unmapped in container
  3. Monitor I/O: iotop during transcoding
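
The first two diagnosis steps can be scripted. This is a sketch under stated assumptions: the helper names are hypothetical, and the list of network filesystem types is illustrative rather than exhaustive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the first two diagnosis steps.

# Succeeds when an environment listing (one VAR=value per line)
# contains the exact line nodeType=unmapped.
env_is_unmapped() {
  printf '%s\n' "$1" | grep -qx 'nodeType=unmapped'
}

# Succeeds when a filesystem type name denotes a network mount.
is_network_fstype() {
  case $1 in
    cifs|smb3|smbfs|nfs|nfs4|fuse.sshfs) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical inputs are `env_is_unmapped "$(podman exec tdarr-node-gpu env)"` and `is_network_fstype "$(findmnt -n -o FSTYPE -T /mnt/NV2/tdarr-cache)"`. If the second check succeeds, the cache is on a network mount and should be moved to local storage.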

Expected Performance:

  • Mapped nodes: Constant SMB streaming (~100MB/s)
  • Unmapped nodes: Download once → Process locally → Upload once

GPU Utilization Problems

# Monitor GPU usage during transcoding
watch nvidia-smi

# Check GPU device access in container
podman exec tdarr-node-gpu nvidia-smi

# Verify NVENC encoder availability
podman exec tdarr-node-gpu ffmpeg -encoders | grep nvenc

Plugin System Issues

Plugin Loading Failures

Troubleshooting Steps:

  1. Check plugin directory: Ensure no custom mounts override community plugins
  2. Verify dependencies: FlowHelper files (metadataUtils.js, letterboxUtils.js)
  3. Test plugin syntax:
    # Test plugin in Node.js
    node -e "require('./path/to/plugin.js')"
    

Custom Plugin Integration

Safe Integration Pattern:

  1. Selective mounting: Mount only specific required plugins
  2. Dependency verification: Include all FlowHelper dependencies
  3. Version compatibility: Ensure plugins match Tdarr version
  4. Null-safety checks: Add || [] to forEach operations

Monitoring and Logging

Log Locations

# Scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Monitor logs  
tail -f /tmp/tdarr-monitor/monitor.log

# Server logs
ssh tdarr "docker logs tdarr"

# Node logs
podman logs tdarr-node-gpu

Discord Notification Issues

Check webhook configuration:

# Test Discord webhook
curl -X POST [WEBHOOK_URL] \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'

Common Issues:

  • JSON escaping in message content
  • Markdown formatting in Discord
  • User ping placement (outside code blocks)
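
JSON escaping problems usually come from interpolating raw message text into the payload. The sketch below escapes a message in pure bash before building the body. Both function names are hypothetical, and only the most common characters are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab; other
# control characters are not covered by this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# Build the webhook request body for a message.
discord_payload() {
  printf '{"content": "%s"}' "$(json_escape "$1")"
}
```

The result can be passed to the test command above as `-d "$(discord_payload "$msg")"`, where `$msg` holds the raw message text.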

Emergency Recovery

Complete System Reset

# Stop all containers
podman stop tdarr-node-gpu
ssh tdarr "docker stop tdarr"

# Clean cache directories
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir*

# Remove scheduler
crontab -e  # Delete tdarr lines

# Restart with clean configuration
./start-tdarr-gpu-podman-clean.sh
./tdarr-schedule-manager.sh preset work-safe
./tdarr-schedule-manager.sh install

Data Recovery

Important: Original files remain untouched while a job is being processed; the work happens on copies in the cache

  • Queue data: Stored in server configuration (/app/configs)
  • Progress data: Lost on container restart (unmapped nodes)
  • Cache files: Safe to delete, will re-download

Common Error Patterns

"Copy failed" in Staging Section

Cause: Network timeout during file transfer to unmapped node

Solution: Monitoring system automatically retries

"ENOTEMPTY" Directory Cleanup Errors

Cause: Partial downloads leave files in work directories

Solution: Force-remove the affected directories; the monitoring system also does this automatically

Node Disconnection During Processing

Cause: Gaming detection or manual stop during active job

Result: File returns to the queue automatically; the node is safe to restart

Prevention Best Practices

  1. Use unmapped node architecture for stability
  2. Implement monitoring system for automatic cleanup
  3. Configure gaming-aware scheduling for desktop systems
  4. Set container resource limits to prevent crashes
  5. Use clean plugin installation to avoid forEach errors
  6. Monitor system resources during intensive operations

This troubleshooting guide covers the most common issues and their resolutions for production Tdarr deployments.