---
title: Tdarr Troubleshooting Guide
description: "Solutions for common Tdarr issues: forEach plugin errors, staging timeouts, kernel crashes, gaming detection, node registration, GPU utilization, DB requeue workarounds, flow plugin bugs (subtitle disposition, commentary track filtering), and Roku playback hangs."
type: troubleshooting
domain: tdarr
tags:
  - tdarr
  - troubleshooting
  - ffmpeg
  - flow-plugin
  - sqlite
  - roku
  - jellyfin
  - nvenc
  - cifs
---
# Tdarr Troubleshooting Guide

## forEach Error Resolution

**Problem:** `TypeError: Cannot read properties of undefined (reading 'forEach')`

**Symptoms:** Scanning phase fails at the "Tagging video res" step, preventing all transcodes.

**Root Cause:** Custom plugin mounts override community plugins with incompatible versions.

### Solution: Clean Plugin Installation

1. Remove custom plugin mounts from `docker-compose.yml`.
2. Force plugin regeneration:

   ```bash
   ssh tdarr "docker restart tdarr"
   podman restart tdarr-node-gpu
   ```

3. Verify clean plugins: check for null-safety fixes such as `(streams || []).forEach()`.

### Plugin Safety Patterns

```js
// ❌ Unsafe - causes forEach errors
args.variables.ffmpegCommand.streams.forEach()

// ✅ Safe - null-safe forEach
(args.variables.ffmpegCommand.streams || []).forEach()
```

## Staging Section Timeout Issues

**Problem:** Files removed from staging after 300 seconds

**Symptoms:**

- `.tmp` files stuck in work directories
- `ENOTEMPTY` errors during cleanup
- Subsequent jobs blocked

### Solution: Automated Monitoring System

**Monitor Script:** `/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh`

**Automatic Actions:**

- Detects staging timeouts every 20 minutes
- Removes stuck work directories
- Sends Discord notifications
- Logs all cleanup activities
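The monitor's cleanup pass boils down to finding work directories older than a staleness threshold and removing them. A minimal sketch of that logic, assuming a `tdarr-workDir*` naming convention as shown in the manual commands below; the threshold value and function name are illustrative, not the actual contents of `tdarr-timeout-monitor.sh`:

```python
import shutil
import time
from pathlib import Path

# Illustrative threshold; the real monitor runs every 20 minutes (see above).
STALE_MINUTES = 20

def clean_stale_workdirs(cache_root: str, stale_minutes: int = STALE_MINUTES) -> list[str]:
    """Remove tdarr-workDir* directories untouched for longer than the threshold."""
    removed = []
    cutoff = time.time() - stale_minutes * 60
    for workdir in Path(cache_root).glob("tdarr-workDir*"):
        if workdir.is_dir() and workdir.stat().st_mtime < cutoff:
            # ignore_errors tolerates ENOTEMPTY races with an active transcode
            shutil.rmtree(workdir, ignore_errors=True)
            removed.append(workdir.name)
    return removed
```

A wrapper would then post the `removed` list to the Discord webhook and append it to the cleanup log.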

### Manual Cleanup Commands

```bash
# Check staging section
ssh tdarr "docker logs tdarr | tail -50"

# Find stuck work directories
find /mnt/NV2/tdarr-cache -name "tdarr-workDir*" -type d

# Force cleanup stuck directory
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir-[ID]
```

## System Stability Issues

**Problem:** Kernel crashes during intensive transcoding

**Root Cause:** CIFS network issues during large file streaming (mapped nodes)

### Solution: Convert to Unmapped Node Architecture

1. Enable unmapped nodes in server Options.
2. Update the node configuration:

   ```bash
   # Add to container environment
   -e nodeType=unmapped
   -e unmappedNodeCache=/cache

   # Use local cache volume
   -v "/mnt/NV2/tdarr-cache:/cache"

   # Remove media volume (no longer needed)
   ```

3. Benefits: eliminates CIFS streaming, prevents kernel crashes.

### Container Resource Limits

```yaml
# Prevent memory exhaustion
deploy:
  resources:
    limits:
      memory: 8G
      cpus: '6'
```

## Gaming Detection Issues

**Problem:** Tdarr doesn't stop during gaming

Check gaming detection:

```bash
# Test current gaming detection
./tdarr-schedule-manager.sh test

# View scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Verify GPU usage detection
nvidia-smi
```

### Gaming Process Detection

**Monitored Processes:**

- Steam, Lutris, Heroic Games Launcher
- Wine, Bottles (Windows compatibility)
- GameMode, MangoHUD (utilities)
- GPU usage >15% (configurable threshold)
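The pause decision reduces to two checks: a known gaming process is running, or GPU utilization exceeds the threshold. A sketch of that predicate, with the process list and 15% default taken from the list above; the function itself is illustrative, not the actual `tdarr-schedule-manager.sh` logic:

```python
# Process names mirror the monitored list above (illustrative set).
GAMING_PROCESSES = {
    "steam", "lutris", "heroic",      # launchers
    "wine", "bottles",                # Windows compatibility
    "gamemoded", "mangohud",          # utilities
}

def should_pause_tdarr(running_processes: list[str], gpu_util_pct: float,
                       gpu_threshold: float = 15.0) -> bool:
    """Pause transcoding when a gaming process is active or GPU usage is high.

    running_processes would come from `ps -eo comm=`; gpu_util_pct from
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`.
    """
    names = {p.lower() for p in running_processes}
    if names & GAMING_PROCESSES:
        return True
    return gpu_util_pct > gpu_threshold
```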

### Configuration Adjustments

```bash
# Edit gaming detection threshold
./tdarr-schedule-manager.sh edit

# Apply preset configurations
./tdarr-schedule-manager.sh preset gaming-only  # No time limits
./tdarr-schedule-manager.sh preset night-only   # 10PM-7AM only
```

## Network and Access Issues

### Server Connection Problems

**Server Access Commands:**

```bash
# SSH to Tdarr server
ssh tdarr

# Check server status
ssh tdarr "docker ps | grep tdarr"

# View server logs
ssh tdarr "docker logs tdarr"

# Access server container
ssh tdarr "docker exec -it tdarr /bin/bash"
```

### Node Registration Issues

```bash
# Check node logs
podman logs tdarr-node-gpu

# Verify node registration
# Look for "Node registered" in server logs
ssh tdarr "docker logs tdarr | grep -i node"

# Test node connectivity
curl http://10.10.0.43:8265/api/v2/status
```

## Performance Issues

### Slow Transcoding Performance

**Diagnosis:**

1. Check cache location: should be local NVMe, not network storage.
2. Verify unmapped mode: `nodeType=unmapped` in the container environment.
3. Monitor I/O: run `iotop` during transcoding.

**Expected Performance:**

- Mapped nodes: constant SMB streaming (~100 MB/s)
- Unmapped nodes: download once → process locally → upload once

### GPU Utilization Problems

```bash
# Monitor GPU usage during transcoding
watch nvidia-smi

# Check GPU device access in container
podman exec tdarr-node-gpu nvidia-smi

# Verify NVENC encoder availability
podman exec tdarr-node-gpu ffmpeg -encoders | grep nvenc
```

## Plugin System Issues

### Plugin Loading Failures

**Troubleshooting Steps:**

1. Check plugin directory: ensure no custom mounts override community plugins.
2. Verify dependencies: FlowHelper files (`metadataUtils.js`, `letterboxUtils.js`).
3. Test plugin syntax:

   ```bash
   # Load the plugin in Node.js to surface syntax errors
   node -e "require('./path/to/plugin.js')"
   ```

### Custom Plugin Integration

**Safe Integration Pattern:**

1. Selective mounting: mount only the specific plugins required.
2. Dependency verification: include all FlowHelper dependencies.
3. Version compatibility: ensure plugins match the Tdarr version.
4. Null-safety checks: add `|| []` to `forEach` operations.
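One way to catch a missing null-safety guard before mounting a custom plugin is a quick static check for bare `.streams.forEach(` calls. This scanner is a hand-rolled illustrative helper, not part of Tdarr:

```python
import re

# Matches `<x>.streams.forEach(`; the null-safe `(... || []).forEach()`
# pattern from above does not match, because the `|| [])` sits between
# `streams` and `.forEach(`.
UNSAFE = re.compile(r"\.streams\.forEach\(")

def unsafe_foreach_lines(plugin_source: str) -> list[int]:
    """Return 1-based line numbers containing unguarded streams.forEach calls."""
    return [
        lineno
        for lineno, line in enumerate(plugin_source.splitlines(), start=1)
        if UNSAFE.search(line)
    ]
```

Running it over each plugin file before adding the mount gives an early warning instead of a failed scan at "Tagging video res".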

## Monitoring and Logging

### Log Locations

```bash
# Scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Monitor logs
tail -f /tmp/tdarr-monitor/monitor.log

# Server logs
ssh tdarr "docker logs tdarr"

# Node logs
podman logs tdarr-node-gpu
```

### Discord Notification Issues

**Check webhook configuration:**

```bash
# Test Discord webhook
curl -X POST [WEBHOOK_URL] \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'
```

**Common Issues:**

- JSON escaping in message content
- Markdown formatting in Discord
- User ping placement (outside code blocks)

## Emergency Recovery

### Complete System Reset

```bash
# Stop all containers
podman stop tdarr-node-gpu
ssh tdarr "docker stop tdarr"

# Clean cache directories
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir*

# Remove scheduler
crontab -e  # Delete tdarr lines

# Restart with clean configuration
./start-tdarr-gpu-podman-clean.sh
./tdarr-schedule-manager.sh preset work-safe
./tdarr-schedule-manager.sh install
```

### Data Recovery

**Important:** Tdarr processes files in-place; original files remain untouched.

- Queue data: stored in server configuration (`/app/configs`)
- Progress data: lost on container restart (unmapped nodes)
- Cache files: safe to delete, will re-download

## Database Modification & Requeue

### Problem: UI "Requeue All" Button Has No Effect

**Symptoms:** Clicking "Requeue all items (transcode)" in the library UI does nothing.

**Workaround:** Modify the SQLite DB directly, then trigger a scan:

```bash
# 1. Reset file statuses in DB (run Python on manticore)
python3 -c "
import sqlite3
conn = sqlite3.connect('/home/cal/docker/tdarr/server-data/Tdarr/DB2/SQL/database.db')
conn.execute(\"UPDATE filejsondb SET json_data = json_set(json_data, '$.TranscodeDecisionMaker', '') WHERE json_extract(json_data, '$.DB') = '<LIBRARY_ID>'\")
conn.commit()
conn.close()
"

# 2. Restart Tdarr
cd /home/cal/docker/tdarr && docker compose down && docker compose up -d

# 3. Trigger scan (required — DB changes alone won't queue files)
curl -s -X POST "http://localhost:8265/api/v2/scan-files" \
  -H "Content-Type: application/json" \
  -d '{"data":{"scanConfig":{"dbID":"<LIBRARY_ID>","arrayOrPath":"/media/Movies/","mode":"scanFindNew"}}}'
```

**Library IDs:** Movies=`ZWgKkmzJp`, TV Shows=`EjfWXCdU8`

**Note:** The CRUD API (`/api/v2/cruddb`) silently ignores write operations (update/insert/upsert all return 200 but don't persist). Always modify the SQLite DB directly.
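Wrapping the same UPDATE in a function that returns the row count makes it easy to confirm the reset actually matched files before restarting Tdarr. The schema assumption (a `filejsondb` table with a `json_data` column) comes from the one-liner above; the function name is illustrative:

```python
import sqlite3

def requeue_library(db_path: str, library_id: str) -> int:
    """Blank TranscodeDecisionMaker for every file in a library.

    Returns the number of rows changed. Mirrors the UPDATE above; a scan
    via /api/v2/scan-files must still be triggered afterwards.
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "UPDATE filejsondb "
            "SET json_data = json_set(json_data, '$.TranscodeDecisionMaker', '') "
            "WHERE json_extract(json_data, '$.DB') = ?",
            (library_id,),
        )
        conn.commit()
        return cur.rowcount  # 0 means the library ID didn't match anything
    finally:
        conn.close()
```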

### Problem: Library filterCodecsSkip Blocks Flow Plugins

**Symptoms:** Job report shows "File video_codec_name (hevc) is in ignored codecs".

**Cause:** `filterCodecsSkip: "hevc"` in the library settings skips files before the flow runs.

**Solution:** Clear the filter in DB — the flow's own logic handles codec decisions:

```bash
# In librarysettingsjsondb, set filterCodecsSkip to empty string
```
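Under the assumption that `librarysettingsjsondb` uses the same `json_data` column layout as `filejsondb` (unverified; inspect the actual schema first), the clear could be scripted the same way as the requeue workaround. Matching on the current `'hevc'` value avoids guessing which JSON key holds the library ID:

```python
import sqlite3

def clear_codec_skip_filter(db_path: str) -> int:
    """Blank any filterCodecsSkip value of 'hevc'; returns rows changed.

    ASSUMPTION: librarysettingsjsondb stores a json_data text column like
    filejsondb does. Verify with `.schema librarysettingsjsondb` first.
    """
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute(
            "UPDATE librarysettingsjsondb "
            "SET json_data = json_set(json_data, '$.filterCodecsSkip', '') "
            "WHERE json_extract(json_data, '$.filterCodecsSkip') = 'hevc'"
        )
        conn.commit()
        return cur.rowcount
    finally:
        conn.close()
```

As with the requeue workaround, restart Tdarr afterwards so the server re-reads the settings.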

## Flow Plugin Issues

### Problem: clrSubDef Disposition Change Not Persisting (SRT→ASS Re-encode)

**Symptoms:** Job log shows "Clearing default flag from subtitle stream" but the output file still has a default subtitle; SRT subtitles become ASS in the output.

**Root Cause:** The `clrSubDef` custom function pushed `-disposition:{outputIndex} 0` to `outputArgs` without also specifying `-c:{outputIndex} copy`. Tdarr's Execute plugin skips adding the default `-c:N copy` for streams with custom `outputArgs`. Without a codec spec, ffmpeg re-encodes SRT→ASS (the MKV default), resetting the disposition.

**Fix:** Always include codec copy when adding `outputArgs`:

```js
// WRONG - causes re-encode
stream.outputArgs.push('-disposition:{outputIndex}', '0');

// RIGHT - preserves codec, changes only disposition
stream.outputArgs.push('-c:{outputIndex}', 'copy', '-disposition:{outputIndex}', '0');
```

### Problem: ensAC3str Matches Commentary Tracks as Existing AC3 Stereo

**Symptoms:** File has a commentary AC3 2ch track but no main-audio AC3 stereo; the plugin logs "File already has en stream in ac3, 2 channels".

**Root Cause:** The community `ffmpegCommandEnsureAudioStream` plugin doesn't filter by track title — any AC3 2ch eng track satisfies the check, including commentary.

**Fix:** Replaced with a `customFunction` that filters out tracks with "commentary" in the title tag before checking. Updated in flow `KeayMCz5Y` via direct SQLite modification.

### Combined Impact: Roku Playback Hang

When both bugs occur together (TrueHD default audio + default subtitle not cleared), Jellyfin must transcode audio AND burn-in subtitles simultaneously over HLS. The ~30s startup delay causes Roku to timeout at ~33% loading. Fixing either bug alone unblocks playback — clearing the subtitle default is sufficient since TrueHD-only transcoding is fast enough.
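To check whether a file will trip this combination, inspect the default dispositions in ffprobe's JSON output (`ffprobe -v quiet -print_format json -show_streams <file>` is standard usage). The helpers below only parse that JSON, so they can be exercised offline; the function names are illustrative:

```python
import json

def default_streams(ffprobe_json: str) -> dict[str, list[int]]:
    """Map codec_type -> indexes of streams flagged default."""
    out: dict[str, list[int]] = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        if stream.get("disposition", {}).get("default"):
            out.setdefault(stream["codec_type"], []).append(stream["index"])
    return out

def roku_risk(ffprobe_json: str) -> bool:
    """True when both a default TrueHD audio and a default subtitle exist,
    i.e. the combination above that forces simultaneous audio transcode
    and subtitle burn-in."""
    streams = json.loads(ffprobe_json).get("streams", [])
    truehd_default = any(
        s["codec_type"] == "audio" and s.get("codec_name") == "truehd"
        and s.get("disposition", {}).get("default")
        for s in streams
    )
    sub_default = any(
        s["codec_type"] == "subtitle" and s.get("disposition", {}).get("default")
        for s in streams
    )
    return truehd_default and sub_default
```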

## Common Error Patterns

### "Copy failed" in Staging Section

**Cause:** Network timeout during file transfer to the unmapped node.

**Solution:** The monitoring system automatically retries.

"ENOTEMPTY" Directory Cleanup Errors

Cause: Partial downloads leave files in work directories Solution: Force remove directories, monitoring handles automatically

### Node Disconnection During Processing

**Cause:** Gaming detection or a manual stop during an active job.

**Result:** The file returns to the queue automatically; safe to restart.

## Prevention Best Practices

1. Use unmapped node architecture for stability
2. Implement the monitoring system for automatic cleanup
3. Configure gaming-aware scheduling for desktop systems
4. Set container resource limits to prevent crashes
5. Use a clean plugin installation to avoid forEach errors
6. Monitor system resources during intensive operations

This troubleshooting guide covers the most common issues and their resolutions for production Tdarr deployments.