---
title: "Tdarr Troubleshooting Guide"
description: "Solutions for common Tdarr issues: forEach plugin errors, staging timeouts, kernel crashes, gaming detection, node registration, GPU utilization, DB requeue workarounds, flow plugin bugs (subtitle disposition, commentary track filtering), and Roku playback hangs."
type: troubleshooting
domain: tdarr
tags: [tdarr, troubleshooting, ffmpeg, flow-plugin, sqlite, roku, jellyfin, nvenc, cifs]
---

# Tdarr Troubleshooting Guide

## forEach Error Resolution

### Problem: TypeError: Cannot read properties of undefined (reading 'forEach')

**Symptoms**: Scanning phase fails at the "Tagging video res" step, blocking all transcodes

**Root Cause**: Custom plugin mounts override community plugins with incompatible versions

### Solution: Clean Plugin Installation

1. **Remove custom plugin mounts** from docker-compose.yml
2. **Force plugin regeneration**:

```bash
ssh tdarr "docker restart tdarr"
podman restart tdarr-node-gpu
```

3. **Verify clean plugins**: check that the regenerated plugins include the null-safety fix `(streams || []).forEach()`

### Plugin Safety Patterns

```javascript
// ❌ Unsafe - throws a forEach error if streams is undefined
args.variables.ffmpegCommand.streams.forEach((stream) => { /* ... */ });

// ✅ Safe - null-safe forEach falls back to an empty array
(args.variables.ffmpegCommand.streams || []).forEach((stream) => { /* ... */ });
```

## Staging Section Timeout Issues

### Problem: Files removed from staging after 300 seconds

**Symptoms**:

- `.tmp` files stuck in work directories
- ENOTEMPTY errors during cleanup
- Subsequent jobs blocked

### Solution: Automated Monitoring System

**Monitor Script**: `/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh`

**Automatic Actions**:

- Detects staging timeouts every 20 minutes
- Removes stuck work directories
- Sends Discord notifications
- Logs all cleanup activities
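The cleanup pass the shell script performs can be sketched in a few lines of Python. This is an illustrative sketch only (the actual monitor is the shell script above); the cache path and the 20-minute threshold come from this guide, while the function name and structure are assumptions:

```python
import shutil
import time
from pathlib import Path

def clean_stuck_workdirs(cache_dir, max_age_minutes=20):
    """Remove tdarr-workDir* directories older than max_age_minutes; return their names."""
    removed = []
    cutoff = time.time() - max_age_minutes * 60
    for workdir in Path(cache_dir).glob("tdarr-workDir*"):
        if workdir.is_dir() and workdir.stat().st_mtime < cutoff:
            # ignore_errors tolerates ENOTEMPTY races with a still-writing job
            shutil.rmtree(workdir, ignore_errors=True)
            removed.append(workdir.name)
    return removed
```

Run against `/mnt/NV2/tdarr-cache` from a 20-minute cron, this mirrors the "detect, remove, log" loop; notification and logging hooks would go where `removed` is returned.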
### Manual Cleanup Commands

```bash
# Check staging section
ssh tdarr "docker logs tdarr | tail -50"

# Find stuck work directories
find /mnt/NV2/tdarr-cache -name "tdarr-workDir*" -type d

# Force cleanup of a stuck directory
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir-[ID]
```

## System Stability Issues

### Problem: Kernel crashes during intensive transcoding

**Root Cause**: CIFS network issues during large file streaming (mapped nodes)

### Solution: Convert to Unmapped Node Architecture

1. **Enable unmapped nodes** in server Options
2. **Update node configuration**:

```bash
# Add to container environment
-e nodeType=unmapped
-e unmappedNodeCache=/cache

# Use local cache volume
-v "/mnt/NV2/tdarr-cache:/cache"

# Remove media volume (no longer needed)
```

3. **Benefits**: Eliminates CIFS streaming, prevents kernel crashes

### Container Resource Limits

```yaml
# Prevent memory exhaustion
deploy:
  resources:
    limits:
      memory: 8G
      cpus: '6'
```

## Gaming Detection Issues

### Problem: Tdarr doesn't stop during gaming

**Check gaming detection**:

```bash
# Test current gaming detection
./tdarr-schedule-manager.sh test

# View scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Verify GPU usage detection
nvidia-smi
```

### Gaming Process Detection

**Monitored Processes**:

- Steam, Lutris, Heroic Games Launcher
- Wine, Bottles (Windows compatibility)
- GameMode, MangoHUD (utilities)
- **GPU usage >15%** (configurable threshold)
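The detection above reduces to "known gaming process running, or GPU busier than the threshold". A minimal Python sketch of that decision — the process names and the 15% default come from the list above, but the exact binary names and the function signature are illustrative assumptions, not the scheduler's actual code:

```python
# Assumed lowercase binary names for the launchers/utilities listed above
GAMING_PROCESSES = {
    "steam", "lutris", "heroic",      # game launchers
    "wine", "bottles",                # Windows compatibility layers
    "gamemoded", "mangohud",          # gaming utilities
}

def should_pause_tdarr(running_processes, gpu_utilization, threshold=15):
    """True if a monitored gaming process is running or GPU usage exceeds threshold (%)."""
    names = {p.lower() for p in running_processes}
    if names & GAMING_PROCESSES:
        return True
    return gpu_utilization > threshold
```

In practice the process list would come from `pgrep`/`ps` and the utilization from `nvidia-smi --query-gpu=utilization.gpu`, as the commands above show.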
### Configuration Adjustments

```bash
# Edit gaming detection threshold
./tdarr-schedule-manager.sh edit

# Apply preset configurations
./tdarr-schedule-manager.sh preset gaming-only   # No time limits
./tdarr-schedule-manager.sh preset night-only    # 10PM-7AM only
```

## Network and Access Issues

### Server Connection Problems

**Server Access Commands**:

```bash
# SSH to Tdarr server
ssh tdarr

# Check server status
ssh tdarr "docker ps | grep tdarr"

# View server logs
ssh tdarr "docker logs tdarr"

# Access server container
ssh tdarr "docker exec -it tdarr /bin/bash"
```

### Node Registration Issues

```bash
# Check node logs
podman logs tdarr-node-gpu

# Verify node registration
# Look for "Node registered" in server logs
ssh tdarr "docker logs tdarr | grep -i node"

# Test node connectivity
curl http://10.10.0.43:8265/api/v2/status
```

## Performance Issues

### Slow Transcoding Performance

**Diagnosis**:

1. **Check cache location**: should be local NVMe, not network storage
2. **Verify unmapped mode**: `nodeType=unmapped` in the container environment
3. **Monitor I/O**: run `iotop` during transcoding

**Expected Performance**:

- **Mapped nodes**: constant SMB streaming (~100MB/s)
- **Unmapped nodes**: download once → process locally → upload once

### GPU Utilization Problems

```bash
# Monitor GPU usage during transcoding
watch nvidia-smi

# Check GPU device access in container
podman exec tdarr-node-gpu nvidia-smi

# Verify NVENC encoder availability
podman exec tdarr-node-gpu ffmpeg -encoders | grep nvenc
```

## Plugin System Issues

### Plugin Loading Failures

**Troubleshooting Steps**:

1. **Check plugin directory**: ensure no custom mounts override community plugins
2. **Verify dependencies**: FlowHelper files (`metadataUtils.js`, `letterboxUtils.js`)
3. **Test plugin syntax**:

```bash
# Test plugin in Node.js
node -e "require('./path/to/plugin.js')"
```

### Custom Plugin Integration

**Safe Integration Pattern**:

1. **Selective mounting**: mount only the specific plugins required
2. **Dependency verification**: include all FlowHelper dependencies
3. **Version compatibility**: ensure plugins match the Tdarr version
4. **Null-safety checks**: add `|| []` to forEach operations

## Monitoring and Logging

### Log Locations

```bash
# Scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Monitor logs
tail -f /tmp/tdarr-monitor/monitor.log

# Server logs
ssh tdarr "docker logs tdarr"

# Node logs
podman logs tdarr-node-gpu
```

### Discord Notification Issues

**Check webhook configuration**:

```bash
# Test Discord webhook
curl -X POST [WEBHOOK_URL] \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'
```

**Common Issues**:

- JSON escaping in message content
- Markdown formatting in Discord
- User ping placement (outside code blocks)
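Most of the escaping problems above come from hand-assembling the JSON string in shell. Letting a JSON library serialize the payload sidesteps them entirely. A small Python sketch (the webhook URL is a placeholder you'd substitute; the function names are illustrative):

```python
import json
import urllib.request

def build_discord_payload(message):
    """json.dumps handles quoting/escaping of quotes, backslashes, and newlines."""
    return json.dumps({"content": message}).encode("utf-8")

def send_discord_notification(webhook_url, message):
    """POST the payload to a Discord webhook; returns the HTTP status code."""
    req = urllib.request.Request(
        webhook_url,
        data=build_discord_payload(message),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Messages containing quotes, code blocks, or user pings then survive intact, since no manual `-d '{"content": ...}'` escaping is involved.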
## Emergency Recovery

### Complete System Reset

```bash
# Stop all containers
podman stop tdarr-node-gpu
ssh tdarr "docker stop tdarr"

# Clean cache directories
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir*

# Remove scheduler
crontab -e  # Delete tdarr lines

# Restart with clean configuration
./start-tdarr-gpu-podman-clean.sh
./tdarr-schedule-manager.sh preset work-safe
./tdarr-schedule-manager.sh install
```

### Data Recovery

**Important**: Tdarr processes files in-place; original files remain untouched.

- **Queue data**: stored in server configuration (`/app/configs`)
- **Progress data**: lost on container restart (unmapped nodes)
- **Cache files**: safe to delete; will re-download

## Database Modification & Requeue

### Problem: UI "Requeue All" Button Has No Effect

**Symptoms**: Clicking "Requeue all items (transcode)" in the library UI does nothing

**Workaround**: Modify the SQLite DB directly, then trigger a scan:

```bash
# 1. Reset file statuses in DB (run Python on manticore)
python3 -c "
import sqlite3
conn = sqlite3.connect('/home/cal/docker/tdarr/server-data/Tdarr/DB2/SQL/database.db')
conn.execute(\"UPDATE filejsondb SET json_data = json_set(json_data, '$.TranscodeDecisionMaker', '') WHERE json_extract(json_data, '$.DB') = '<LIBRARY_ID>'\")
conn.commit()
conn.close()
"

# 2. Restart Tdarr
cd /home/cal/docker/tdarr && docker compose down && docker compose up -d

# 3. Trigger scan (required; DB changes alone won't queue files)
curl -s -X POST "http://localhost:8265/api/v2/scan-files" \
  -H "Content-Type: application/json" \
  -d '{"data":{"scanConfig":{"dbID":"<LIBRARY_ID>","arrayOrPath":"/media/Movies/","mode":"scanFindNew"}}}'
```

**Library IDs**: Movies=`ZWgKkmzJp`, TV Shows=`EjfWXCdU8`

**Note**: The CRUD API (`/api/v2/cruddb`) silently ignores write operations (update/insert/upsert all return 200 but don't persist). Always modify the SQLite DB directly.
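The quoting-heavy `python3 -c` one-liner above can also be kept as a standalone script, which is easier to read and reuse. Same DB path, table, and JSON keys as above; the function name and return value are illustrative:

```python
"""Reset TranscodeDecisionMaker for every file in a library so a scan requeues them."""
import sqlite3

# Server DB path from this guide
DB_PATH = "/home/cal/docker/tdarr/server-data/Tdarr/DB2/SQL/database.db"

def requeue_library(library_id, db_path=DB_PATH):
    """Blank the transcode decision for all rows whose $.DB matches library_id.

    Returns the number of files reset.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "UPDATE filejsondb "
        "SET json_data = json_set(json_data, '$.TranscodeDecisionMaker', '') "
        "WHERE json_extract(json_data, '$.DB') = ?",
        (library_id,),
    )
    conn.commit()
    conn.close()
    return cur.rowcount
```

Call it with a library ID (e.g. `requeue_library("ZWgKkmzJp")` for Movies), then restart Tdarr and trigger the scan exactly as in steps 2–3 above.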
### Problem: Library filterCodecsSkip Blocks Flow Plugins

**Symptoms**: Job report shows "File video_codec_name (hevc) is in ignored codecs"

**Cause**: `filterCodecsSkip: "hevc"` in library settings skips files before the flow runs

**Solution**: Clear the filter in the DB; the flow's own logic handles codec decisions:

```bash
# In librarysettingsjsondb, set filterCodecsSkip to an empty string.
# Sketch only: assumes librarysettingsjsondb stores json_data like filejsondb
# and that the row is keyed by the library ID — inspect the row to confirm
# the actual key path before running.
python3 -c "
import sqlite3
conn = sqlite3.connect('/home/cal/docker/tdarr/server-data/Tdarr/DB2/SQL/database.db')
conn.execute(\"UPDATE librarysettingsjsondb SET json_data = json_set(json_data, '$.filterCodecsSkip', '') WHERE json_extract(json_data, '$._id') = '<LIBRARY_ID>'\")
conn.commit()
"
```

## Flow Plugin Issues

### Problem: clrSubDef Disposition Change Not Persisting (SRT→ASS Re-encode)

**Symptoms**: Job log shows "Clearing default flag from subtitle stream" but the output file still has a default subtitle. SRT subtitles become ASS in the output.

**Root Cause**: The `clrSubDef` custom function pushed `-disposition:{outputIndex} 0` to `outputArgs` without also specifying `-c:{outputIndex} copy`. Tdarr's Execute plugin skips adding the default `-c:N copy` for streams with custom `outputArgs`. Without a codec spec, ffmpeg re-encodes SRT→ASS (the MKV default), resetting the disposition.

**Fix**: Always include a codec copy when adding outputArgs:

```javascript
// WRONG - no codec spec, so ffmpeg re-encodes and resets the disposition
stream.outputArgs.push('-disposition:{outputIndex}', '0');

// RIGHT - preserves the codec, changes only the disposition
stream.outputArgs.push('-c:{outputIndex}', 'copy', '-disposition:{outputIndex}', '0');
```

### Problem: ensAC3str Matches Commentary Tracks as Existing AC3 Stereo

**Symptoms**: File has a commentary AC3 2ch track but no main-audio AC3 stereo. Plugin logs "File already has en stream in ac3, 2 channels".

**Root Cause**: The community `ffmpegCommandEnsureAudioStream` plugin doesn't filter by track title; any AC3 2ch eng track satisfies the check, including commentary.

**Fix**: Replaced with a `customFunction` that filters out tracks with "commentary" in the title tag before checking. Updated in flow `KeayMCz5Y` via direct SQLite modification.
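The core of that fix is the title filter. Sketched here in Python for clarity — the actual customFunction is JavaScript, the stream dictionaries assume ffprobe-style `tags`, and the function name is illustrative:

```python
def has_main_ac3_stereo(streams, language="eng"):
    """True if a non-commentary AC3 2-channel track in the language already exists."""
    for stream in streams:
        if stream.get("codec_type") != "audio":
            continue
        tags = stream.get("tags", {})
        if "commentary" in tags.get("title", "").lower():
            continue  # commentary tracks don't count as main audio
        if (
            stream.get("codec_name") == "ac3"
            and stream.get("channels") == 2
            and tags.get("language") == language
        ):
            return True
    return False
```

With this check, a file whose only AC3 2ch track is a commentary correctly reports that a main-audio AC3 stereo stream still needs to be created.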
### Combined Impact: Roku Playback Hang

When both bugs occur together (TrueHD default audio plus an uncleared default subtitle), Jellyfin must transcode audio AND burn in subtitles simultaneously over HLS. The ~30s startup delay causes Roku to time out at ~33% loading. Fixing either bug alone unblocks playback; clearing the subtitle default is sufficient, since TrueHD-only transcoding is fast enough.

## Common Error Patterns

### "Copy failed" in Staging Section

**Cause**: Network timeout during file transfer to the unmapped node

**Solution**: The monitoring system automatically retries

### "ENOTEMPTY" Directory Cleanup Errors

**Cause**: Partial downloads leave files in work directories

**Solution**: Force-remove the directories; the monitoring system handles this automatically

### Node Disconnection During Processing

**Cause**: Gaming detection or a manual stop during an active job

**Result**: The file returns to the queue automatically; safe to restart
## Prevention Best Practices

1. **Use unmapped node architecture** for stability
2. **Implement the monitoring system** for automatic cleanup
3. **Configure gaming-aware scheduling** for desktop systems
4. **Set container resource limits** to prevent crashes
5. **Use a clean plugin installation** to avoid forEach errors
6. **Monitor system resources** during intensive operations

This troubleshooting guide covers the most common issues and their resolutions for production Tdarr deployments.