# Tdarr Troubleshooting Guide

## forEach Error Resolution

### Problem: TypeError: Cannot read properties of undefined (reading 'forEach')

**Symptoms**: The scanning phase fails at the "Tagging video res" step, preventing all transcodes

**Root Cause**: Custom plugin mounts override community plugins with incompatible versions

### Solution: Clean Plugin Installation

1. **Remove custom plugin mounts** from docker-compose.yml
2. **Force plugin regeneration**:

```bash
ssh tdarr "docker restart tdarr"
podman restart tdarr-node-gpu
```

3. **Verify clean plugins**: check that the regenerated plugins use the null-safe pattern `(streams || []).forEach()` (see the verification sketch below)
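
A minimal verification sketch, assuming the community plugins live under `/app/server/Tdarr/Plugins` inside the server container and that the compose file sits in the SSH login directory; adjust both paths for your install:

```bash
# Confirm the regenerated community plugins contain the null-safe pattern
# (in-container plugin path is an assumption; verify for your Tdarr version)
ssh tdarr "docker exec tdarr grep -rlF '(streams || []).forEach' /app/server/Tdarr/Plugins | head -5"

# Confirm no custom plugin mounts remain in the compose file (compose path assumed)
ssh tdarr "grep -n -i 'plugin' docker-compose.yml"
```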

### Plugin Safety Patterns

```javascript
// ❌ Unsafe - throws when streams is undefined
args.variables.ffmpegCommand.streams.forEach((stream) => {
  // process stream
});

// ✅ Safe - null-safe forEach
(args.variables.ffmpegCommand.streams || []).forEach((stream) => {
  // process stream
});
```

## Staging Section Timeout Issues

### Problem: Files removed from staging after 300 seconds

**Symptoms**:
- `.tmp` files stuck in work directories
- ENOTEMPTY errors during cleanup
- Subsequent jobs blocked

### Solution: Automated Monitoring System

**Monitor Script**: `/mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh`

**Automatic Actions**:
- Detects staging timeouts every 20 minutes
- Removes stuck work directories
- Sends Discord notifications
- Logs all cleanup activities
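
A minimal sketch of this kind of monitor, assuming a cron entry of `*/20 * * * *`, a 30-minute staleness threshold, and a `DISCORD_WEBHOOK_URL` environment variable; the production script referenced above is the authoritative version:

```bash
#!/usr/bin/env bash
# Hypothetical staging-timeout monitor; paths match this guide, thresholds are assumptions
set -euo pipefail

CACHE_DIR="/mnt/NV2/tdarr-cache"
LOG_FILE="/tmp/tdarr-monitor/monitor.log"
WEBHOOK_URL="${DISCORD_WEBHOOK_URL:-}"   # leave unset to skip notifications
mkdir -p "$(dirname "$LOG_FILE")"

# Treat work directories untouched for 30+ minutes as stuck
stuck_dirs=$(find "$CACHE_DIR" -maxdepth 1 -type d -name 'tdarr-workDir*' -mmin +30)

for dir in $stuck_dirs; do
  echo "$(date -Is) removing stuck work dir: $dir" >> "$LOG_FILE"
  rm -rf "$dir"
  if [ -n "$WEBHOOK_URL" ]; then
    curl -s -X POST "$WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d "{\"content\": \"Tdarr monitor: removed stuck work dir $(basename "$dir")\"}" > /dev/null
  fi
done
```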

### Manual Cleanup Commands

```bash
# Check staging section
ssh tdarr "docker logs tdarr | tail -50"

# Find stuck work directories
find /mnt/NV2/tdarr-cache -name "tdarr-workDir*" -type d

# Force cleanup a stuck directory ([ID] is the work-directory suffix)
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir-[ID]
```

## System Stability Issues

### Problem: Kernel crashes during intensive transcoding

**Root Cause**: CIFS network issues during large-file streaming (mapped nodes)

### Solution: Convert to Unmapped Node Architecture

1. **Enable unmapped nodes** in the server Options
2. **Update node configuration**:

```bash
# Add to container environment
-e nodeType=unmapped
-e unmappedNodeCache=/cache

# Use local cache volume
-v "/mnt/NV2/tdarr-cache:/cache"

# Remove media volume (no longer needed)
```

3. **Benefits**: Eliminates CIFS streaming and prevents kernel crashes (a full launch sketch follows this list)
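
A sketch of what the complete node launch might look like with these settings; the server IP, port, node name, image tag, and GPU device syntax are assumptions, and the project's `start-tdarr-gpu-podman-clean.sh` script remains the authoritative version:

```bash
# Hypothetical unmapped GPU node launch; adjust names, IPs, and GPU passthrough to your setup
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  -e serverIP=10.10.0.43 \
  -e serverPort=8266 \
  -e nodeName=gpu-node \
  -e nodeType=unmapped \
  -e unmappedNodeCache=/cache \
  -v "/mnt/NV2/tdarr-cache:/cache" \
  ghcr.io/haveagitgat/tdarr_node:latest
```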

### Container Resource Limits

```yaml
# Prevent memory exhaustion
deploy:
  resources:
    limits:
      memory: 8G
      cpus: '6'
```
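
The compose snippet applies to the Docker-managed server; for the podman-managed node, roughly equivalent limits can be applied at runtime. The values below simply mirror the compose example, and `podman update` is assumed to be available (older podman versions need the same flags on `podman run` instead):

```bash
# Apply memory/CPU limits to the running node container
podman update --memory 8g --cpus 6 tdarr-node-gpu
```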

## Gaming Detection Issues

### Problem: Tdarr doesn't stop during gaming

**Check gaming detection**:

```bash
# Test current gaming detection
./tdarr-schedule-manager.sh test

# View scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Verify GPU usage detection
nvidia-smi
```

### Gaming Process Detection

**Monitored Processes**:
- Steam, Lutris, Heroic Games Launcher
- Wine, Bottles (Windows compatibility)
- GameMode, MangoHUD (utilities)
- **GPU usage >15%** (configurable threshold)
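
A minimal sketch of the detection logic implied by this list, assuming the scheduler checks process names with `pgrep` and GPU load with `nvidia-smi`; the actual `tdarr-schedule-manager.sh` implementation may differ:

```bash
# Hypothetical gaming check: pause the node if a gaming process is running
# or GPU utilization exceeds the configured threshold
GPU_THRESHOLD=15
GAMING_PROCS='steam|lutris|heroic|wine|bottles|gamemoded|mangohud'

gpu_util=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1 | tr -d ' ')

if pgrep -f "$GAMING_PROCS" > /dev/null || [ "$gpu_util" -gt "$GPU_THRESHOLD" ]; then
  echo "gaming detected: pausing tdarr node"
  podman stop tdarr-node-gpu
else
  echo "no gaming detected: node may run"
fi
```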

### Configuration Adjustments

```bash
# Edit gaming detection threshold
./tdarr-schedule-manager.sh edit

# Apply preset configurations
./tdarr-schedule-manager.sh preset gaming-only   # No time limits
./tdarr-schedule-manager.sh preset night-only    # 10PM-7AM only
```

## Network and Access Issues

### Server Connection Problems

**Server Access Commands**:

```bash
# SSH to Tdarr server
ssh tdarr

# Check server status
ssh tdarr "docker ps | grep tdarr"

# View server logs
ssh tdarr "docker logs tdarr"

# Access server container (interactive shell needs a TTY, hence -t)
ssh -t tdarr "docker exec -it tdarr /bin/bash"
```

### Node Registration Issues

```bash
# Check node logs
podman logs tdarr-node-gpu

# Verify node registration
# Look for "Node registered" in server logs
ssh tdarr "docker logs tdarr | grep -i node"

# Test node connectivity
curl http://10.10.0.43:8265/api/v2/status
```

## Performance Issues

### Slow Transcoding Performance

**Diagnosis**:
1. **Check cache location**: Should be local NVMe, not network storage
2. **Verify unmapped mode**: `nodeType=unmapped` in the container environment
3. **Monitor I/O**: Run `iotop` during transcoding
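
The first two checks can be scripted; a quick sketch using the container name and cache path from this guide:

```bash
# 1. Cache should sit on local NVMe, not a network mount
df -h /mnt/NV2/tdarr-cache

# 2. Node should be running in unmapped mode
podman exec tdarr-node-gpu env | grep -iE 'nodetype|unmappednodecache'

# 3. Watch disk I/O while a job is active (q to quit)
sudo iotop -o
```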

**Expected Performance**:
- **Mapped nodes**: Constant SMB streaming (~100MB/s)
- **Unmapped nodes**: Download once → process locally → upload once

### GPU Utilization Problems

```bash
# Monitor GPU usage during transcoding
watch nvidia-smi

# Check GPU device access in container
podman exec tdarr-node-gpu nvidia-smi

# Verify NVENC encoder availability
podman exec tdarr-node-gpu ffmpeg -encoders | grep nvenc
```

## Plugin System Issues

### Plugin Loading Failures

**Troubleshooting Steps**:
1. **Check plugin directory**: Ensure no custom mounts override community plugins
2. **Verify dependencies**: FlowHelper files (`metadataUtils.js`, `letterboxUtils.js`) must be present
3. **Test plugin syntax**:

```bash
# Load the plugin in Node.js to surface syntax errors
node -e "require('./path/to/plugin.js')"
```

### Custom Plugin Integration

**Safe Integration Pattern**:
1. **Selective mounting**: Mount only the specific plugins you need (see the sketch below)
2. **Dependency verification**: Include all FlowHelper dependencies
3. **Version compatibility**: Ensure plugins match the Tdarr version
4. **Null-safety checks**: Add `|| []` to forEach operations
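
A sketch of a selective mount, expressed as a single volume flag in the style of the node configuration above; both the host path and the in-container `Plugins/Local` path are assumptions to verify against your Tdarr version:

```bash
# Hypothetical selective plugin mount: bind one plugin file read-only instead of
# overriding the whole community plugin tree
-v "/mnt/NV2/tdarr-plugins/Tdarr_Plugin_Custom.js:/app/server/Tdarr/Plugins/Local/Tdarr_Plugin_Custom.js:ro"
```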

## Monitoring and Logging

### Log Locations

```bash
# Scheduler logs
tail -f /tmp/tdarr-scheduler.log

# Monitor logs
tail -f /tmp/tdarr-monitor/monitor.log

# Server logs
ssh tdarr "docker logs tdarr"

# Node logs
podman logs tdarr-node-gpu
```

### Discord Notification Issues

**Check webhook configuration**:

```bash
# Test Discord webhook
curl -X POST [WEBHOOK_URL] \
  -H "Content-Type: application/json" \
  -d '{"content": "Test message"}'
```

**Common Issues**:
- JSON escaping in message content (see the sketch below)
- Markdown formatting in Discord
- User ping placement (keep pings outside code blocks)
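
A minimal sketch of building the payload with `jq` so arbitrary message content is escaped correctly; it assumes `jq` is installed and the webhook URL is exported as `WEBHOOK_URL`:

```bash
# jq escapes quotes, backslashes, and newlines in the message for us
MESSAGE='Cleanup done: removed "tdarr-workDir-abc" (2 files)'
jq -n --arg content "$MESSAGE" '{content: $content}' \
  | curl -s -X POST "$WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      --data-binary @- > /dev/null
```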

## Emergency Recovery

### Complete System Reset

```bash
# Stop all containers
podman stop tdarr-node-gpu
ssh tdarr "docker stop tdarr"

# Clean cache directories
rm -rf /mnt/NV2/tdarr-cache/tdarr-workDir*

# Remove scheduler
crontab -e  # Delete tdarr lines

# Restart with clean configuration
./start-tdarr-gpu-podman-clean.sh
./tdarr-schedule-manager.sh preset work-safe
./tdarr-schedule-manager.sh install
```
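
For scripted recovery, the interactive `crontab -e` step can be replaced with a non-interactive filter; this assumes every scheduler entry contains the string `tdarr`:

```bash
# Remove all tdarr-related cron entries without opening an editor
crontab -l | grep -v tdarr | crontab -
```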

### Data Recovery

**Important**: Transcodes run against cache copies, so original files remain untouched while a job is in progress
- **Queue data**: Stored in the server configuration (`/app/configs`)
- **Progress data**: Lost on container restart (unmapped nodes)
- **Cache files**: Safe to delete, will re-download

## Common Error Patterns

### "Copy failed" in Staging Section
**Cause**: Network timeout during file transfer to the unmapped node
**Solution**: The monitoring system retries automatically

### "ENOTEMPTY" Directory Cleanup Errors
**Cause**: Partial downloads leave files in work directories
**Solution**: Force-remove the directories; the monitoring system handles this automatically

### Node Disconnection During Processing
**Cause**: Gaming detection or a manual stop during an active job
**Result**: The file returns to the queue automatically and is safe to restart

## Prevention Best Practices

1. **Use unmapped node architecture** for stability
2. **Implement monitoring system** for automatic cleanup
3. **Configure gaming-aware scheduling** for desktop systems
4. **Set container resource limits** to prevent crashes
5. **Use clean plugin installation** to avoid forEach errors
6. **Monitor system resources** during intensive operations

This troubleshooting guide covers the most common issues and their resolutions for production Tdarr deployments.