claude-home/media-servers/troubleshooting.md
Cal Corum 4b7eca8a46
2026-03-12 09:00:44 -05:00


---
title: "Media Servers Troubleshooting"
description: "Troubleshooting guide for Jellyfin media server issues including GPU transcoding failures, driver mismatches, container startup problems, network connectivity, Roku/Apple TV playback, and emergency recovery."
type: troubleshooting
domain: media-servers
tags: [jellyfin, nvidia, gpu, transcoding, docker, roku, troubleshooting, recovery]
---
# Media Servers - Troubleshooting Guide
## Common Issues and Solutions
### GPU Transcoding Problems
#### GPU Not Detected in Container
**Symptoms**:
- Jellyfin shows "No hardware acceleration available"
- Transcoding falls back to CPU (slow performance)
- Container logs show NVIDIA device not found
**Diagnosis**:
```bash
# Check GPU accessibility from container
docker exec jellyfin nvidia-smi
# Verify NVIDIA runtime is configured
docker info | grep -i nvidia
# Check container GPU configuration
docker inspect jellyfin | grep -i gpu
```
**Solutions**:
1. **Verify NVIDIA Container Runtime**:
```bash
# On host
nvidia-smi # Should work
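# Install nvidia-container-toolkit if missing
sudo apt install nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker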
```
2. **Fix Docker Compose Configuration**:
```yaml
services:
  jellyfin:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
3. **Restart Container**:
```bash
docker compose down
docker compose up -d
```
#### Driver/Library Version Mismatch
**Symptoms**:
- `nvidia-smi` fails with "driver/library version mismatch"
- Container won't start with NVML error
- GPU monitoring shows "Restart failed"
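**Cause**: The NVIDIA driver packages were updated on the host, but the old kernel module is still loaded, so the userspace libraries and the kernel module report different versions.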
**Solution**:
```bash
# Check host GPU status
nvidia-smi # Will fail with mismatch error
# Reboot required to reload kernel modules
sudo reboot
# After reboot, verify
nvidia-smi
docker exec jellyfin nvidia-smi
```
**Prevention**:
- See `/media-servers/jellyfin-ubuntu-manticore.md` NVIDIA Driver Management section
- Hold driver packages to prevent auto-updates
- Monitor for updates weekly via automated checks
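A mismatch can often be confirmed before rebooting by comparing the version of the loaded kernel module with the version installed on disk. The following is a minimal sketch, assuming the loaded module reports itself in `/proc/driver/nvidia/version` and that `modinfo -F version nvidia` describes the on-disk module. The helper names `extract_version` and `versions_match` are hypothetical.

```bash
# Extract the first dotted version token (for example 570.86.15) from stdin.
extract_version() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+\.[0-9]+(\.[0-9]+)?$/) { print $i; exit }
  }'
}

# Exit status 0 only when both versions are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage on the host (path and module name are assumptions):
#   loaded=$(extract_version < /proc/driver/nvidia/version)
#   ondisk=$(modinfo -F version nvidia)
#   versions_match "$loaded" "$ondisk" || echo "Reboot needed: loaded=$loaded installed=$ondisk"
```

If the two versions differ, the reboot described above is the fix; if they match, the problem likely lies elsewhere.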
#### Transcoding Starts Then Fails
**Symptoms**:
- Playback begins then stops
- Jellyfin logs show ffmpeg errors
- GPU memory errors in logs
**Diagnosis**:
```bash
# Check GPU memory usage
nvidia-smi
# Check for concurrent GPU users (Tdarr, other containers)
docker ps | grep -E "tdarr|jellyfin"
# Check Jellyfin transcode logs
docker logs jellyfin 2>&1 | grep -i transcode | tail -50
```
**Solutions**:
1. **GPU Resource Conflict**: If Tdarr is using GPU, pause transcoding or limit concurrent jobs
2. **Insufficient GPU Memory**:
```bash
# Check GPU memory
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
# Reduce Jellyfin transcode resolution or bitrate
```
3. **Codec Not Supported**: Verify codec is supported by GPU encoder
```bash
# Check available encoders
docker exec jellyfin ffmpeg -encoders 2>/dev/null | grep nvenc
```
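To turn the memory query into a single number that is easier to compare against a limit, the CSV output can be reduced to a percentage. This is a sketch only: the function name `gpu_mem_percent` is hypothetical, and it assumes the query is run with `--format=csv,noheader,nounits` so that each line looks like `used, total` in MiB.

```bash
# Read "used, total" lines (MiB, no units) and print used memory as a whole percent.
gpu_mem_percent() {
  awk -F', *' '$2 > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Usage on the host:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | gpu_mem_percent
```

One line is printed per GPU, so a host with several cards yields several percentages.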
### Container Startup Issues
#### Container Won't Start After Update
**Symptoms**:
- Container exits immediately after `docker compose up -d`
- Exit code indicates error (non-zero)
**Diagnosis**:
```bash
# Check container logs
docker logs jellyfin
# Check exit code
docker inspect jellyfin | grep ExitCode
# Try starting in foreground for detailed output
docker compose up
```
**Common Causes & Solutions**:
1. **Permission Issues**:
```bash
# Fix ownership of config/cache directories
sudo chown -R 1000:1000 ~/docker/jellyfin/config
sudo chown -R 1000:1000 /mnt/NV2/jellyfin-cache
```
2. **Port Already in Use**:
```bash
# Check if port 8096 is in use
sudo lsof -i :8096
# Kill conflicting process or change Jellyfin port
```
3. **Volume Mount Failures**:
```bash
# Verify all mount points exist and are accessible
ls -la ~/docker/jellyfin/config
ls -la /mnt/NV2/jellyfin-cache
mount | grep /mnt/truenas/media
```
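The exit code from the diagnosis step narrows down which of the causes above is likely. The sketch below maps the conventional codes (125 to 127 from Docker itself, 128 plus the signal number for signals) to a short hint. The function name `explain_exit_code` is hypothetical and the hints are general guidance, not Jellyfin-specific behavior.

```bash
# Translate a container exit code into a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit (check restart policy or entrypoint)" ;;
    125) echo "docker run itself failed (bad flag or daemon error)" ;;
    126) echo "container command could not be invoked (permissions)" ;;
    127) echo "container command not found" ;;
    137) echo "killed by SIGKILL (often the OOM killer or docker kill)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM (normal docker stop)" ;;
    *)   echo "application-specific error, check docker logs" ;;
  esac
}

# Usage:
#   explain_exit_code "$(docker inspect --format '{{.State.ExitCode}}' jellyfin)"
```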
#### Container Stuck in "Restarting" Loop
**Symptoms**:
- Docker shows container constantly restarting
- Brief uptime then crash
**Diagnosis**:
```bash
# Watch restart behavior (status and restart count)
docker ps -a --filter name=jellyfin
docker inspect --format '{{.RestartCount}}' jellyfin
# Check logs for crash reason
docker logs jellyfin --tail 200
# Check resource limits
docker inspect jellyfin | grep -A 10 Resources
```
**Solutions**:
1. **Database Corruption**:
```bash
# Stop container
docker stop jellyfin
# Backup database
cp ~/docker/jellyfin/config/data/library.db{,.bak}
# Check integrity (prints "ok" if the database is healthy)
sqlite3 ~/docker/jellyfin/config/data/library.db "PRAGMA integrity_check;"
```
2. **Configuration File Issue**:
```bash
# Rename config to force regeneration
mv ~/docker/jellyfin/config/system.xml{,.bak}
# Restart container
docker compose up -d
```
### Network & Connectivity
#### Can't Access Web Interface
**Symptoms**:
- http://10.10.0.226:8096 not responding
- Connection timeout or refused
**Diagnosis**:
```bash
# Check if container is running
docker ps | grep jellyfin
# Check port binding
docker port jellyfin
# Test local connectivity
curl -I http://localhost:8096
curl -I http://10.10.0.226:8096
# Check firewall
sudo ufw status | grep 8096
```
**Solutions**:
1. **Container Not Running**: Start container
```bash
docker compose up -d
```
2. **Port Not Bound Correctly**:
```yaml
# Fix docker-compose.yml
ports:
- "8096:8096" # Not "0.0.0.0:8096:8096" on some systems
```
3. **Firewall Blocking**:
```bash
sudo ufw allow 8096/tcp
```
#### Client Discovery Not Working
**Symptoms**:
- Jellyfin apps can't auto-discover server
- Must manually enter IP address
**Diagnosis**:
```bash
# Check UDP discovery port
docker port jellyfin | grep 7359
# Verify UDP traffic allowed
sudo ufw status | grep 7359
```
**Solution**:
```bash
# Ensure the UDP discovery port is published in docker-compose.yml:
#   ports:
#     - "7359:7359/udp"
# Allow in firewall
sudo ufw allow 7359/udp
```
### Performance Issues
#### Slow Transcoding Performance
**Symptoms**:
- Buffering during playback
- High CPU usage despite GPU available
- Transcoding slower than real-time
**Diagnosis**:
```bash
# Check if GPU transcoding is actually being used
nvidia-smi dmon -s u -c 5 # Monitor GPU usage
# Check Jellyfin Dashboard > Playback for active transcodes
# Verify hardware accel is enabled in Jellyfin settings
```
**Solutions**:
1. **Hardware Acceleration Not Enabled**:
- Dashboard → Playback → Transcoding
- Select "NVIDIA NVENC"
- Enable desired codecs
2. **GPU Busy with Other Tasks**:
```bash
# Check what else is using GPU
nvidia-smi
# Pause Tdarr if running
docker stop tdarr-node-gpu
```
3. **Cache on Slow Storage**:
```bash
# Verify cache is on NVMe, not network storage
docker inspect jellyfin | grep -A 5 cache
# Should be /mnt/NV2/jellyfin-cache (NVMe)
# NOT /mnt/truenas/... (network)
```
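Reading the `enc` column by eye is easy to get wrong, so the check can be scripted. The sketch below assumes `nvidia-smi dmon -s u` prints a `#`-prefixed header row that names an `enc` column, with data rows in the same column order. The function name `encoder_active` is hypothetical.

```bash
# Read `nvidia-smi dmon -s u` output on stdin.
# Exit 0 if any sample shows encoder utilization above zero, otherwise exit 1.
encoder_active() {
  awk '
    /^#/ {
      line = $0
      sub(/^#[ \t]*/, "", line)            # drop the leading "#"
      n = split(line, h, /[ \t]+/)
      for (i = 1; i <= n; i++) if (h[i] == "enc") col = i
      next
    }
    col && ($col + 0) > 0 { active = 1 }
    END { exit active ? 0 : 1 }
  '
}

# Usage while a stream is transcoding:
#   nvidia-smi dmon -s u -c 5 | encoder_active && echo "NVENC in use" || echo "encoder idle"
```

If the encoder stays idle during an active transcode, playback is most likely falling back to CPU.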
#### High Memory Usage
**Symptoms**:
- Jellyfin using excessive RAM
- Server becomes unresponsive
- OOM (Out of Memory) errors
**Diagnosis**:
```bash
# Check memory usage
docker stats jellyfin
# Check for memory leaks in logs
docker logs jellyfin 2>&1 | grep -i memory
```
**Solutions**:
1. **Set Memory Limits**:
```yaml
# In docker-compose.yml
deploy:
  resources:
    limits:
      memory: 4G
```
2. **Reduce Transcode Throttle**:
- Dashboard → Playback
- Lower "Throttle Transcodes" value
3. **Clear Transcode Cache**:
```bash
# Stop container
docker stop jellyfin
# Clear transcode cache
rm -rf /mnt/NV2/jellyfin-cache/transcodes/*
# Start container
docker start jellyfin
```
### Playback Problems
#### Playback Stuttering Despite Good Network
**Symptoms**:
- Video plays but stutters/buffers frequently
- Network speed is adequate
- Direct play works, transcoding stutters
**Solutions**:
1. **Check Transcode Quality Settings**:
- Lower bitrate in client settings
- Reduce resolution if needed
2. **Verify GPU Transcoding Active**:
```bash
# While playing, check GPU usage
nvidia-smi dmon -s u
# Should show encoder (enc) usage
```
3. **Check Storage I/O**:
```bash
# Monitor disk I/O during playback
iostat -x 2 5
```
#### Roku/Apple TV Playback Timeout (TrueHD/DTS-HD MA Audio)
**Symptoms**:
- Playback hangs at "Loading" for 20-30 seconds then fails on Roku
- Jellyfin logs show forced transcoding with subtitle extraction delay
- Works fine on web browser or mobile clients
**Root Cause**: The file's default audio track uses a codec the client cannot play natively (TrueHD, DTS-HD MA, Opus), and its default subtitle track is SRT. Jellyfin must therefore transcode the audio and burn in the subtitles over HLS. The 27-second subtitle extraction delay then causes the Roku client to time out.
**Audio Codec Compatibility** (Roku/Apple TV):
| Codec | Status |
|-------|--------|
| AC3 (Dolby Digital) | Native playback |
| AAC | Native playback |
| EAC3 (Dolby Digital+) | Native playback |
| TrueHD | Requires transcode |
| DTS / DTS-HD MA | Requires transcode |
| Opus | Requires transcode |
**Immediate Fix** (per-file with mkvpropedit):
```bash
# Clear subtitle default, set compatible audio as default
mkvpropedit "file.mkv" \
--edit track:s1 --set flag-default=0 \
--edit track:a1 --set flag-default=0 \
--edit track:a3 --set flag-default=1
```
**Systemic Fix**: Tdarr flow plugins `ensAC3str` (adds AC3 stereo fallback) and `clrSubDef` (clears non-forced subtitle defaults) — see `tdarr/CONTEXT.md`
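
To find affected files before a client trips over them, the codec table above can be expressed as a small lookup. This is a sketch under assumptions: the function name `roku_audio_status` is hypothetical, and the codec names are the lowercase identifiers that `ffprobe` reports (DTS and DTS-HD MA both appear as `dts`).

```bash
# Classify an audio codec name according to the compatibility table.
roku_audio_status() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    ac3|aac|eac3)    echo "native" ;;
    truehd|dts|opus) echo "transcode" ;;
    *)               echo "unknown" ;;
  esac
}

# Usage: print the status of every audio track in a file.
#   ffprobe -v error -select_streams a -show_entries stream=codec_name -of csv=p=0 "file.mkv" |
#     while read -r codec; do echo "$codec: $(roku_audio_status "$codec")"; done
```

A file needs the per-file fix when its first (default) audio track reports `transcode`.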
#### Audio/Video Sync Issues
**Symptoms**:
- Audio and video out of sync during playback
**Solutions**:
1. **Enable Audio Passthrough** (if supported by client)
2. **Update ffmpeg** in container (usually handled by Jellyfin updates)
3. **Try Different Transcode Settings**:
- Disable subtitle burn-in if not needed
- Change audio codec settings
### Monitoring & Alerts
#### GPU Monitor Alerts Not Working
**Symptoms**:
- No Discord notifications when GPU issues occur
- Monitoring script seems to run but no alerts
**Diagnosis**:
```bash
# Test Discord webhook
python3 /home/cal/scripts/jellyfin_gpu_monitor.py --discord-test
# Check monitoring logs
tail -f /home/cal/logs/jellyfin-gpu-monitor.log
# Verify cron job is installed
crontab -l | grep jellyfin_gpu
```
**Solutions**:
1. **Webhook URL Invalid**:
- Verify webhook URL in script
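- Test with curl (Discord rejects an empty body, so send a JSON message): `curl -H "Content-Type: application/json" -d '{"content":"test"}' <webhook_url>`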
2. **Script Permissions**:
```bash
chmod +x /home/cal/scripts/jellyfin_gpu_monitor.py
```
3. **Cron Environment Issues**:
```bash
# Test script manually
/usr/bin/python3 /home/cal/scripts/jellyfin_gpu_monitor.py --check --discord-alerts
```
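When testing the webhook by hand, the request body has to be valid JSON with a `content` field. The sketch below builds that body from a plain message. The function name `discord_payload` and the `WEBHOOK_URL` variable are hypothetical, and only backslashes and double quotes are escaped, so multi-line messages are out of scope.

```bash
# Build a minimal Discord webhook JSON body from a single-line message.
discord_payload() {
  msg=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"content":"%s"}\n' "$msg"
}

# Usage (WEBHOOK_URL must be set to the real webhook):
#   curl -sS -H 'Content-Type: application/json' \
#     -d "$(discord_payload 'Jellyfin GPU monitor test')" "$WEBHOOK_URL"
```

If this manual request succeeds but the script still sends nothing, the fault is more likely in the script's configuration or its cron environment than in the webhook.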
## Emergency Recovery Procedures
### Complete System Recovery
#### Jellyfin Won't Start (All Else Failed)
1. **Stop Container**:
```bash
docker stop jellyfin
docker rm jellyfin
```
2. **Backup Configuration**:
```bash
cp -r ~/docker/jellyfin/config ~/docker/jellyfin/config.backup.$(date +%Y%m%d)
```
3. **Pull Fresh Image**:
```bash
docker pull jellyfin/jellyfin:latest
```
4. **Recreate Container**:
```bash
cd ~/docker/jellyfin
docker compose up -d
```
5. **Restore Settings** (if needed):
- Copy specific config files from backup
- Don't restore corrupt database
#### GPU Completely Broken
1. **Verify Host GPU**:
```bash
# If nvidia-smi fails with driver mismatch
sudo reboot
```
2. **Remove GPU Access** (temporary workaround):
```yaml
# Comment out GPU sections in docker-compose.yml
# CPU transcoding only until GPU fixed
```
3. **Reinstall NVIDIA Drivers** (if reboot doesn't help):
```bash
# Unhold packages
sudo apt-mark unhold nvidia-driver-570
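# Reinstall (quote the glob so the shell does not expand it; the pattern
# also matches nvidia-container-toolkit, so reinstall that as well)
sudo apt remove --purge 'nvidia-*'
sudo apt install nvidia-driver-570 nvidia-container-toolkit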
sudo reboot
# Re-hold after working
sudo apt-mark hold nvidia-driver-570
```
## Preventive Maintenance
### Regular Checks (Weekly)
```bash
# Check GPU health
nvidia-smi
# Verify Jellyfin accessible
curl -I http://10.10.0.226:8096
# Check disk space (cache can grow large)
df -h /mnt/NV2
df -h ~/docker/jellyfin/config
# Review logs for errors
docker logs jellyfin --since 168h 2>&1 | grep -i error  # 168h = 7 days; durations have no day unit
```
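The disk space check can be reduced to a pass/fail test that is convenient for a weekly cron job. This is a minimal sketch: the function names are hypothetical and the 90 percent threshold in the usage line is only an example value.

```bash
# Print the usage percentage (without the % sign) of the filesystem holding a path.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Exit 0 when usage is at or above the given threshold percent.
disk_over_threshold() {
  used=$(disk_used_percent "$1")
  [ -n "$used" ] || return 2
  [ "$used" -ge "$2" ]
}

# Usage:
#   disk_over_threshold /mnt/NV2 90 && echo "Cache volume is nearly full"
```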
### Monthly Tasks
```bash
# Update Jellyfin
cd ~/docker/jellyfin
docker compose pull
docker compose up -d
# Clean old transcodes
find /mnt/NV2/jellyfin-cache/transcodes/ -type f -mtime +7 -delete
# Backup configuration
tar -czf ~/jellyfin-config-backup-$(date +%Y%m%d).tar.gz ~/docker/jellyfin/config/
```
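Because the cleanup command deletes files immediately, it can help to preview what it would remove first. The sketch below runs the same `find` selection with `-print` instead of `-delete`. The function name `list_stale_files` is hypothetical.

```bash
# List files older than N days without deleting them (dry run).
list_stale_files() {
  find "$1" -type f -mtime +"$2" -print
}

# Usage:
#   list_stale_files /mnt/NV2/jellyfin-cache/transcodes/ 7
```

Once the list looks right, the `-delete` form above removes the same set of files.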
### Before Major Changes
- Create snapshot if on Proxmox
- Backup full config directory
- Test on non-production instance if possible
- Document current working configuration
## Related Documentation
- **Setup Guide**: `/media-servers/jellyfin-ubuntu-manticore.md`
- **NVIDIA Driver Management**: `/media-servers/jellyfin-ubuntu-manticore.md` (NVIDIA Driver Management section)
- **GPU Monitoring**: `/monitoring/scripts/CONTEXT.md`
- **Technology Overview**: `/media-servers/CONTEXT.md`
- **Main Instructions**: `/CLAUDE.md`
## Support Resources
- **Jellyfin Docs**: https://jellyfin.org/docs/
- **NVIDIA Container Toolkit**: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
- **Discord Monitoring**: See `/monitoring/scripts/jellyfin_gpu_monitor.py`