claude-home/media-servers/troubleshooting.md
2026-04-02 20:48:39 -05:00

title: Media Servers Troubleshooting
description: Troubleshooting guide for Jellyfin media server issues including GPU transcoding failures, driver mismatches, container startup problems, network connectivity, Roku/Apple TV playback, and emergency recovery.
type: troubleshooting
domain: media-servers
tags: jellyfin, nvidia, gpu, transcoding, docker, roku, troubleshooting, recovery

Media Servers - Troubleshooting Guide

Common Issues and Solutions

GPU Transcoding Problems

GPU Not Detected in Container

Symptoms:

  • Jellyfin shows "No hardware acceleration available"
  • Transcoding falls back to CPU (slow performance)
  • Container logs show NVIDIA device not found

Diagnosis:

# Check GPU accessibility from container
docker exec jellyfin nvidia-smi

# Verify NVIDIA runtime is configured
docker info | grep -i nvidia

# Check container GPU configuration
docker inspect jellyfin | grep -i gpu

Solutions:

  1. Verify NVIDIA Container Runtime:

    # On host
    nvidia-smi  # Should work
    
    # Install nvidia-container-toolkit if missing, then register the runtime with Docker
    sudo apt install nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  2. Fix Docker Compose Configuration:

    services:
      jellyfin:
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
    
  3. Restart Container:

    docker compose down
    docker compose up -d
    

Driver/Library Version Mismatch

Symptoms:

  • nvidia-smi fails with "driver/library version mismatch"
  • Container won't start with NVML error
  • GPU monitoring shows "Restart failed"

Cause: NVIDIA driver updated on host but kernel modules not reloaded

Solution:

# Check host GPU status
nvidia-smi  # Will fail with mismatch error

# Reboot required to reload kernel modules
sudo reboot

# After reboot, verify
nvidia-smi
docker exec jellyfin nvidia-smi

Prevention:

  • See /media-servers/jellyfin-ubuntu-manticore.md NVIDIA Driver Management section
  • Hold driver packages to prevent auto-updates
  • Monitor for updates weekly via automated checks
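
A pending mismatch can be spotted before it breaks transcoding by comparing the loaded kernel module version with the version installed on disk. The sketch below assumes the driver exposes the loaded version at /sys/module/nvidia/version and that modinfo reports the on-disk version. The function names are illustrative and are not part of the existing monitoring script.

```shell
# Sketch: detect an NVIDIA driver update that has not been loaded yet.
# Assumptions: /sys/module/nvidia/version shows the loaded module version and
# `modinfo -F version nvidia` shows the version installed on disk.

# Succeeds (exit 0) when both versions are known and differ.
nvidia_reboot_needed() {
  [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Reads both versions from the host and prints a one-line verdict.
check_nvidia_versions() {
  loaded="$(cat /sys/module/nvidia/version 2>/dev/null)"
  on_disk="$(modinfo -F version nvidia 2>/dev/null)"
  if nvidia_reboot_needed "$loaded" "$on_disk"; then
    echo "Reboot needed: loaded=$loaded on-disk=$on_disk"
  else
    echo "No mismatch detected: loaded=${loaded:-unknown} on-disk=${on_disk:-unknown}"
  fi
}
```

Calling check_nvidia_versions on the host as part of the weekly checks would flag an update that still needs a reboot.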

Transcoding Starts Then Fails

Symptoms:

  • Playback begins then stops
  • Jellyfin logs show ffmpeg errors
  • GPU memory errors in logs

Diagnosis:

# Check GPU memory usage
nvidia-smi

# Check for concurrent GPU users (Tdarr, other containers)
docker ps | grep -E "tdarr|jellyfin"

# Check Jellyfin transcode logs
docker logs jellyfin 2>&1 | grep -i transcode | tail -50

Solutions:

  1. GPU Resource Conflict: If Tdarr is using GPU, pause transcoding or limit concurrent jobs
  2. Insufficient GPU Memory:
    # Check GPU memory
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv
    
    # Reduce Jellyfin transcode resolution or bitrate
    
  3. Codec Not Supported: Verify codec is supported by GPU encoder
    # Check available encoders
    docker exec jellyfin ffmpeg -encoders 2>/dev/null | grep nvenc
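
To judge whether GPU memory is the bottleneck, the raw nvidia-smi figures can be reduced to a percentage. This is a minimal sketch: the function names and the 90% default threshold are assumptions, not values from the existing setup.

```shell
# Sketch: turn nvidia-smi memory figures into a percentage.
# Expected input (one GPU per line), e.g. from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
# which prints lines such as "2048, 8192" (values in MiB).

# Prints the used percentage (rounded down) for each input line.
gpu_mem_pct() {
  awk -F', *' '$2 > 0 { printf "%d\n", $1 * 100 / $2 }'
}

# Succeeds when any GPU is at or above the given percentage
# (hypothetical default: 90).
gpu_mem_is_high() {
  gpu_mem_pct | awk -v limit="${1:-90}" '$1 >= limit { found = 1 } END { exit !found }'
}
```

If gpu_mem_is_high succeeds while both Tdarr and Jellyfin are running, the resource-conflict explanation in item 1 becomes the likely cause.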
    

Container Startup Issues

Container Won't Start After Update

Symptoms:

  • Container exits immediately after docker compose up -d
  • Exit code indicates error (non-zero)

Diagnosis:

# Check container logs
docker logs jellyfin

# Check exit code
docker inspect jellyfin | grep ExitCode

# Try starting in foreground for detailed output
docker compose up

Common Causes & Solutions:

  1. Permission Issues:

    # Fix ownership of config/cache directories
    sudo chown -R 1000:1000 ~/docker/jellyfin/config
    sudo chown -R 1000:1000 /mnt/NV2/jellyfin-cache
    
  2. Port Already in Use:

    # Check if port 8096 is in use
    sudo lsof -i :8096
    
    # Kill conflicting process or change Jellyfin port
    
  3. Volume Mount Failures:

    # Verify all mount points exist and are accessible
    ls -la ~/docker/jellyfin/config
    ls -la /mnt/NV2/jellyfin-cache
    mount | grep /mnt/truenas/media
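
The mount checks above can be folded into one pre-flight function that runs before the container is started. This is a sketch, and check_mount_paths is a hypothetical helper name. It only tests that each path is a readable directory, not that the network share behind it is healthy.

```shell
# Sketch: pre-flight check for the directories Jellyfin mounts.
# Prints one "MISSING" or "UNREADABLE" line per problem path and
# returns non-zero if any path failed.

check_mount_paths() {
  status=0
  for path in "$@"; do
    if [ ! -d "$path" ]; then
      echo "MISSING: $path"
      status=1
    elif [ ! -r "$path" ]; then
      echo "UNREADABLE: $path"
      status=1
    fi
  done
  return "$status"
}

# Example (paths taken from this guide):
#   check_mount_paths ~/docker/jellyfin/config /mnt/NV2/jellyfin-cache /mnt/truenas/media \
#     && docker compose up -d
```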
    

Container Stuck in "Restarting" Loop

Symptoms:

  • Docker shows container constantly restarting
  • Brief uptime then crash

Diagnosis:

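# Watch restart behavior (status and restart count)
docker ps -a --filter name=jellyfin
docker inspect --format '{{.RestartCount}}' jellyfin

# Check logs for crash reason
docker logs jellyfin --tail 200

# Check resource limits (0 means no limit is set)
docker inspect --format 'memory={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}' jellyfin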

Solutions:

  1. Database Corruption:

    # Stop container
    docker stop jellyfin
    
    # Backup database
    cp ~/docker/jellyfin/config/data/library.db{,.bak}
    
    # Check integrity (prints "ok" when the database is intact)
    sqlite3 ~/docker/jellyfin/config/data/library.db "PRAGMA integrity_check;"
    
  2. Configuration File Issue:

    # Rename config to force regeneration
    mv ~/docker/jellyfin/config/system.xml{,.bak}
    
    # Restart container
    docker compose up -d
    

Network & Connectivity

Can't Access Web Interface

Symptoms:

  • The Jellyfin web interface on port 8096 does not load in a browser

Diagnosis:

# Check if container is running
docker ps | grep jellyfin

# Check port binding
docker port jellyfin

# Test local connectivity
curl -I http://localhost:8096
curl -I http://10.10.0.226:8096

# Check firewall
sudo ufw status | grep 8096

Solutions:

  1. Container Not Running: Start container

    docker compose up -d
    
  2. Port Not Bound Correctly:

    # Fix docker-compose.yml
    ports:
      - "8096:8096"  # Not "0.0.0.0:8096:8096" on some systems
    
  3. Firewall Blocking:

    sudo ufw allow 8096/tcp
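
Comparing the result for localhost with the result for the LAN address helps choose between the three fixes above. The sketch below wraps the curl test from the diagnosis step; both function names are hypothetical. It relies on curl reporting status 000 when no connection could be made.

```shell
# Sketch: classify the HTTP status curl reports for the Jellyfin port.
# curl prints 000 when it could not connect at all.

classify_http_code() {
  case "$1" in
    000) echo "unreachable" ;;
    2??|3??) echo "up" ;;
    *) echo "error" ;;
  esac
}

# Probes one URL with a 5-second timeout and prints the verdict.
probe_jellyfin() {
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")"
  echo "$1 -> $code ($(classify_http_code "$code"))"
}

# Example:
#   probe_jellyfin http://localhost:8096
#   probe_jellyfin http://10.10.0.226:8096
```

If localhost reports "up" but the LAN address is "unreachable", the container is running and the problem is more likely the port binding or the firewall.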
    

Client Discovery Not Working

Symptoms:

  • Jellyfin apps can't auto-discover server
  • Must manually enter IP address

Diagnosis:

# Check UDP discovery port
docker port jellyfin | grep 7359

# Verify UDP traffic allowed
sudo ufw status | grep 7359

Solution:

# Ensure UDP port exposed
# In docker-compose.yml:
ports:
  - "7359:7359/udp"

# Allow in firewall
sudo ufw allow 7359/udp

Performance Issues

Slow Transcoding Performance

Symptoms:

  • Buffering during playback
  • High CPU usage despite GPU available
  • Transcoding slower than real-time

Diagnosis:

# Check if GPU transcoding is actually being used
nvidia-smi dmon -s u -c 5  # Monitor GPU usage

# Check Jellyfin Dashboard > Playback for active transcodes

# Verify hardware accel is enabled in Jellyfin settings

Solutions:

  1. Hardware Acceleration Not Enabled:

    • Dashboard → Playback → Transcoding
    • Select "NVIDIA NVENC"
    • Enable desired codecs
  2. GPU Busy with Other Tasks:

    # Check what else is using GPU
    nvidia-smi
    
    # Pause Tdarr if running
    docker stop tdarr-node-gpu
    
  3. Cache on Slow Storage:

    # Verify cache is on NVMe, not network storage
    docker inspect jellyfin | grep -A 5 cache
    
    # Should be /mnt/NV2/jellyfin-cache (NVMe)
    # NOT /mnt/truenas/... (network)
    

High Memory Usage

Symptoms:

  • Jellyfin using excessive RAM
  • Server becomes unresponsive
  • OOM (Out of Memory) errors

Diagnosis:

# Check memory usage
docker stats jellyfin

# Check for memory leaks in logs
docker logs jellyfin 2>&1 | grep -i memory

Solutions:

  1. Set Memory Limits:

    # In docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 4G
    
  2. Reduce Transcode Throttle:

    • Dashboard → Playback
    • Lower "Throttle Transcodes" value
  3. Clear Transcode Cache:

    # Stop container
    docker stop jellyfin
    
    # Clear transcode cache
    rm -rf /mnt/NV2/jellyfin-cache/transcodes/*
    
    # Start container
    docker start jellyfin
    

Playback Problems

Playback Stuttering Despite Good Network

Symptoms:

  • Video plays but stutters/buffers frequently
  • Network speed is adequate
  • Direct play works, transcoding stutters

Solutions:

  1. Check Transcode Quality Settings:

    • Lower bitrate in client settings
    • Reduce resolution if needed
  2. Verify GPU Transcoding Active:

    # While playing, check GPU usage
    nvidia-smi dmon -s u
    # Should show encoder (enc) usage
    
  3. Check Storage I/O:

    # Monitor disk I/O during playback
    iostat -x 2 5
    

Roku/Apple TV Playback Timeout (TrueHD/DTS-HD MA Audio)

Symptoms:

  • Playback hangs at "Loading" for 20-30 seconds then fails on Roku
  • Jellyfin logs show forced transcoding with subtitle extraction delay
  • Works fine on web browser or mobile clients

Root Cause: The file has an incompatible default audio track (TrueHD, DTS-HD MA, Opus) and a default SRT subtitle track. Jellyfin must transcode the audio and burn in the subtitles over HLS, and the 27-second subtitle extraction delay causes the Roku client to time out.

Audio Codec Compatibility (Roku/Apple TV):

| Codec                 | Status             |
|-----------------------|--------------------|
| AC3 (Dolby Digital)   | Native playback    |
| AAC                   | Native playback    |
| EAC3 (Dolby Digital+) | Native playback    |
| TrueHD                | Requires transcode |
| DTS / DTS-HD MA       | Requires transcode |
| Opus                  | Requires transcode |

Immediate Fix (per-file with mkvpropedit):

# Clear subtitle default, set compatible audio as default
mkvpropedit "file.mkv" \
  --edit track:s1 --set flag-default=0 \
  --edit track:a1 --set flag-default=0 \
  --edit track:a3 --set flag-default=1

Systemic Fix: Tdarr flow plugins ensAC3str (adds AC3 stereo fallback) and clrSubDef (clears non-forced subtitle defaults) — see tdarr/CONTEXT.md

Audio/Video Sync Issues

Symptoms:

  • Audio and video out of sync during playback

Solutions:

  1. Enable Audio Passthrough (if supported by client)
  2. Update ffmpeg in container (usually handled by Jellyfin updates)
  3. Try Different Transcode Settings:
    • Disable subtitle burn-in if not needed
    • Change audio codec settings

Monitoring & Alerts

GPU Monitor Alerts Not Working

Symptoms:

  • No Discord notifications when GPU issues occur
  • Monitoring script seems to run but no alerts

Diagnosis:

# Test Discord webhook
python3 /home/cal/scripts/jellyfin_gpu_monitor.py --discord-test

# Check monitoring logs
tail -f /home/cal/logs/jellyfin-gpu-monitor.log

# Verify cron job is running
crontab -l | grep jellyfin_gpu

Solutions:

  1. Webhook URL Invalid:

    • Verify webhook URL in script
    • Test with curl: curl -H "Content-Type: application/json" -d '{"content":"test"}' <webhook_url>
  2. Script Permissions:

    chmod +x /home/cal/scripts/jellyfin_gpu_monitor.py
    
  3. Cron Environment Issues:

    # Test script manually
    /usr/bin/python3 /home/cal/scripts/jellyfin_gpu_monitor.py --check --discord-alerts
    

Emergency Recovery Procedures

Complete System Recovery

Jellyfin Won't Start (All Else Failed)

  1. Stop Container:

    docker stop jellyfin
    docker rm jellyfin
    
  2. Backup Configuration:

    cp -r ~/docker/jellyfin/config ~/docker/jellyfin/config.backup.$(date +%Y%m%d)
    
  3. Pull Fresh Image:

    docker pull jellyfin/jellyfin:latest
    
  4. Recreate Container:

    cd ~/docker/jellyfin
    docker compose up -d
    
  5. Restore Settings (if needed):

    • Copy specific config files from backup
    • Don't restore corrupt database

GPU Completely Broken

  1. Verify Host GPU:

    # If nvidia-smi fails with driver mismatch
    sudo reboot
    
  2. Remove GPU Access (temporary workaround):

    # Comment out GPU sections in docker-compose.yml
    # CPU transcoding only until GPU fixed
    
  3. Reinstall NVIDIA Drivers (if reboot doesn't help):

    # Unhold packages
    sudo apt-mark unhold nvidia-driver-570
    
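    # Reinstall (quote the glob so the shell does not expand it;
    # it also matches nvidia-container-toolkit, so reinstall that afterwards)
    sudo apt remove --purge 'nvidia-*'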
    sudo apt install nvidia-driver-570
    sudo reboot
    
    # Re-hold after working
    sudo apt-mark hold nvidia-driver-570
    

Preventive Maintenance

Regular Checks (Weekly)

# Check GPU health
nvidia-smi

# Verify Jellyfin accessible
curl -I http://10.10.0.226:8096

# Check disk space (cache can grow large)
df -h /mnt/NV2
df -h ~/docker/jellyfin/config

# Review the last 7 days of logs for errors (--since accepts hours, not days)
docker logs jellyfin --since 168h 2>&1 | grep -i error

Monthly Tasks

# Update Jellyfin
cd ~/docker/jellyfin
docker compose pull
docker compose up -d

# Clean old transcodes
find /mnt/NV2/jellyfin-cache/transcodes/ -type f -mtime +7 -delete

# Backup configuration
tar -czf ~/jellyfin-config-backup-$(date +%Y%m%d).tar.gz ~/docker/jellyfin/config/
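
Before running the cleanup with -delete, it can help to list what would be removed. This is a minimal sketch, and list_stale_transcodes is a hypothetical helper. It uses the same find test as the cleanup command above, without deleting anything.

```shell
# Sketch: list transcode files older than N days before deleting them.

list_stale_transcodes() {
  # $1 = transcode directory, $2 = age in days (default 7, as in the task above)
  find "$1" -type f -mtime +"${2:-7}" -print
}

# Example:
#   list_stale_transcodes /mnt/NV2/jellyfin-cache/transcodes | wc -l
```

Once the list looks right, the original find command with -delete removes the same set of files.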

Before Major Changes

  • Create snapshot if on Proxmox
  • Backup full config directory
  • Test on non-production instance if possible
  • Document current working configuration

Roku Buffering on Weak WiFi — Client Bitrate Cap (2026-03-26)

Severity: Low — single device, non-critical viewing location

Problem: Roku in a far corner of the house with poor WiFi signal was buffering/failing to play videos. Content was not being transcoded down to accommodate the limited bandwidth.

Root Cause: Jellyfin does not dynamically adapt bitrate mid-stream (no HLS ABR like Netflix). The server's RemoteClientBitrateLimit was set to 0 (unlimited), and LAN clients are treated as "local" anyway so that setting wouldn't apply. The Roku Jellyfin app was requesting full-quality streams that exceeded the WiFi throughput.

Fix: Set Max Streaming Bitrate in the Jellyfin Roku app settings (Settings > Playback) to a lower value (4-8 Mbps). This forces the server to transcode down via NVENC before sending. No server-side changes needed.

Lesson: For bandwidth-constrained clients, the client-side bitrate setting is the first lever to pull. For a server-enforced cap that survives app resets, create a dedicated Jellyfin user for that device and set a per-user bitrate limit in Dashboard > Users > Playback. The RemoteClientBitrateLimit in system.xml only applies to clients Jellyfin considers "remote" — LAN devices are always "local."
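
A rough way to predict whether a title will be transcoded under a given cap is to compare its average bitrate with the cap. The sketch below computes that average from file size and duration. avg_mbps is a hypothetical helper, and the result is only an estimate because peak bitrate can be well above the average.

```shell
# Sketch: average bitrate of a file in Mbps from its size and duration.
# Usage: avg_mbps SIZE_BYTES DURATION_SECONDS
# Inputs could come from, for example:
#   stat -c %s file.mkv
#   ffprobe -v error -show_entries format=duration -of csv=p=0 file.mkv

avg_mbps() {
  awk -v bytes="$1" -v secs="$2" 'BEGIN {
    if (secs <= 0) exit 1
    printf "%.1f\n", bytes * 8 / secs / 1000000
  }'
}
```

For example, a 36 GB remux running two hours averages about 40 Mbps, far above a 4-8 Mbps client cap, so it would be transcoded down.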


PGS Subtitle Default Flags Causing Roku Playback Hang (2026-04-01)

Severity: Medium — affects all Roku/Apple TV clients attempting to play remuxes with PGS subtitles

Problem: Playback on Roku hangs at "Loading" and stops at 0 ms. Jellyfin logs show ffmpeg extracting all subtitle streams (including PGS) from the full-length movie before playback can begin. User Staci reported Jurassic Park (1993) taking forever to start on the living room Roku.

Root Cause: PGS (hdmv_pgs_subtitle) tracks flagged as default in MKV files cause the Roku client to auto-select them. Roku can't decode PGS natively, so Jellyfin must burn them in — triggering a full subtitle extraction pass and video transcode before any data reaches the client. 178 out of ~400 movies in the library had this flag set, mostly remuxes that predate the Tdarr clrSubDef flow plugin.

Fix:

  1. Batch fix (existing library): Wrote fix-pgs-defaults.sh — scans all MKVs with mkvmerge -J, finds PGS tracks with default_track: true, clears via mkvpropedit --edit track:N --set flag-default=0. Key gotcha: mkvpropedit uses 1-indexed track numbers (track_id + 1), NOT track:=ID (which matches by UID). Script is on manticore at /tmp/fix-pgs-defaults.sh. Fixed 178 files, no re-encoding needed.
  2. Going forward (Tdarr): The flow already has a "Clear Subtitle Default Flags" custom function plugin (clrSubDef) that clears default disposition on non-forced subtitle tracks during transcoding. New files processed by Tdarr are handled automatically.
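
The batch fix in step 1 can be sketched as follows. This is not the original fix-pgs-defaults.sh. It assumes jq is installed and that PGS tracks appear with codec_id "S_HDMV/PGS" in mkvmerge -J output. All function names are illustrative. The track_id + 1 conversion follows the gotcha described in step 1.

```shell
# Sketch of the PGS default-flag sweep (not the original fix-pgs-defaults.sh).
# Assumptions: jq is installed; PGS tracks carry codec_id "S_HDMV/PGS" in
# `mkvmerge -J` output; function names are illustrative.

# Reads mkvmerge track IDs (0-based, one per line) and prints mkvpropedit
# arguments that use 1-based track numbers (track_id + 1).
pgs_edit_args() {
  awk '{ printf "%s--edit track:%d --set flag-default=0", (NR > 1 ? " " : ""), $1 + 1 }
       END { if (NR) print "" }'
}

# Prints the IDs of PGS subtitle tracks flagged as default in one file.
pgs_default_ids() {
  mkvmerge -J "$1" | jq -r '.tracks[]
    | select(.type == "subtitles"
             and .properties.codec_id == "S_HDMV/PGS"
             and .properties.default_track == true)
    | .id'
}

# Clears the default flag on those tracks; does nothing if none are found.
fix_pgs_defaults() {
  args="$(pgs_default_ids "$1" | pgs_edit_args)"
  [ -n "$args" ] || return 0
  # Word splitting is intended: $args holds only the generated flags.
  # shellcheck disable=SC2086
  mkvpropedit "$1" $args
}

# Example (run on the host, where the media mount is writable):
#   find /mnt/truenas/media/Movies -name '*.mkv' |
#     while IFS= read -r f; do fix_pgs_defaults "$f"; done
```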

Lesson: Remux files from automated downloaders almost always have PGS defaults set. Any bulk import of remuxes should be followed by a PGS default flag sweep. The CIFS media mount on manticore is read-only inside the Jellyfin container — mkvpropedit must run from the host against /mnt/truenas/media/Movies.

  • Setup Guide: /media-servers/jellyfin-ubuntu-manticore.md
  • NVIDIA Driver Management: See jellyfin-ubuntu-manticore.md
  • GPU Monitoring: /monitoring/scripts/CONTEXT.md
  • Technology Overview: /media-servers/CONTEXT.md
  • Main Instructions: /CLAUDE.md

Support Resources