claude-home/media-servers/troubleshooting.md
Cal Corum 0d552a839e Add NVIDIA driver management and media server troubleshooting
Document NVIDIA driver hold/update workflow, GPU health monitoring,
and update checker integration for Jellyfin on ubuntu-manticore.
Add media-servers troubleshooting guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 22:20:55 -06:00


Media Servers - Troubleshooting Guide

Common Issues and Solutions

GPU Transcoding Problems

GPU Not Detected in Container

Symptoms:

  • Jellyfin shows "No hardware acceleration available"
  • Transcoding falls back to CPU (slow performance)
  • Container logs show NVIDIA device not found

Diagnosis:

# Check GPU accessibility from container
docker exec jellyfin nvidia-smi

# Verify NVIDIA runtime is configured
docker info | grep -i nvidia

# Check container GPU configuration
docker inspect jellyfin | grep -i gpu

Solutions:

  1. Verify NVIDIA Container Runtime:

    # On host
    nvidia-smi  # Should work
    
    # Install nvidia-container-toolkit if missing, then register the runtime with Docker
    sudo apt install nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  2. Fix Docker Compose Configuration:

    services:
      jellyfin:
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
    
  3. Restart Container:

    docker compose down
    docker compose up -d
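
The runtime check from the diagnosis step can be scripted. The following is a minimal sketch, assuming `docker info` prints a `Runtimes:` line listing registered runtimes; the function name `has_nvidia_runtime` is hypothetical. A missing entry is a hint rather than proof of misconfiguration, since GPU access can also be granted through device reservations.

```shell
#!/usr/bin/env bash
# Sketch: read `docker info` output on stdin and succeed if an "nvidia"
# runtime is listed. has_nvidia_runtime is a hypothetical helper name.
has_nvidia_runtime() {
    # Keep only the "Runtimes:" line, then look for "nvidia" as a whole word.
    grep -i '^ *Runtimes:' | grep -qiw 'nvidia'
}

# Intended use on the host (requires Docker):
#   docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```

If the helper fails on the host, re-running the toolkit installation step and restarting Docker is the first thing to try.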
    

Driver/Library Version Mismatch

Symptoms:

  • nvidia-smi fails with "driver/library version mismatch"
  • Container won't start with NVML error
  • GPU monitoring shows "Restart failed"

Cause: The NVIDIA driver packages were updated on the host, but the running kernel still has the old modules loaded

Solution:

# Check host GPU status
nvidia-smi  # Will fail with mismatch error

# Reboot required to reload kernel modules
sudo reboot

# After reboot, verify
nvidia-smi
docker exec jellyfin nvidia-smi

Prevention:

  • See the NVIDIA Driver Management section of /media-servers/jellyfin-ubuntu-manticore.md
  • Hold driver packages to prevent auto-updates
  • Monitor for updates weekly via automated checks
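
A pending reboot can often be spotted before `nvidia-smi` starts failing by comparing the loaded module version with the one installed on disk. This is a rough sketch, assuming `/proc/driver/nvidia/version` contains an `NVRM version` line with the driver version in it; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: extract the driver version from /proc/driver/nvidia/version-style
# text on stdin. Both function names are hypothetical.
loaded_nvidia_version() {
    # Take the first dotted number on the "NVRM version" line.
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeed when both versions are known and differ, which suggests the
# kernel modules have not been reloaded since the last driver update.
reboot_pending() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Intended use on the host:
#   loaded=$(loaded_nvidia_version < /proc/driver/nvidia/version)
#   ondisk=$(modinfo -F version nvidia)
#   reboot_pending "$loaded" "$ondisk" && echo "Reboot needed: $loaded -> $ondisk"
```

The on-disk lookup through `modinfo` assumes the module for the running kernel has already been rebuilt; treat a mismatch as a prompt to investigate, not a definitive diagnosis.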

Transcoding Starts Then Fails

Symptoms:

  • Playback begins then stops
  • Jellyfin logs show ffmpeg errors
  • GPU memory errors in logs

Diagnosis:

# Check GPU memory usage
nvidia-smi

# Check for concurrent GPU users (Tdarr, other containers)
docker ps | grep -E "tdarr|jellyfin"

# Check Jellyfin transcode logs
docker logs jellyfin 2>&1 | grep -i transcode | tail -50

Solutions:

  1. GPU Resource Conflict: If Tdarr is using GPU, pause transcoding or limit concurrent jobs
  2. Insufficient GPU Memory:
    # Check GPU memory
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv
    
    # Reduce Jellyfin transcode resolution or bitrate
    
  3. Codec Not Supported: Verify codec is supported by GPU encoder
    # Check available encoders
    docker exec jellyfin ffmpeg -encoders 2>/dev/null | grep nvenc
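
To turn the memory query above into a single number, the CSV output can be reduced to a percentage. A minimal sketch, assuming the query is run with `--format=csv,noheader,nounits` so each line looks like `used, total`; `gpu_mem_percent` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read "used, total" lines (MiB, no header, no units) on stdin and
# print the integer percentage of GPU memory in use, one line per GPU.
gpu_mem_percent() {
    awk -F', *' 'NF >= 2 && $2 > 0 { printf "%d\n", ($1 * 100) / $2 }'
}

# Intended use on the host:
#   nvidia-smi --query-gpu=memory.used,memory.total \
#       --format=csv,noheader,nounits | gpu_mem_percent
```

A value that stays near 100 while Tdarr and Jellyfin are both active points toward the resource-conflict case rather than a codec problem.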
    

Container Startup Issues

Container Won't Start After Update

Symptoms:

  • Container exits immediately after docker compose up -d
  • Exit code indicates error (non-zero)

Diagnosis:

# Check container logs
docker logs jellyfin

# Check exit code
docker inspect jellyfin | grep ExitCode

# Try starting in foreground for detailed output
docker compose up

Common Causes & Solutions:

  1. Permission Issues:

    # Fix ownership of config/cache directories
    sudo chown -R 1000:1000 ~/docker/jellyfin/config
    sudo chown -R 1000:1000 /mnt/NV2/jellyfin-cache
    
  2. Port Already in Use:

    # Check if port 8096 is in use
    sudo lsof -i :8096
    
    # Kill conflicting process or change Jellyfin port
    
  3. Volume Mount Failures:

    # Verify all mount points exist and are accessible
    ls -la ~/docker/jellyfin/config
    ls -la /mnt/NV2/jellyfin-cache
    mount | grep /mnt/truenas/media
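
The mount-point checks can be bundled into one preflight step to run before `docker compose up -d`. This is a sketch only; `check_paths` is a hypothetical helper, and the paths passed to it are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: report each directory as OK, MISSING, or NO ACCESS and return
# non-zero if any of them would prevent the container from starting.
check_paths() {
    local status=0 path
    for path in "$@"; do
        if [ ! -d "$path" ]; then
            echo "MISSING: $path"
            status=1
        elif [ ! -r "$path" ] || [ ! -x "$path" ]; then
            echo "NO ACCESS: $path"
            status=1
        else
            echo "OK: $path"
        fi
    done
    return "$status"
}

# Intended use on the host:
#   check_paths ~/docker/jellyfin/config /mnt/NV2/jellyfin-cache /mnt/truenas/media
```

The check runs as the invoking user, so it does not prove that UID 1000 inside the container has the same access.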
    

Container Stuck in "Restarting" Loop

Symptoms:

  • Docker shows container constantly restarting
  • Brief uptime then crash

Diagnosis:

# Watch restart behavior (restart count and current state)
docker inspect --format 'restarts={{.RestartCount}} status={{.State.Status}}' jellyfin

# Check logs for crash reason
docker logs jellyfin --tail 200

# Check resource limits (0 means no limit is set)
docker inspect --format 'memory={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}' jellyfin

Solutions:

  1. Database Corruption:

    # Stop container
    docker stop jellyfin
    
    # Backup database
    cp ~/docker/jellyfin/config/data/library.db{,.bak}
    
    # Check integrity (prints "ok" when the database is healthy)
    sqlite3 ~/docker/jellyfin/config/data/library.db "PRAGMA integrity_check;"
    
  2. Configuration File Issue:

    # Rename config to force regeneration
    mv ~/docker/jellyfin/config/system.xml{,.bak}
    
    # Restart container
    docker compose up -d
    

Network & Connectivity

Can't Access Web Interface

Symptoms:

  • Browser or client cannot load the web interface at http://10.10.0.226:8096

Diagnosis:

# Check if container is running
docker ps | grep jellyfin

# Check port binding
docker port jellyfin

# Test local connectivity
curl -I http://localhost:8096
curl -I http://10.10.0.226:8096

# Check firewall
sudo ufw status | grep 8096

Solutions:

  1. Container Not Running: Start container

    docker compose up -d
    
  2. Port Not Bound Correctly:

    # Fix docker-compose.yml
    ports:
      - "8096:8096"  # Not "0.0.0.0:8096:8096" on some systems
    
  3. Firewall Blocking:

    sudo ufw allow 8096/tcp
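
The connectivity test can be made scriptable by asking `curl` for only the HTTP status code and classifying it. A minimal sketch; `classify_http_code` is a hypothetical helper, and treating any 2xx or 3xx response as "up" is an assumption about how the server answers on its root URL.

```shell
#!/usr/bin/env bash
# Sketch: map an HTTP status code from curl to a coarse state.
classify_http_code() {
    case "$1" in
        000)     echo "unreachable" ;;  # curl reports 000 when no HTTP response arrived
        2??|3??) echo "up" ;;           # success or redirect
        *)       echo "error" ;;        # server answered, but with a failure status
    esac
}

# Intended use on the host:
#   code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' http://localhost:8096)
#   classify_http_code "$code"
```

An "unreachable" result on `localhost` points at the container or port binding, while "up" locally but "unreachable" on 10.10.0.226 points at the firewall.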
    

Client Discovery Not Working

Symptoms:

  • Jellyfin apps can't auto-discover server
  • Must manually enter IP address

Diagnosis:

# Check UDP discovery port
docker port jellyfin | grep 7359

# Verify UDP traffic allowed
sudo ufw status | grep 7359

Solution:

# Ensure UDP port exposed
# In docker-compose.yml:
ports:
  - "7359:7359/udp"

# Allow in firewall
sudo ufw allow 7359/udp

Performance Issues

Slow Transcoding Performance

Symptoms:

  • Buffering during playback
  • High CPU usage despite GPU available
  • Transcoding slower than real-time

Diagnosis:

# Check if GPU transcoding is actually being used
nvidia-smi dmon -s u -c 5  # Monitor GPU usage

# Check Jellyfin Dashboard > Playback for active transcodes

# Verify hardware accel is enabled in Jellyfin settings

Solutions:

  1. Hardware Acceleration Not Enabled:

    • Dashboard → Playback → Transcoding
    • Select "NVIDIA NVENC"
    • Enable desired codecs
  2. GPU Busy with Other Tasks:

    # Check what else is using GPU
    nvidia-smi
    
    # Stop the Tdarr GPU node if it is running (restart it later with docker start)
    docker stop tdarr-node-gpu
    
  3. Cache on Slow Storage:

    # Verify cache is on NVMe, not network storage
    docker inspect jellyfin | grep -A 5 cache
    
    # Should be /mnt/NV2/jellyfin-cache (NVMe)
    # NOT /mnt/truenas/... (network)
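
Reading `nvidia-smi dmon` output by eye is error-prone because the column layout varies between driver versions. The sketch below locates the `enc` column from the header line instead of hard-coding its position; it assumes the header starts with `# gpu`, and `max_enc_util` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `nvidia-smi dmon -s u` output on stdin and print the highest
# encoder utilization seen. Prints 0 if no "enc" column or samples exist.
max_enc_util() {
    awk '
        # Header line: "# gpu sm mem enc ..."; data rows lack the leading "#",
        # so the data column index is one less than the header field index.
        $1 == "#" && $2 == "gpu" {
            for (i = 2; i <= NF; i++) if ($i == "enc") col = i - 1
            next
        }
        $1 == "#" { next }                       # skip the units line
        col && $col + 0 > max { max = $col + 0 } # track the peak value
        END { print max + 0 }
    '
}

# Intended use on the host while a transcode is playing:
#   nvidia-smi dmon -s u -c 5 | max_enc_util
```

A result of 0 during an active transcode suggests Jellyfin is encoding on the CPU, which matches the "hardware acceleration not enabled" case.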
    

High Memory Usage

Symptoms:

  • Jellyfin using excessive RAM
  • Server becomes unresponsive
  • OOM (Out of Memory) errors

Diagnosis:

# Check memory usage
docker stats jellyfin

# Check for memory-related messages in logs (docker logs also writes to stderr)
docker logs jellyfin 2>&1 | grep -i memory

Solutions:

  1. Set Memory Limits:

    # In docker-compose.yml
    deploy:
      resources:
        limits:
          memory: 4G
    
  2. Enable Transcode Throttling:

    • Dashboard → Playback → Transcoding
    • Enable "Throttle Transcodes" so ffmpeg pauses once it is far enough ahead of playback
  3. Clear Transcode Cache:

    # Stop container
    docker stop jellyfin
    
    # Clear transcode cache
    rm -rf /mnt/NV2/jellyfin-cache/transcodes/*
    
    # Start container
    docker start jellyfin
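
Memory usage can be checked against a threshold without watching `docker stats` interactively. A minimal sketch, assuming the percentage comes from `docker stats --no-stream --format '{{.MemPerc}}'`; the helper name and the limit of 80 are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: succeed when a percentage string such as "83.21%" exceeds a limit.
# $1: percentage with or without a trailing "%", $2: numeric limit.
mem_over_threshold() {
    awk -v used="${1%\%}" -v limit="$2" 'BEGIN { exit !(used + 0 > limit + 0) }'
}

# Intended use on the host:
#   pct=$(docker stats --no-stream --format '{{.MemPerc}}' jellyfin)
#   mem_over_threshold "$pct" 80 && echo "Jellyfin memory high: $pct"
```

The percentage is relative to the container's memory limit when one is set, otherwise to host memory, so the same threshold means different things before and after adding a limit.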
    

Playback Problems

Playback Stuttering Despite Good Network

Symptoms:

  • Video plays but stutters/buffers frequently
  • Network speed is adequate
  • Direct play works, transcoding stutters

Solutions:

  1. Check Transcode Quality Settings:

    • Lower bitrate in client settings
    • Reduce resolution if needed
  2. Verify GPU Transcoding Active:

    # While playing, check GPU usage
    nvidia-smi dmon -s u
    # Should show encoder (enc) usage
    
  3. Check Storage I/O:

    # Monitor disk I/O during playback
    iostat -x 2 5
    

Audio/Video Sync Issues

Symptoms:

  • Audio and video out of sync during playback

Solutions:

  1. Enable Audio Passthrough (if supported by client)
  2. Update ffmpeg in container (usually handled by Jellyfin updates)
  3. Try Different Transcode Settings:
    • Disable subtitle burn-in if not needed
    • Change audio codec settings

Monitoring & Alerts

GPU Monitor Alerts Not Working

Symptoms:

  • No Discord notifications when GPU issues occur
  • Monitoring script seems to run but no alerts

Diagnosis:

# Test Discord webhook
python3 /home/cal/scripts/jellyfin_gpu_monitor.py --discord-test

# Check monitoring logs
tail -f /home/cal/logs/jellyfin-gpu-monitor.log

# Verify cron job is running
crontab -l | grep jellyfin_gpu

Solutions:

  1. Webhook URL Invalid:

    • Verify webhook URL in script
    • Test with curl (Discord rejects an empty POST): curl -H "Content-Type: application/json" -d '{"content":"test"}' <webhook_url>
  2. Script Permissions:

    chmod +x /home/cal/scripts/jellyfin_gpu_monitor.py
    
  3. Cron Environment Issues:

    # Test script manually
    /usr/bin/python3 /home/cal/scripts/jellyfin_gpu_monitor.py --check --discord-alerts
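
When testing the webhook by hand, the message has to be sent as a JSON body with a `content` field. The sketch below builds that body with basic escaping; `discord_payload` is a hypothetical helper and `WEBHOOK_URL` is an assumed variable, neither of which comes from the monitoring script itself.

```shell
#!/usr/bin/env bash
# Sketch: build a minimal Discord webhook JSON body from a one-line message.
# Escapes backslashes and double quotes only; multi-line messages would
# need additional handling.
discord_payload() {
    local escaped
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"content":"%s"}\n' "$escaped"
}

# Intended use (WEBHOOK_URL is an assumed variable holding the real URL):
#   curl -sS -H 'Content-Type: application/json' \
#       -d "$(discord_payload 'GPU monitor test')" "$WEBHOOK_URL"
```

If this manual request succeeds but the cron job stays silent, the problem is more likely the cron environment than the webhook.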
    

Emergency Recovery Procedures

Complete System Recovery

Jellyfin Won't Start (All Else Failed)

  1. Stop Container:

    docker stop jellyfin
    docker rm jellyfin
    
  2. Backup Configuration:

    cp -r ~/docker/jellyfin/config ~/docker/jellyfin/config.backup.$(date +%Y%m%d)
    
  3. Pull Fresh Image:

    docker pull jellyfin/jellyfin:latest
    
  4. Recreate Container:

    cd ~/docker/jellyfin
    docker compose up -d
    
  5. Restore Settings (if needed):

    • Copy specific config files from backup
    • Don't restore corrupt database

GPU Completely Broken

  1. Verify Host GPU:

    # If nvidia-smi fails with driver mismatch
    sudo reboot
    
  2. Remove GPU Access (temporary workaround):

    # Comment out GPU sections in docker-compose.yml
    # CPU transcoding only until GPU fixed
    
  3. Reinstall NVIDIA Drivers (if reboot doesn't help):

    # Unhold packages
    sudo apt-mark unhold nvidia-driver-570
    
    # Reinstall
    sudo apt remove --purge 'nvidia-*'  # Quoted so the shell does not expand the pattern
    sudo apt install nvidia-driver-570
    sudo reboot
    
    # Re-hold after working
    sudo apt-mark hold nvidia-driver-570
    

Preventive Maintenance

Regular Checks (Weekly)

# Check GPU health
nvidia-smi

# Verify Jellyfin accessible
curl -I http://10.10.0.226:8096

# Check disk space (cache can grow large)
df -h /mnt/NV2
df -h ~/docker/jellyfin/config

# Review logs for errors
docker logs jellyfin --since 168h 2>&1 | grep -i error  # 168h = 7 days; --since does not accept a "d" unit
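
The disk space check is easier to automate when it yields a bare number. A minimal sketch using POSIX `df -P` output; `disk_use_percent` is a hypothetical helper and the 85 percent limit in the usage line is an arbitrary example.

```shell
#!/usr/bin/env bash
# Sketch: print the used-space percentage (without "%") for the filesystem
# that holds the given path.
disk_use_percent() {
    # -P forces one line per filesystem; the fifth column is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Intended use on the host:
#   [ "$(disk_use_percent /mnt/NV2)" -ge 85 ] && echo "Cache disk is filling up"
```

This pairs naturally with the existing weekly cron-driven checks, since it produces output only when the limit is crossed.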

Monthly Tasks

# Update Jellyfin
cd ~/docker/jellyfin
docker compose pull
docker compose up -d

# Clean old transcodes
find /mnt/NV2/jellyfin-cache/transcodes/ -type f -mtime +7 -delete

# Backup configuration
tar -czf ~/jellyfin-config-backup-$(date +%Y%m%d).tar.gz ~/docker/jellyfin/config/

Before Major Changes

  • Create snapshot if on Proxmox
  • Backup full config directory
  • Test on non-production instance if possible
  • Document current working configuration

Related Documentation

  • Setup Guide: /media-servers/jellyfin-ubuntu-manticore.md
  • NVIDIA Driver Management: See jellyfin-ubuntu-manticore.md
  • GPU Monitoring: /monitoring/scripts/CONTEXT.md
  • Technology Overview: /media-servers/CONTEXT.md
  • Main Instructions: /CLAUDE.md

Support Resources