Add NVIDIA driver management and media server troubleshooting
Document NVIDIA driver hold/update workflow, GPU health monitoring, and update checker integration for Jellyfin on ubuntu-manticore. Add media-servers troubleshooting guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 6c8d199359
commit 0d552a839e
@@ -129,6 +129,86 @@ For syncing watch history between Plex and Jellyfin:
|
|||||||
- Syncs via API, not NFO files
|
||||||
- NFO files don't store watch state
|
||||||
|
|
||||||
|
## NVIDIA Driver Management
|
||||||
|
|
||||||
|
### Auto-Update Prevention
|
||||||
|
|
||||||
|
**Issue**: NVIDIA driver auto-updates can cause driver/library version mismatches, breaking GPU access until the host is rebooted. This causes Jellyfin downtime.
|
||||||
|
|
||||||
|
**Solution**: Driver packages are held to prevent automatic updates:
|
||||||
|
```bash
|
||||||
|
# Packages currently held (as of 2026-02-05):
|
||||||
|
nvidia-driver-570
|
||||||
|
nvidia-kernel-common-570
|
||||||
|
nvidia-dkms-570
|
||||||
|
```
|
||||||
|
|
||||||
|
**Verify held packages:**
|
||||||
|
```bash
|
||||||
|
apt-mark showhold
|
||||||
|
```
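The hold can also be checked programmatically. The following is a minimal sketch, not part of the existing scripts: `missing_holds` and `check_holds` are hypothetical helpers that compare the output of `apt-mark showhold` against the three packages listed above, assuming one package name per output line.

```python
import subprocess
from typing import List, Sequence

# Packages expected to be held, taken from the list above.
REQUIRED_HOLDS = (
    "nvidia-driver-570",
    "nvidia-kernel-common-570",
    "nvidia-dkms-570",
)


def missing_holds(showhold_output: str, required: Sequence[str] = REQUIRED_HOLDS) -> List[str]:
    """Return the required packages that are absent from `apt-mark showhold` output."""
    held = {line.strip() for line in showhold_output.splitlines() if line.strip()}
    return [pkg for pkg in required if pkg not in held]


def check_holds() -> List[str]:
    """Run `apt-mark showhold` on the host and report packages that are not held."""
    result = subprocess.run(
        ["apt-mark", "showhold"], capture_output=True, text=True, check=True
    )
    return missing_holds(result.stdout)
```

An empty list from `check_holds()` means all three packages are still held; any names returned need `apt-mark hold` again.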
|
||||||
|
|
||||||
|
### Update Monitoring
|
||||||
|
|
||||||
|
A monitoring script checks for NVIDIA driver updates weekly and sends Discord alerts when new versions are available:
|
||||||
|
|
||||||
|
**Script**: `/home/cal/scripts/nvidia_update_checker.py`
|
||||||
|
**Schedule**: Every Monday at 9 AM
|
||||||
|
**Logs**: `/home/cal/logs/nvidia-update-checker.log`
|
||||||
|
|
||||||
|
**Manual check:**
|
||||||
|
```bash
|
||||||
|
python3 /home/cal/scripts/nvidia_update_checker.py --check --discord-alerts
|
||||||
|
```
|
||||||
|
|
||||||
|
**Test Discord integration:**
|
||||||
|
```bash
|
||||||
|
python3 /home/cal/scripts/nvidia_update_checker.py --discord-test
|
||||||
|
```
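How `nvidia_update_checker.py` detects a new version is not shown here. One plausible approach, sketched below under that assumption, is to compare the `Installed:` and `Candidate:` fields printed by `apt-cache policy <package>`. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_policy(policy_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract the Installed and Candidate versions from `apt-cache policy <pkg>` output."""
    installed = re.search(r"^\s*Installed:\s*(\S+)", policy_output, re.MULTILINE)
    candidate = re.search(r"^\s*Candidate:\s*(\S+)", policy_output, re.MULTILINE)
    return (
        installed.group(1) if installed else None,
        candidate.group(1) if candidate else None,
    )


def update_available(policy_output: str) -> bool:
    """True when both versions are known and the candidate differs from the installed one."""
    installed, candidate = parse_policy(policy_output)
    if installed is None or candidate is None:
        return False
    if "(none)" in (installed, candidate):
        # apt prints "(none)" when a package is not installed or has no candidate.
        return False
    return installed != candidate
```

A caller would feed it the text captured from `apt-cache policy nvidia-driver-570` and send the Discord alert only when `update_available` returns `True`.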
|
||||||
|
|
||||||
|
### Planned Driver Updates
|
||||||
|
|
||||||
|
When a Discord alert reports an available driver update, plan a maintenance window:
|
||||||
|
|
||||||
|
1. **Unhold packages:**
|
||||||
|
```bash
|
||||||
|
sudo apt-mark unhold nvidia-driver-570 nvidia-kernel-common-570 nvidia-dkms-570
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Update drivers:**
|
||||||
|
```bash
|
||||||
|
sudo apt update && sudo apt install --only-upgrade nvidia-driver-570 nvidia-kernel-common-570 nvidia-dkms-570
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Reboot immediately** (driver changes require reboot):
|
||||||
|
```bash
|
||||||
|
sudo reboot
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Verify after reboot:**
|
||||||
|
```bash
|
||||||
|
nvidia-smi
|
||||||
|
docker exec jellyfin nvidia-smi
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Re-hold packages:**
|
||||||
|
```bash
|
||||||
|
sudo apt-mark hold nvidia-driver-570 nvidia-kernel-common-570 nvidia-dkms-570
|
||||||
|
```
|
||||||
|
|
||||||
|
### GPU Health Monitoring
|
||||||
|
|
||||||
|
Jellyfin GPU access is monitored every 5 minutes:
|
||||||
|
|
||||||
|
**Script**: `/home/cal/scripts/jellyfin_gpu_monitor.py`
|
||||||
|
**Features**:
|
||||||
|
- Detects GPU access loss
|
||||||
|
- Sends Discord alerts
|
||||||
|
- Auto-restarts the container when it loses GPU access (this only helps if the host GPU is still accessible)
|
||||||
|
- Logs to `/home/cal/logs/jellyfin-gpu-monitor.log`
|
||||||
|
|
||||||
|
**Note**: Container restart cannot fix host-level driver issues. If Discord alerts show "Restart failed" with driver/library mismatch, a host reboot is required.
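The decision described in the note can be sketched as a small function. This is an illustration under assumptions, not the logic of `jellyfin_gpu_monitor.py`: the function name, its parameters, and the returned labels are hypothetical. It takes the result of running `nvidia-smi` on the host and inside the container.

```python
# Text that nvidia-smi prints when the loaded kernel module and the
# userspace libraries disagree (compared case-insensitively).
MISMATCH_MARKER = "driver/library version mismatch"


def recommended_action(host_ok: bool, host_output: str, container_ok: bool) -> str:
    """Map nvidia-smi results on the host and in the container to a next step."""
    if container_ok:
        return "none"
    if not host_ok:
        # The host itself cannot reach the GPU, so restarting the container cannot help.
        if MISMATCH_MARKER in host_output.lower():
            return "reboot-host"
        return "investigate-host"
    # Host GPU works but the container lost access: a restart is worth trying.
    return "restart-container"
```

Keeping the classification separate from the commands that gather the inputs makes the "restart versus reboot" rule easy to test without a GPU.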
|
||||||
|
|
||||||
## Troubleshooting
|
||||||
|
|
||||||
### GPU Not Detected in Transcoding
|
||||||
@@ -149,6 +229,22 @@ Check Jellyfin logs in Dashboard → Logs or:
|
|||||||
docker logs jellyfin 2>&1 | tail -50
|
||||||
```
|
||||||
|
|
||||||
|
### Driver/Library Version Mismatch
|
||||||
|
|
||||||
|
**Symptoms**:
|
||||||
|
- `nvidia-smi` fails with "driver/library version mismatch"
|
||||||
|
- Jellyfin container won't start with NVML error
|
||||||
|
- GPU monitoring alerts show "Restart failed"
|
||||||
|
|
||||||
|
**Cause**: NVIDIA driver updated but kernel modules not reloaded
|
||||||
|
|
||||||
|
**Solution**: Reboot the host
|
||||||
|
```bash
|
||||||
|
sudo reboot
|
||||||
|
```
|
||||||
|
|
||||||
## Related Documentation
|
||||||
- Server inventory: `networking/server-inventory.md`
|
||||||
- Tdarr setup: `tdarr/ubuntu-manticore-setup.md`
|
||||||
|
- GPU monitoring: `monitoring/scripts/jellyfin_gpu_monitor.py`
|
||||||
|
- Update monitoring: `monitoring/scripts/nvidia_update_checker.py`
|
||||||
|
|||||||
524
media-servers/troubleshooting.md
Normal file
@@ -0,0 +1,524 @@
|
|||||||
|
# Media Servers - Troubleshooting Guide
|
||||||
|
|
||||||
|
## Common Issues and Solutions
|
||||||
|
|
||||||
|
### GPU Transcoding Problems
|
||||||
|
|
||||||
|
#### GPU Not Detected in Container
|
||||||
|
**Symptoms**:
|
||||||
|
- Jellyfin shows "No hardware acceleration available"
|
||||||
|
- Transcoding falls back to CPU (slow performance)
|
||||||
|
- Container logs show NVIDIA device not found
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check GPU accessibility from container
|
||||||
|
docker exec jellyfin nvidia-smi
|
||||||
|
|
||||||
|
# Verify NVIDIA runtime is configured
|
||||||
|
docker info | grep -i nvidia
|
||||||
|
|
||||||
|
# Check container GPU configuration
|
||||||
|
docker inspect jellyfin | grep -i gpu
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Verify NVIDIA Container Runtime**:
|
||||||
|
```bash
|
||||||
|
# On host
|
||||||
|
nvidia-smi # Should work
|
||||||
|
|
||||||
|
# Install nvidia-container-toolkit if missing
|
||||||
|
sudo apt install nvidia-container-toolkit
|
||||||
|
sudo systemctl restart docker
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Fix Docker Compose Configuration**:
|
||||||
|
```yaml
|
||||||
|
services:
|
||||||
|
jellyfin:
|
||||||
|
deploy:
|
||||||
|
resources:
|
||||||
|
reservations:
|
||||||
|
devices:
|
||||||
|
- driver: nvidia
|
||||||
|
count: all
|
||||||
|
capabilities: [gpu]
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Restart Container**:
|
||||||
|
```bash
|
||||||
|
docker compose down
|
||||||
|
docker compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Driver/Library Version Mismatch
|
||||||
|
**Symptoms**:
|
||||||
|
- `nvidia-smi` fails with "driver/library version mismatch"
|
||||||
|
- Container won't start with NVML error
|
||||||
|
- GPU monitoring shows "Restart failed"
|
||||||
|
|
||||||
|
**Cause**: NVIDIA driver updated on host but kernel modules not reloaded
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# Check host GPU status
|
||||||
|
nvidia-smi # Will fail with mismatch error
|
||||||
|
|
||||||
|
# Reboot required to reload kernel modules
|
||||||
|
sudo reboot
|
||||||
|
|
||||||
|
# After reboot, verify
|
||||||
|
nvidia-smi
|
||||||
|
docker exec jellyfin nvidia-smi
|
||||||
|
```
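A pending reboot can often be spotted before `nvidia-smi` is run, by comparing the version of the kernel module that is currently loaded with the version installed on disk. The sketch below assumes the loaded version is read from `/proc/driver/nvidia/version` and the on-disk version from `modinfo -F version nvidia`. The helper names are hypothetical, and the regular expression is a best-effort match for that file's usual layout.

```python
import re
from typing import Optional


def loaded_module_version(proc_version_text: str) -> Optional[str]:
    """Pull the driver version out of the contents of /proc/driver/nvidia/version."""
    match = re.search(r"Kernel Module.*?\s(\d+(?:\.\d+)+)", proc_version_text)
    return match.group(1) if match else None


def reboot_pending(loaded: Optional[str], on_disk: Optional[str]) -> bool:
    """True when both versions are known and they differ."""
    return loaded is not None and on_disk is not None and loaded != on_disk
```

If `reboot_pending` is true, the driver on disk has been updated but the old module is still loaded, which is the condition that produces the mismatch error above.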
|
||||||
|
|
||||||
|
**Prevention**:
|
||||||
|
- See `/media-servers/jellyfin-ubuntu-manticore.md` NVIDIA Driver Management section
|
||||||
|
- Hold driver packages to prevent auto-updates
|
||||||
|
- Monitor for updates weekly via automated checks
|
||||||
|
|
||||||
|
#### Transcoding Starts Then Fails
|
||||||
|
**Symptoms**:
|
||||||
|
- Playback begins then stops
|
||||||
|
- Jellyfin logs show ffmpeg errors
|
||||||
|
- GPU memory errors in logs
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check GPU memory usage
|
||||||
|
nvidia-smi
|
||||||
|
|
||||||
|
# Check for concurrent GPU users (Tdarr, other containers)
|
||||||
|
docker ps | grep -E "tdarr|jellyfin"
|
||||||
|
|
||||||
|
# Check Jellyfin transcode logs
|
||||||
|
docker logs jellyfin 2>&1 | grep -i transcode | tail -50
|
||||||
|
```
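To turn the memory check into something scriptable, the CSV form of the query (`nvidia-smi --query-gpu=memory.used,memory.total --format=csv`) can be parsed as sketched below. The helper names and the 0.9 threshold are assumptions chosen for illustration, and the parser expects plain `<number> MiB` fields, so a value such as `[N/A]` would raise an error.

```python
from typing import List, Tuple


def parse_gpu_memory(csv_output: str) -> List[Tuple[int, int]]:
    """Return (used_mib, total_mib) for each GPU row in nvidia-smi CSV output."""
    rows = []
    for line in csv_output.strip().splitlines()[1:]:  # skip the header row
        used, total = (field.strip().split()[0] for field in line.split(","))
        rows.append((int(used), int(total)))
    return rows


def memory_pressure(csv_output: str, threshold: float = 0.9) -> bool:
    """True if any GPU uses more than `threshold` of its memory (0.9 is an arbitrary default)."""
    return any(used / total > threshold for used, total in parse_gpu_memory(csv_output))
```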
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **GPU Resource Conflict**: If Tdarr is using GPU, pause transcoding or limit concurrent jobs
|
||||||
|
2. **Insufficient GPU Memory**:
|
||||||
|
```bash
|
||||||
|
# Check GPU memory
|
||||||
|
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
|
||||||
|
|
||||||
|
# Reduce Jellyfin transcode resolution or bitrate
|
||||||
|
```
|
||||||
|
3. **Codec Not Supported**: Verify codec is supported by GPU encoder
|
||||||
|
```bash
|
||||||
|
# Check available encoders
|
||||||
|
docker exec jellyfin ffmpeg -encoders 2>/dev/null | grep nvenc
|
||||||
|
```
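The encoder check in the last step can be automated by parsing the text that `ffmpeg -encoders` prints. This sketch assumes each encoder row has the form `<flags> <name> <description>`; `nvenc_encoders` is a hypothetical helper name.

```python
from typing import List


def nvenc_encoders(ffmpeg_encoders_output: str) -> List[str]:
    """Return the encoder names containing 'nvenc' from `ffmpeg -encoders` output."""
    names = []
    for line in ffmpeg_encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: "<flags> <name> <description...>"
        if len(parts) >= 2 and "nvenc" in parts[1]:
            names.append(parts[1])
    return names
```

An empty result suggests the ffmpeg build in the container has no NVENC support, which points to the image rather than the driver.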
|
||||||
|
|
||||||
|
### Container Startup Issues
|
||||||
|
|
||||||
|
#### Container Won't Start After Update
|
||||||
|
**Symptoms**:
|
||||||
|
- Container exits immediately after `docker compose up -d`
|
||||||
|
- Exit code indicates error (non-zero)
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check container logs
|
||||||
|
docker logs jellyfin
|
||||||
|
|
||||||
|
# Check exit code
|
||||||
|
docker inspect jellyfin | grep ExitCode
|
||||||
|
|
||||||
|
# Try starting in foreground for detailed output
|
||||||
|
docker compose up
|
||||||
|
```
|
||||||
|
|
||||||
|
**Common Causes & Solutions**:
|
||||||
|
|
||||||
|
1. **Permission Issues**:
|
||||||
|
```bash
|
||||||
|
# Fix ownership of config/cache directories
|
||||||
|
sudo chown -R 1000:1000 ~/docker/jellyfin/config
|
||||||
|
sudo chown -R 1000:1000 /mnt/NV2/jellyfin-cache
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Port Already in Use**:
|
||||||
|
```bash
|
||||||
|
# Check if port 8096 is in use
|
||||||
|
sudo lsof -i :8096
|
||||||
|
|
||||||
|
# Kill conflicting process or change Jellyfin port
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Volume Mount Failures**:
|
||||||
|
```bash
|
||||||
|
# Verify all mount points exist and are accessible
|
||||||
|
ls -la ~/docker/jellyfin/config
|
||||||
|
ls -la /mnt/NV2/jellyfin-cache
|
||||||
|
mount | grep /mnt/truenas/media
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Container Stuck in "Restarting" Loop
|
||||||
|
**Symptoms**:
|
||||||
|
- Docker shows container constantly restarting
|
||||||
|
- Brief uptime then crash
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Watch restart behavior
|
||||||
|
docker inspect --format 'restarts={{.RestartCount}} status={{.State.Status}}' jellyfin
|
||||||
|
|
||||||
|
# Check logs for crash reason
|
||||||
|
docker logs jellyfin --tail 200
|
||||||
|
|
||||||
|
# Check resource limits
|
||||||
|
docker inspect --format 'memory={{.HostConfig.Memory}} nano_cpus={{.HostConfig.NanoCpus}}' jellyfin
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Database Corruption**:
|
||||||
|
```bash
|
||||||
|
# Stop container
|
||||||
|
docker stop jellyfin
|
||||||
|
|
||||||
|
# Backup database
|
||||||
|
cp ~/docker/jellyfin/config/data/library.db{,.bak}
|
||||||
|
|
||||||
|
# Check integrity (this reports corruption; it does not repair it)
|
||||||
|
sqlite3 ~/docker/jellyfin/config/data/library.db "PRAGMA integrity_check;"
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Configuration File Issue**:
|
||||||
|
```bash
|
||||||
|
# Rename config to force regeneration
|
||||||
|
mv ~/docker/jellyfin/config/system.xml{,.bak}
|
||||||
|
|
||||||
|
# Restart container
|
||||||
|
docker compose up -d
|
||||||
|
```
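The integrity check from the database-corruption step can also be run from Python's standard `sqlite3` module, which is handy when the `sqlite3` CLI is not installed on the host. This is a minimal sketch; `integrity_ok` is a hypothetical helper, and it should only be run while the container is stopped.

```python
import sqlite3


def integrity_ok(db_path: str) -> bool:
    """Run PRAGMA integrity_check; a healthy database yields the single row 'ok'."""
    try:
        # Open read-only so the check cannot modify the file.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    except sqlite3.Error:
        return False
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    except sqlite3.DatabaseError:
        # Raised when the file is not a usable SQLite database at all.
        return False
    finally:
        conn.close()
    return rows == [("ok",)]
```

A `False` result only confirms a problem. Recovery still means restoring from the `.bak` copy or letting Jellyfin rebuild the library.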
|
||||||
|
|
||||||
|
### Network & Connectivity
|
||||||
|
|
||||||
|
#### Can't Access Web Interface
|
||||||
|
**Symptoms**:
|
||||||
|
- http://10.10.0.226:8096 not responding
|
||||||
|
- Connection timeout or refused
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check if container is running
|
||||||
|
docker ps | grep jellyfin
|
||||||
|
|
||||||
|
# Check port binding
|
||||||
|
docker port jellyfin
|
||||||
|
|
||||||
|
# Test local connectivity
|
||||||
|
curl -I http://localhost:8096
|
||||||
|
curl -I http://10.10.0.226:8096
|
||||||
|
|
||||||
|
# Check firewall
|
||||||
|
sudo ufw status | grep 8096
|
||||||
|
```
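The connectivity tests above can be reduced to a plain TCP check, which separates "nothing is listening" from HTTP-level problems. A minimal sketch using only the standard library follows; `port_open` is a hypothetical helper name.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_open("10.10.0.226", 8096)` returning `False` from another machine while `port_open("localhost", 8096)` returns `True` on the host points at the firewall or the port binding rather than at Jellyfin itself.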
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Container Not Running**: Start container
|
||||||
|
```bash
|
||||||
|
docker compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Port Not Bound Correctly**:
|
||||||
|
```yaml
|
||||||
|
# Fix docker-compose.yml
|
||||||
|
ports:
|
||||||
|
- "8096:8096" # Not "0.0.0.0:8096:8096" on some systems
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Firewall Blocking**:
|
||||||
|
```bash
|
||||||
|
sudo ufw allow 8096/tcp
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Client Discovery Not Working
|
||||||
|
**Symptoms**:
|
||||||
|
- Jellyfin apps can't auto-discover server
|
||||||
|
- Must manually enter IP address
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check UDP discovery port
|
||||||
|
docker port jellyfin | grep 7359
|
||||||
|
|
||||||
|
# Verify UDP traffic allowed
|
||||||
|
sudo ufw status | grep 7359
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solution**:
|
||||||
|
```bash
|
||||||
|
# Ensure the UDP discovery port is published in docker-compose.yml:
#   ports:
#     - "7359:7359/udp"
|
||||||
|
|
||||||
|
# Allow in firewall
|
||||||
|
sudo ufw allow 7359/udp
|
||||||
|
```
|
||||||
|
|
||||||
|
### Performance Issues
|
||||||
|
|
||||||
|
#### Slow Transcoding Performance
|
||||||
|
**Symptoms**:
|
||||||
|
- Buffering during playback
|
||||||
|
- High CPU usage despite GPU available
|
||||||
|
- Transcoding slower than real-time
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check if GPU transcoding is actually being used
|
||||||
|
nvidia-smi dmon -s u -c 5 # Monitor GPU usage
|
||||||
|
|
||||||
|
# Check Jellyfin Dashboard > Playback for active transcodes
|
||||||
|
|
||||||
|
# Verify hardware accel is enabled in Jellyfin settings
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Hardware Acceleration Not Enabled**:
|
||||||
|
- Dashboard → Playback → Transcoding
|
||||||
|
- Select "NVIDIA NVENC"
|
||||||
|
- Enable desired codecs
|
||||||
|
|
||||||
|
2. **GPU Busy with Other Tasks**:
|
||||||
|
```bash
|
||||||
|
# Check what else is using GPU
|
||||||
|
nvidia-smi
|
||||||
|
|
||||||
|
# Pause Tdarr if running
|
||||||
|
docker stop tdarr-node-gpu
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Cache on Slow Storage**:
|
||||||
|
```bash
|
||||||
|
# Verify cache is on NVMe, not network storage
|
||||||
|
docker inspect jellyfin | grep -A 5 cache
|
||||||
|
|
||||||
|
# Should be /mnt/NV2/jellyfin-cache (NVMe)
|
||||||
|
# NOT /mnt/truenas/... (network)
|
||||||
|
```
|
||||||
|
|
||||||
|
#### High Memory Usage
|
||||||
|
**Symptoms**:
|
||||||
|
- Jellyfin using excessive RAM
|
||||||
|
- Server becomes unresponsive
|
||||||
|
- OOM (Out of Memory) errors
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Check memory usage
|
||||||
|
docker stats jellyfin
|
||||||
|
|
||||||
|
# Check for memory leaks in logs
|
||||||
|
docker logs jellyfin 2>&1 | grep -i memory
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Set Memory Limits**:
|
||||||
|
```yaml
|
||||||
|
# In docker-compose.yml
|
||||||
|
deploy:
|
||||||
|
resources:
|
||||||
|
limits:
|
||||||
|
memory: 4G
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Reduce Transcode Throttle**:
|
||||||
|
- Dashboard → Playback
|
||||||
|
- Lower "Throttle Transcodes" value
|
||||||
|
|
||||||
|
3. **Clear Transcode Cache**:
|
||||||
|
```bash
|
||||||
|
# Stop container
|
||||||
|
docker stop jellyfin
|
||||||
|
|
||||||
|
# Clear transcode cache
|
||||||
|
rm -rf /mnt/NV2/jellyfin-cache/transcodes/*
|
||||||
|
|
||||||
|
# Start container
|
||||||
|
docker start jellyfin
|
||||||
|
```
|
||||||
|
|
||||||
|
### Playback Problems
|
||||||
|
|
||||||
|
#### Playback Stuttering Despite Good Network
|
||||||
|
**Symptoms**:
|
||||||
|
- Video plays but stutters/buffers frequently
|
||||||
|
- Network speed is adequate
|
||||||
|
- Direct play works, transcoding stutters
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Check Transcode Quality Settings**:
|
||||||
|
- Lower bitrate in client settings
|
||||||
|
- Reduce resolution if needed
|
||||||
|
|
||||||
|
2. **Verify GPU Transcoding Active**:
|
||||||
|
```bash
|
||||||
|
# While playing, check GPU usage
|
||||||
|
nvidia-smi dmon -s u
|
||||||
|
# Should show encoder (enc) usage
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Check Storage I/O**:
|
||||||
|
```bash
|
||||||
|
# Monitor disk I/O during playback
|
||||||
|
iostat -x 2 5
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Audio/Video Sync Issues
|
||||||
|
**Symptoms**:
|
||||||
|
- Audio and video out of sync during playback
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Enable Audio Passthrough** (if supported by client)
|
||||||
|
2. **Update ffmpeg** in container (usually handled by Jellyfin updates)
|
||||||
|
3. **Try Different Transcode Settings**:
|
||||||
|
- Disable subtitle burn-in if not needed
|
||||||
|
- Change audio codec settings
|
||||||
|
|
||||||
|
### Monitoring & Alerts
|
||||||
|
|
||||||
|
#### GPU Monitor Alerts Not Working
|
||||||
|
**Symptoms**:
|
||||||
|
- No Discord notifications when GPU issues occur
|
||||||
|
- Monitoring script seems to run but no alerts
|
||||||
|
|
||||||
|
**Diagnosis**:
|
||||||
|
```bash
|
||||||
|
# Test Discord webhook
|
||||||
|
python3 /home/cal/scripts/jellyfin_gpu_monitor.py --discord-test
|
||||||
|
|
||||||
|
# Check monitoring logs
|
||||||
|
tail -f /home/cal/logs/jellyfin-gpu-monitor.log
|
||||||
|
|
||||||
|
# Verify cron job is running
|
||||||
|
crontab -l | grep jellyfin_gpu
|
||||||
|
```
|
||||||
|
|
||||||
|
**Solutions**:
|
||||||
|
1. **Webhook URL Invalid**:
|
||||||
|
- Verify webhook URL in script
|
||||||
|
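- Test with curl: `curl -H "Content-Type: application/json" -d '{"content": "test"}' <webhook_url>` (Discord rejects a POST without a JSON message body)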
|
||||||
|
|
||||||
|
2. **Script Permissions**:
|
||||||
|
```bash
|
||||||
|
chmod +x /home/cal/scripts/jellyfin_gpu_monitor.py
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Cron Environment Issues**:
|
||||||
|
```bash
|
||||||
|
# Test script manually
|
||||||
|
/usr/bin/python3 /home/cal/scripts/jellyfin_gpu_monitor.py --check --discord-alerts
|
||||||
|
```
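When checking the webhook itself, it helps to know the request shape Discord expects: a JSON body with a `content` field. The sketch below builds such a request with the standard library without sending it. `build_webhook_request` is a hypothetical helper, not a function from `jellyfin_gpu_monitor.py`, and the explicit `User-Agent` value is an arbitrary precaution in case the default Python agent is rejected.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "User-Agent": "gpu-monitor-test/1.0",  # arbitrary illustrative value
        },
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(request, timeout=10)`. An HTTP error raised at that point indicates a webhook or payload problem rather than a cron problem.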
|
||||||
|
|
||||||
|
## Emergency Recovery Procedures
|
||||||
|
|
||||||
|
### Complete System Recovery
|
||||||
|
|
||||||
|
#### Jellyfin Won't Start (All Else Failed)
|
||||||
|
1. **Stop Container**:
|
||||||
|
```bash
|
||||||
|
docker stop jellyfin
|
||||||
|
docker rm jellyfin
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Backup Configuration**:
|
||||||
|
```bash
|
||||||
|
cp -r ~/docker/jellyfin/config ~/docker/jellyfin/config.backup.$(date +%Y%m%d)
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Pull Fresh Image**:
|
||||||
|
```bash
|
||||||
|
docker pull jellyfin/jellyfin:latest
|
||||||
|
```
|
||||||
|
|
||||||
|
4. **Recreate Container**:
|
||||||
|
```bash
|
||||||
|
cd ~/docker/jellyfin
|
||||||
|
docker compose up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
5. **Restore Settings** (if needed):
|
||||||
|
- Copy specific config files from backup
|
||||||
|
- Don't restore corrupt database
|
||||||
|
|
||||||
|
#### GPU Completely Broken
|
||||||
|
1. **Verify Host GPU**:
|
||||||
|
```bash
|
||||||
|
# If nvidia-smi fails with driver mismatch
|
||||||
|
sudo reboot
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Remove GPU Access** (temporary workaround):
|
||||||
|
```yaml
|
||||||
|
# Comment out GPU sections in docker-compose.yml
|
||||||
|
# CPU transcoding only until GPU fixed
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Reinstall NVIDIA Drivers** (if reboot doesn't help):
|
||||||
|
```bash
|
||||||
|
# Unhold packages
|
||||||
|
sudo apt-mark unhold nvidia-driver-570 nvidia-kernel-common-570 nvidia-dkms-570
|
||||||
|
|
||||||
|
# Reinstall
|
||||||
|
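sudo apt remove --purge 'nvidia-*'  # quoted so the shell cannot expand it; the pattern also matches nvidia-container-toolkit, so reinstall that afterwards if it is removed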
|
||||||
|
sudo apt install nvidia-driver-570
|
||||||
|
sudo reboot
|
||||||
|
|
||||||
|
# Re-hold after working
|
||||||
|
sudo apt-mark hold nvidia-driver-570 nvidia-kernel-common-570 nvidia-dkms-570
|
||||||
|
```
|
||||||
|
|
||||||
|
## Preventive Maintenance
|
||||||
|
|
||||||
|
### Regular Checks (Weekly)
|
||||||
|
```bash
|
||||||
|
# Check GPU health
|
||||||
|
nvidia-smi
|
||||||
|
|
||||||
|
# Verify Jellyfin accessible
|
||||||
|
curl -I http://10.10.0.226:8096
|
||||||
|
|
||||||
|
# Check disk space (cache can grow large)
|
||||||
|
df -h /mnt/NV2
|
||||||
|
df -h ~/docker/jellyfin/config
|
||||||
|
|
||||||
|
# Review logs for errors
|
||||||
|
docker logs jellyfin --since 168h 2>&1 | grep -i error
|
||||||
|
```
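The disk-space part of the weekly check can be scripted with the standard library. This is a rough sketch: the helper names and the 0.85 threshold are assumptions, and the fraction is computed as used/total, which can differ slightly from the `Use%` column that `df` prints.

```python
import shutil


def usage_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_threshold(path: str, threshold: float = 0.85) -> bool:
    """True when usage exceeds the threshold (0.85 is an arbitrary default)."""
    return usage_fraction(path) > threshold
```

Calling `over_threshold("/mnt/NV2")` from the same cron job that runs the other monitors would flag a growing transcode cache before it fills the disk.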
|
||||||
|
|
||||||
|
### Monthly Tasks
|
||||||
|
```bash
|
||||||
|
# Update Jellyfin
|
||||||
|
cd ~/docker/jellyfin
|
||||||
|
docker compose pull
|
||||||
|
docker compose up -d
|
||||||
|
|
||||||
|
# Clean old transcodes
|
||||||
|
find /mnt/NV2/jellyfin-cache/transcodes/ -type f -mtime +7 -delete
|
||||||
|
|
||||||
|
# Backup configuration
|
||||||
|
tar -czf ~/jellyfin-config-backup-$(date +%Y%m%d).tar.gz ~/docker/jellyfin/config/
|
||||||
|
```
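The transcode cleanup can be given a dry-run mode so the list of files is reviewed before anything is deleted. The sketch below approximates the `find ... -mtime +7 -delete` command above. It is not identical, since `find` rounds file ages down to whole days. `stale_files` and `prune` are hypothetical helper names.

```python
import time
from pathlib import Path
from typing import List, Optional


def stale_files(root: str, max_age_days: float = 7.0, now: Optional[float] = None) -> List[Path]:
    """List regular files under `root` whose mtime is older than `max_age_days`."""
    cutoff = (now if now is not None else time.time()) - max_age_days * 86400
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def prune(root: str, max_age_days: float = 7.0, dry_run: bool = True) -> List[Path]:
    """Return the stale files and delete them only when dry_run is False."""
    targets = stale_files(root, max_age_days)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Running `prune("/mnt/NV2/jellyfin-cache/transcodes")` first shows what would be removed; the same call with `dry_run=False` performs the deletion.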
|
||||||
|
|
||||||
|
### Before Major Changes
|
||||||
|
- Create snapshot if on Proxmox
|
||||||
|
- Backup full config directory
|
||||||
|
- Test on non-production instance if possible
|
||||||
|
- Document current working configuration
|
||||||
|
|
||||||
|
## Related Documentation
|
||||||
|
- **Setup Guide**: `/media-servers/jellyfin-ubuntu-manticore.md`
|
||||||
|
- **NVIDIA Driver Management**: See jellyfin-ubuntu-manticore.md
|
||||||
|
- **GPU Monitoring**: `/monitoring/scripts/CONTEXT.md`
|
||||||
|
- **Technology Overview**: `/media-servers/CONTEXT.md`
|
||||||
|
- **Main Instructions**: `/CLAUDE.md`
|
||||||
|
|
||||||
|
## Support Resources
|
||||||
|
- **Jellyfin Docs**: https://jellyfin.org/docs/
|
||||||
|
- **NVIDIA Container Toolkit**: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
|
||||||
|
- **Discord Monitoring**: See `/monitoring/scripts/jellyfin_gpu_monitor.py`
|
||||||