- Created jellyfin_gpu_monitor.py for detecting lost GPU access
- Sends Discord alerts when GPU access fails
- Auto-restarts container to restore GPU binding
- Runs every 5 minutes via cron on ubuntu-manticore
- Documents FFmpeg exit code 187 (NVENC failure) in troubleshooting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Media Servers - Technology Context
Overview
Media server infrastructure for home lab environments, covering streaming services like Jellyfin and Plex with hardware-accelerated transcoding, library management, and client discovery.
Current Deployments
Jellyfin on ubuntu-manticore
- Location: 10.10.0.226:8096
- GPU: NVIDIA GTX 1070 (NVENC/NVDEC)
- Documentation: jellyfin-ubuntu-manticore.md
Plex (Existing)
- Location: TBD (potential migration to ubuntu-manticore)
- Note: Currently running elsewhere, may migrate for GPU access
Architecture Patterns
GPU-Accelerated Transcoding
Pattern: Hardware encoding/decoding for real-time streaming
# Docker Compose GPU passthrough
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
environment:
  - NVIDIA_DRIVER_CAPABILITIES=all
  - NVIDIA_VISIBLE_DEVICES=all
Storage Strategy
Pattern: Tiered storage for different access patterns
- Config: Local SSD (small, fast database access)
- Cache: Local NVMe (transcoding temp, thumbnails)
- Media: Network storage (large capacity, read-only mount)
Multi-Service GPU Sharing
Pattern: Resource allocation when multiple services share GPU
- Limit background tasks (Tdarr) to fewer concurrent jobs
- Prioritize real-time services (Jellyfin/Plex playback)
- Consumer GPUs cap concurrent NVENC sessions in the driver (historically 2-3; newer driver releases allow more)
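The allocation rule above comes down to simple arithmetic: reserve encoder sessions for playback first, then give whatever is left to background jobs. A minimal sketch, assuming you supply the session cap for your own card and driver; the function and parameter names are hypothetical and not part of Tdarr or Jellyfin.

```python
def background_job_limit(session_cap: int, reserved_for_playback: int) -> int:
    """Return how many NVENC sessions background jobs (e.g. Tdarr) may use.

    session_cap: concurrent NVENC sessions your GPU/driver allows (you provide this).
    reserved_for_playback: sessions held back for real-time Jellyfin/Plex streams.
    """
    if session_cap < 0 or reserved_for_playback < 0:
        raise ValueError("session counts must be non-negative")
    # Playback gets its reservation first; background work gets the remainder.
    return max(0, session_cap - reserved_for_playback)
```

With a cap of 3 and 2 sessions reserved for playback, this leaves 1 session for background transcodes; if the reservation meets or exceeds the cap, background GPU jobs get none.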
Common Configurations
NVIDIA GPU Setup
# Verify GPU in container
docker exec <container> nvidia-smi
# Check encoder/decoder utilization
nvidia-smi dmon -s u
Media Volume Mounts
volumes:
  - /mnt/truenas/media:/media:ro  # Read-only for safety
Client Discovery
- Jellyfin: UDP 7359
- Plex: UDP 32410, 32412-32414 (GDM network discovery)
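To check whether Jellyfin discovery works on a given network segment, a small probe can help. This is a rough sketch, assuming the server answers a UDP broadcast on port 7359 with a JSON object; the probe text and the reply format are assumptions to verify against your Jellyfin version, and the function names are hypothetical.

```python
import json
import socket

DISCOVERY_PORT = 7359  # Jellyfin client discovery port listed above
PROBE = b"Who is JellyfinServer?"  # assumed probe text; verify for your server version


def parse_reply(payload: bytes) -> dict:
    """Decode one discovery reply, assumed to be a UTF-8 JSON object."""
    reply = json.loads(payload.decode("utf-8"))
    if not isinstance(reply, dict):
        raise ValueError("unexpected discovery reply")
    return reply


def discover(timeout: float = 2.0) -> list[dict]:
    """Broadcast the probe and collect replies until none arrives within `timeout`."""
    servers = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(PROBE, ("255.255.255.255", DISCOVERY_PORT))
        while True:
            try:
                payload, _addr = sock.recvfrom(4096)
            except socket.timeout:
                break
            try:
                servers.append(parse_reply(payload))
            except ValueError:
                continue  # ignore malformed replies
    return servers
```

Calling `discover()` from a machine on the same LAN returns one dictionary per responding server. An empty list suggests that broadcast traffic on UDP 7359 is being blocked or that the container is not on the host network.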
Integration Points
Watch History Sync
- Tool: watchstate (ghcr.io/arabcoders/watchstate)
- Method: API-based sync between services
- Note: NFO files do NOT store watch history
Tdarr Integration
- Tdarr pre-processes media for optimal streaming
- Shared GPU resources require coordination
- See tdarr/CONTEXT.md for transcoding system details
Best Practices
Performance
- Use NVMe for cache/transcoding temp directories
- Mount media read-only to prevent accidental modifications
- Enable hardware transcoding for all supported codecs
- Limit concurrent transcodes based on GPU capability
Reliability
- Use restart: unless-stopped for containers
- Separate config from cache (different failure modes)
- Monitor disk space on cache volumes
- Regular database backups (config directory)
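Monitoring disk space on the cache volume can be as simple as a periodic usage check. A minimal sketch using only the standard library; the function names and the 90% default threshold are illustrative assumptions, not values from the deployment.

```python
import shutil


def cache_usage_percent(path: str) -> float:
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def cache_needs_attention(path: str, threshold_percent: float = 90.0) -> bool:
    """True when usage on the cache filesystem is at or above the threshold."""
    return cache_usage_percent(path) >= threshold_percent
```

Pointing `cache_needs_attention` at the host path of the cache mount (whatever that is in your setup) from a cron job gives an early warning before transcodes start failing for lack of space.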
Security
- Run containers as non-root (PUID/PGID)
- Use read-only media mounts
- Limit network exposure (internal LAN only)
- Regular container image updates
GPU Compatibility Notes
NVIDIA Pascal (GTX 10-series)
- NVENC: H.264, HEVC (no B-frames for HEVC)
- NVDEC: H.264, HEVC, VP8, VP9
- Sessions: 2 concurrent on older drivers (driver-enforced consumer limit; newer drivers allow more)
NVIDIA Turing+ (RTX 20-series and newer)
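- NVENC: H.264, HEVC (with B-frames); AV1 encode only on Ada (RTX 40-series) and newer
- NVDEC: H.264, HEVC, VP8, VP9; AV1 decode on Ampere (RTX 30-series) and newer
- Sessions: 3+ concurrent (driver-enforced; depends on driver version)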
GPU Health Monitoring
Jellyfin GPU Monitor
Location: ubuntu-manticore:~/scripts/jellyfin_gpu_monitor.py
Schedule: Every 5 minutes via cron
Logs: ~/logs/jellyfin-gpu-monitor.log
The monitor detects when the Jellyfin container loses GPU access (common after driver updates or Docker restarts) and automatically:
- Sends Discord alert
- Restarts the container to restore GPU access
- Confirms GPU is restored
Manual check:
ssh ubuntu-manticore "python3 ~/scripts/jellyfin_gpu_monitor.py --check"
FFmpeg exit code 187: Indicates NVENC failure due to lost GPU access. The monitor catches this condition before users report playback failures.
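The detect, alert, restart, confirm cycle described above can be sketched roughly as follows. This is not the actual jellyfin_gpu_monitor.py; the function names, the container name used in the usage note, and the webhook URL are hypothetical. It assumes GPU loss shows up as a non-zero exit from `docker exec <container> nvidia-smi` and that alerts go to a standard Discord webhook.

```python
import json
import subprocess
import urllib.request


def gpu_visible(container: str, run=subprocess.run) -> bool:
    """True when `docker exec <container> nvidia-smi` exits with status 0."""
    try:
        result = run(["docker", "exec", container, "nvidia-smi"],
                     capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def send_discord_alert(webhook_url: str, message: str) -> None:
    """POST a plain-text message to a Discord webhook (URL supplied by you)."""
    body = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, method="POST",
        # Explicit User-Agent: the default urllib one may be rejected.
        headers={"Content-Type": "application/json",
                 "User-Agent": "gpu-monitor-sketch/0.1"})
    urllib.request.urlopen(request, timeout=10).close()


def check_and_recover(container: str, notify, run=subprocess.run) -> str:
    """Run one monitoring pass and return 'ok', 'recovered', or 'failed'."""
    if gpu_visible(container, run):
        return "ok"
    notify(f"{container}: GPU access lost, restarting container")
    run(["docker", "restart", container],
        capture_output=True, text=True, timeout=120)
    if gpu_visible(container, run):
        notify(f"{container}: GPU access restored")
        return "recovered"
    notify(f"{container}: GPU still unavailable after restart")
    return "failed"
```

One pass could be run as `check_and_recover("jellyfin", lambda m: send_discord_alert(WEBHOOK_URL, m))`, where `"jellyfin"` and `WEBHOOK_URL` are placeholders. Passing `run` and `notify` as parameters keeps the logic testable without Docker or network access.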
Troubleshooting
Common Issues
- No GPU in container: Check Docker/Podman GPU passthrough config
- Transcoding failures: Verify codec support for your GPU generation
- Slow playback start: Check network mount performance
- Cache filling up: Monitor trickplay/thumbnail generation
- FFmpeg exit code 187: GPU access lost - monitor should auto-restart
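When the monitor is not running, or to confirm a past incident, captured container logs can be scanned for exit code 187. A small sketch, assuming the logs contain a phrase like "exited with code N"; that wording is an assumption, so adjust the pattern to your actual log lines. The helper names are hypothetical.

```python
import re

NVENC_FAILURE_CODE = 187  # exit code this document associates with lost GPU access
# Assumed log wording; change the pattern to match your real Jellyfin/FFmpeg logs.
EXIT_CODE_PATTERN = re.compile(r"exited with code (\d+)", re.IGNORECASE)


def ffmpeg_exit_codes(log_text: str) -> list[int]:
    """Return every exit code mentioned in the log text, in order of appearance."""
    return [int(m.group(1)) for m in EXIT_CODE_PATTERN.finditer(log_text)]


def saw_nvenc_failure(log_text: str) -> bool:
    """True when the log text mentions exit code 187 at least once."""
    return NVENC_FAILURE_CODE in ffmpeg_exit_codes(log_text)
```

The input can be the text captured from `docker logs <container> 2>&1`, saved to a file or piped into a script.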
Diagnostic Commands
# GPU status
nvidia-smi
# Container GPU access
docker exec <container> nvidia-smi
# Encoder/decoder utilization
nvidia-smi dmon -s u
# Container logs
docker logs <container> 2>&1 | tail -50