# Media Servers - Technology Context
## Overview
Media server infrastructure for home lab environments, covering streaming services like Jellyfin and Plex with hardware-accelerated transcoding, library management, and client discovery.
## Current Deployments
### Jellyfin on ubuntu-manticore
- **Location**: 10.10.0.226:8096
- **GPU**: NVIDIA GTX 1070 (NVENC/NVDEC)
- **Documentation**: `jellyfin-ubuntu-manticore.md`
### Plex (Existing)
- **Location**: TBD (potential migration to ubuntu-manticore)
- **Note**: Currently running elsewhere, may migrate for GPU access
## Architecture Patterns
### GPU-Accelerated Transcoding
**Pattern**: Hardware encoding/decoding for real-time streaming
```yaml
# Docker Compose GPU passthrough
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
environment:
  - NVIDIA_DRIVER_CAPABILITIES=all
  - NVIDIA_VISIBLE_DEVICES=all
```
### Storage Strategy
**Pattern**: Tiered storage for different access patterns
- **Config**: Local SSD (small, fast database access)
- **Cache**: Local NVMe (transcoding temp, thumbnails)
- **Media**: Network storage (large capacity, read-only mount)
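A minimal `docker run` sketch of this tiering; the host paths are examples, not taken from the actual deployment (adjust to the real SSD/NVMe/NAS mounts):
```bash
# Tiered mounts (assumed host paths):
# - /opt/jellyfin/config       -> local SSD (database, settings)
# - /mnt/nvme/jellyfin-cache   -> local NVMe (transcodes, thumbnails)
# - /mnt/truenas/media         -> network storage, mounted read-only
docker run -d --name jellyfin \
  -v /opt/jellyfin/config:/config \
  -v /mnt/nvme/jellyfin-cache:/cache \
  -v /mnt/truenas/media:/media:ro \
  jellyfin/jellyfin:latest
```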
### Multi-Service GPU Sharing
**Pattern**: Resource allocation when multiple services share GPU
- Limit background tasks (Tdarr) to fewer concurrent jobs
- Prioritize real-time services (Jellyfin/Plex playback)
- Consumer GPUs limited to 2-3 concurrent NVENC sessions
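When multiple services hit the encoder at once, the current session count can be checked directly from `nvidia-smi` on the host:
```bash
# Active NVENC sessions and average encode FPS across all processes on the GPU
nvidia-smi --query-gpu=encoder.stats.sessionCount,encoder.stats.averageFps --format=csv
```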
## Common Configurations
### NVIDIA GPU Setup
```bash
# Verify GPU in container
docker exec <container> nvidia-smi
# Check encoder/decoder utilization
nvidia-smi dmon -s u
```
### Media Volume Mounts
```yaml
volumes:
- /mnt/truenas/media:/media:ro # Read-only for safety
```
### Client Discovery
- **Jellyfin**: UDP 7359
- **Plex**: UDP 32410-32414, GDM
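These ports only matter for bridge-networked containers; with `--network host` discovery works without extra configuration. A minimal Jellyfin example publishing the discovery port (Plex would additionally need 32410-32414/udp):
```bash
# Bridge networking: publish the web UI and the UDP auto-discovery port
docker run -d --name jellyfin \
  -p 8096:8096 \
  -p 7359:7359/udp \
  jellyfin/jellyfin:latest
```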
## Integration Points
### Watch History Sync
- **Tool**: watchstate (ghcr.io/arabcoders/watchstate)
- **Method**: API-based sync between services
- **Note**: NFO files do NOT store watch history
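A minimal sketch of running watchstate as a container; the `/config` mount point and host path are assumptions, and the Jellyfin/Plex server credentials are added afterwards through its CLI/web UI per the watchstate documentation:
```bash
# Minimal watchstate container (host path and mount point are assumptions)
docker run -d --name watchstate \
  -v /opt/watchstate:/config \
  ghcr.io/arabcoders/watchstate:latest
```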
### Tdarr Integration
- Tdarr pre-processes media for optimal streaming
- Shared GPU resources require coordination
- See `tdarr/CONTEXT.md` for transcoding system details
## Best Practices
### Performance
1. Use NVMe for cache/transcoding temp directories
2. Mount media read-only to prevent accidental modifications
3. Enable hardware transcoding for all supported codecs
4. Limit concurrent transcodes based on GPU capability
### Reliability
1. Use `restart: unless-stopped` for containers
2. Separate config from cache (different failure modes)
3. Monitor disk space on cache volumes
4. Regular database backups (config directory)
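A simple nightly backup of the config directory could look like this; the paths, schedule, and retention are assumptions, not from the source:
```bash
#!/usr/bin/env bash
# Nightly tarball of the Jellyfin config directory (assumed paths).
# Example crontab entry: 0 3 * * * /home/user/scripts/backup-jellyfin-config.sh
set -euo pipefail
SRC=/opt/jellyfin/config
DEST=/mnt/truenas/backups/jellyfin
tar czf "${DEST}/jellyfin-config-$(date +%F).tar.gz" -C "$(dirname "$SRC")" "$(basename "$SRC")"
# Prune everything beyond the 14 most recent backups
ls -1t "${DEST}"/jellyfin-config-*.tar.gz | tail -n +15 | xargs -r rm --
```
For a consistent copy of the SQLite databases, stop the container during the backup or use `sqlite3 ... ".backup"` instead of tarring the live files.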
### Security
1. Run containers as non-root (PUID/PGID)
2. Use read-only media mounts
3. Limit network exposure (internal LAN only)
4. Regular container image updates
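For linuxserver.io-style images the non-root mapping is done with the PUID/PGID environment variables (IDs below are examples); the official `jellyfin/jellyfin` image uses `--user` instead:
```bash
# linuxserver.io-style image: run as an unprivileged host user (example IDs)
docker run -d --name jellyfin \
  -e PUID=1000 -e PGID=1000 \
  -v /mnt/truenas/media:/media:ro \
  lscr.io/linuxserver/jellyfin:latest
```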
## GPU Compatibility Notes
### NVIDIA Pascal (GTX 10-series)
- NVENC: H.264, HEVC (no B-frames for HEVC)
- NVDEC: H.264, HEVC, VP8, VP9
- Sessions: 2 concurrent (consumer card limit)
### NVIDIA Turing+ (RTX 20-series and newer)
- NVENC: H.264, HEVC (with B-frames); AV1 encode requires Ada Lovelace (RTX 40-series) or newer
- NVDEC: H.264, HEVC, VP8, VP9; AV1 decode requires Ampere (RTX 30-series) or newer
- Sessions: 3+ concurrent
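Which codecs are usable also depends on the ffmpeg build inside the container. A quick check (the `/usr/lib/jellyfin-ffmpeg` path assumes the official Jellyfin image):
```bash
# NVENC encoders and NVDEC/CUVID decoders compiled into the container's ffmpeg
docker exec <container> /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -encoders | grep nvenc
docker exec <container> /usr/lib/jellyfin-ffmpeg/ffmpeg -hide_banner -decoders | grep cuvid
```
This lists what the ffmpeg build supports; actual hardware support still depends on the GPU generation per the notes above.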
## GPU Health Monitoring
### Jellyfin GPU Monitor
**Location**: `ubuntu-manticore:~/scripts/jellyfin_gpu_monitor.py`
**Schedule**: Every 5 minutes via cron
**Logs**: `~/logs/jellyfin-gpu-monitor.log`
The monitor detects when the Jellyfin container loses GPU access (common after
driver updates or Docker restarts) and automatically:
1. Sends Discord alert
2. Restarts the container to restore GPU access
3. Confirms GPU is restored
**Manual check:**
```bash
ssh ubuntu-manticore "python3 ~/scripts/jellyfin_gpu_monitor.py --check"
```
**FFmpeg exit code 187**: Indicates NVENC failure due to lost GPU access.
The monitor catches this condition before users report playback failures.
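The deployed monitor is a Python script; the equivalent logic, sketched in shell (container name, webhook URL, and alert text are assumptions, not taken from the actual script):
```bash
#!/usr/bin/env bash
# Sketch of the monitor's check-and-restart flow (assumed names and URLs)
set -u
CONTAINER=jellyfin
WEBHOOK_URL="https://discord.com/api/webhooks/CHANGEME"

# GPU is considered lost if nvidia-smi fails inside the container
if ! docker exec "$CONTAINER" nvidia-smi >/dev/null 2>&1; then
  curl -fsS -H "Content-Type: application/json" \
    -d '{"content": "Jellyfin lost GPU access - restarting container"}' \
    "$WEBHOOK_URL"
  docker restart "$CONTAINER"
  sleep 30
  if docker exec "$CONTAINER" nvidia-smi >/dev/null 2>&1; then
    echo "$(date -Is) GPU access restored after restart"
  else
    echo "$(date -Is) GPU still unavailable after restart" >&2
  fi
fi
```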
## Troubleshooting
### Common Issues
1. **No GPU in container**: Check Docker/Podman GPU passthrough config
2. **Transcoding failures**: Verify codec support for your GPU generation
3. **Slow playback start**: Check network mount performance
4. **Cache filling up**: Monitor trickplay/thumbnail generation
5. **FFmpeg exit code 187**: GPU access lost - monitor should auto-restart
### Diagnostic Commands
```bash
# GPU status
nvidia-smi
# Container GPU access
docker exec <container> nvidia-smi
# Encoder/decoder utilization
nvidia-smi dmon -s u
# Container logs
docker logs <container> 2>&1 | tail -50
```