Media Servers - Technology Context

Overview

Media server infrastructure for home lab environments, covering streaming services like Jellyfin and Plex with hardware-accelerated transcoding, library management, and client discovery.

Current Deployments

Jellyfin on ubuntu-manticore

  • Location: 10.10.0.226:8096
  • GPU: NVIDIA GTX 1070 (NVENC/NVDEC)
  • Documentation: jellyfin-ubuntu-manticore.md

Plex (Existing)

  • Location: TBD (potential migration to ubuntu-manticore)
  • Note: Currently running elsewhere, may migrate for GPU access

Architecture Patterns

GPU-Accelerated Transcoding

Pattern: Hardware encoding/decoding for real-time streaming

# Docker Compose GPU passthrough
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
environment:
  - NVIDIA_DRIVER_CAPABILITIES=all
  - NVIDIA_VISIBLE_DEVICES=all
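
This reservation syntax requires the NVIDIA Container Toolkit on the Docker host; it is the Compose equivalent of docker run --gpus all.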

Storage Strategy

Pattern: Tiered storage for different access patterns (compose sketch after this list)

  • Config: Local SSD (small, fast database access)
  • Cache: Local NVMe (transcoding temp, thumbnails)
  • Media: Network storage (large capacity, read-only mount)
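
A minimal compose sketch of this tiering, using the official jellyfin/jellyfin image's /config and /cache volumes; the SSD and NVMe host paths are illustrative assumptions, not the actual ubuntu-manticore layout.

# Tiered storage sketch (SSD/NVMe host paths are assumptions)
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    volumes:
      - /opt/jellyfin/config:/config          # local SSD: database and settings
      - /mnt/nvme/jellyfin-cache:/cache       # local NVMe: transcode temp, thumbnails
      - /mnt/truenas/media:/media:ro          # network storage: media library, read-only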

Multi-Service GPU Sharing

Pattern: Resource allocation when multiple services share a GPU (compose sketch after this list)

  • Limit background tasks (Tdarr) to fewer concurrent jobs
  • Prioritize real-time services (Jellyfin/Plex playback)
  • Consumer GPUs are limited to a few concurrent NVENC sessions (driver-enforced; historically 2-3)
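
A hedged compose sketch of the sharing pattern: both services get the same GPU reservation via a YAML anchor, and the background worker's concurrency is capped in its own application settings. The Tdarr node image name is an assumption; see tdarr/CONTEXT.md for the real deployment.

# Sketch: two services sharing one GPU (image names assumed)
x-gpu: &gpu
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    <<: *gpu
  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    <<: *gpu
    # Keep Tdarr's GPU worker count low (set in the Tdarr UI) so real-time
    # playback transcodes always get an NVENC session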

Common Configurations

NVIDIA GPU Setup

# Verify GPU in container
docker exec <container> nvidia-smi

# Check encoder/decoder utilization
nvidia-smi dmon -s u

Media Volume Mounts

volumes:
  - /mnt/truenas/media:/media:ro  # Read-only for safety

Client Discovery

  • Jellyfin: UDP 7359
  • Plex: UDP 32410, 32412-32414 (GDM discovery; compose example below)
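
When the servers run as bridge-mode containers, these discovery ports must be published explicitly (network_mode: host avoids this). A minimal compose fragment, with the main TCP ports included for context:

# Discovery and main ports when not using host networking
services:
  jellyfin:
    ports:
      - "8096:8096/tcp"        # web UI / API
      - "7359:7359/udp"        # client auto-discovery
  plex:
    ports:
      - "32400:32400/tcp"      # Plex web / API
      - "32410:32410/udp"      # GDM discovery
      - "32412-32414:32412-32414/udp"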

Integration Points

Watch History Sync

  • Tool: watchstate (ghcr.io/arabcoders/watchstate)
  • Method: API-based sync between services
  • Note: NFO files do NOT store watch history

Tdarr Integration

  • Tdarr pre-processes media for optimal streaming
  • Shared GPU resources require coordination
  • See tdarr/CONTEXT.md for transcoding system details

Best Practices

Performance

  1. Use NVMe for cache/transcoding temp directories
  2. Mount media read-only to prevent accidental modifications
  3. Enable hardware transcoding for all supported codecs
  4. Limit concurrent transcodes based on GPU capability

Reliability

  1. Use restart: unless-stopped for containers (compose sketch after this list)
  2. Separate config from cache (different failure modes)
  3. Monitor disk space on cache volumes
  4. Regular database backups (config directory)
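
A sketch of points 1-2 in compose terms, plus an optional health check against Jellyfin's /health endpoint (assumes curl is available in the image); backups and disk-space monitoring live outside compose.

# Restart policy, separated config/cache, basic health check (sketch)
services:
  jellyfin:
    image: jellyfin/jellyfin:latest
    restart: unless-stopped
    volumes:
      - /opt/jellyfin/config:/config       # back this up: database and settings
      - /mnt/nvme/jellyfin-cache:/cache    # disposable: safe to wipe if it fills
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8096/health"]
      interval: 1m
      timeout: 10s
      retries: 3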

Security

  1. Run containers as non-root (PUID/PGID; sketch after this list)
  2. Use read-only media mounts
  3. Limit network exposure (internal LAN only)
  4. Regular container image updates
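
A sketch of the container-side security items. PUID/PGID apply to linuxserver.io-style images (the official image uses the compose user: key instead); the image and IDs below are assumptions, while the LAN bind address comes from the deployment above.

# Non-root user, read-only media, LAN-only port binding (values assumed)
services:
  jellyfin:
    image: lscr.io/linuxserver/jellyfin:latest
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - /mnt/truenas/media:/media:ro
    ports:
      - "10.10.0.226:8096:8096"   # bind to the LAN interface, not 0.0.0.0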

GPU Compatibility Notes

NVIDIA Pascal (GTX 10-series)

  • NVENC: H.264, HEVC (no B-frames for HEVC)
  • NVDEC: H.264, HEVC, VP8, VP9
  • Sessions: consumer limit is driver-enforced (historically 2-3 concurrent; recent drivers allow more)

NVIDIA Turing+ (RTX 20-series and newer)

  • NVENC: H.264, HEVC (with B-frames); AV1 encode requires Ada Lovelace (RTX 40-series) or newer
  • NVDEC: H.264, HEVC, VP8, VP9; AV1 decode requires Ampere (RTX 30-series) or newer
  • Sessions: same driver-enforced consumer limit (higher on recent drivers)

GPU Health Monitoring

Jellyfin GPU Monitor

  • Location: ubuntu-manticore:~/scripts/jellyfin_gpu_monitor.py
  • Schedule: Every 5 minutes via cron
  • Logs: ~/logs/jellyfin-gpu-monitor.log

The monitor detects when the Jellyfin container loses GPU access (common after driver updates or Docker restarts) and automatically:

  1. Sends Discord alert
  2. Restarts the container to restore GPU access
  3. Confirms GPU is restored

Manual check:

ssh ubuntu-manticore "python3 ~/scripts/jellyfin_gpu_monitor.py --check"

FFmpeg exit code 187: Indicates NVENC failure due to lost GPU access. The monitor catches this condition before users report playback failures.

Troubleshooting

Common Issues

  1. No GPU in container: Check Docker/Podman GPU passthrough config
  2. Transcoding failures: Verify codec support for your GPU generation
  3. Slow playback start: Check network mount performance
  4. Cache filling up: Monitor trickplay/thumbnail generation
  5. FFmpeg exit code 187: GPU access lost - monitor should auto-restart

Diagnostic Commands

# GPU status
nvidia-smi

# Container GPU access
docker exec <container> nvidia-smi

# Encoder/decoder utilization
nvidia-smi dmon -s u

# Container logs
docker logs <container> 2>&1 | tail -50