# GPU Acceleration in Docker Containers

## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.

## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```

### Legacy Method (Runtime)

```bash
# Configure runtime, then restart Docker so the change takes effect
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)

- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files

### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1    # Specific GPUs
  # or, to expose all GPUs:
  # - NVIDIA_VISIBLE_DEVICES=all
```

## Troubleshooting Patterns

### Gradual Enablement

1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first

### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
services:
  app:
    devices:
      - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

## Common Issues

- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access

## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-family systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime still fails

### Solution: Use Podman Instead

```bash
# Podman works on Fedora systems once the CDI spec has been generated
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```

### Why Podman Works Better on Fedora

- Native systemd integration
- Direct hardware access (no VM layer, unlike Docker Desktop)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility

### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```

### Recommendation by OS

- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both, use what works

## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
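
## Engine Selection Sketch

The OS recommendation and testing commands above can be combined into a small helper script. This is a minimal sketch, not a definitive tool: the function names (`recommend_engine`, `gpu_test_command`, `detect_os_id`) are hypothetical, and it assumes the `ID` field in `/etc/os-release` matches the lowercase distribution names used here (for example `nobara` on Nobara), which may not hold on every release. It only prints a suggestion and never starts a container.

```shell
#!/usr/bin/env bash
# Sketch: map a distribution ID to the container engine recommended above.
# All function names are hypothetical helpers, not part of Docker, Podman,
# or the NVIDIA Container Toolkit.

# Print "podman", "docker", or "test-both" for a given os-release ID.
# Assumption: IDs are the lowercase names listed in "Recommendation by OS".
recommend_engine() {
  case "$1" in
    fedora|rhel|centos|nobara) echo "podman" ;;
    ubuntu|debian)             echo "docker" ;;
    *)                         echo "test-both" ;;
  esac
}

# Print the GPU smoke-test command from "Testing Commands" for an engine.
# Returns non-zero when no single engine was recommended.
gpu_test_command() {
  case "$1" in
    podman) echo "podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi" ;;
    docker) echo "docker run --rm --gpus all ubuntu:20.04 nvidia-smi" ;;
    *)      return 1 ;;
  esac
}

# Read the ID field from an os-release file (default: /etc/os-release).
# Prints "unknown" if the file is missing or has no ID.
detect_os_id() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || { echo "unknown"; return 0; }
  # Source in a subshell so os-release variables do not leak into the caller.
  ( . "$file" && echo "${ID:-unknown}" )
}

os_id="$(detect_os_id)"
engine="$(recommend_engine "$os_id")"
echo "Detected OS: $os_id, suggested engine: $engine"

if cmd="$(gpu_test_command "$engine")"; then
  echo "Smoke test: $cmd"
else
  echo "Smoke test: try both the Docker and Podman commands and keep what works"
fi
```

Printing the command instead of running it keeps the script safe on hosts without a GPU; copy the printed line into a terminal once the toolkit and CDI spec are in place.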