claude-home/patterns/docker/gpu-acceleration.md
Cal Corum d723924bdf CLAUDE: Add complete GPU transcoding solution for Tdarr containers
- Add working Podman-based GPU Tdarr startup script for Fedora systems
- Document critical Docker Desktop GPU issues on Fedora/Nobara systems
- Add comprehensive Tdarr configuration examples (CPU and GPU variants)
- Add GPU acceleration patterns and troubleshooting documentation
- Provide working solution for NVIDIA RTX GPU hardware transcoding

Key insight: Podman works immediately for GPU access on Fedora systems
where Docker Desktop fails due to virtualization layer conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 00:47:12 -05:00


GPU Acceleration in Docker Containers

Overview

Patterns for enabling GPU acceleration in Docker and Podman containers, particularly for media transcoding workloads that rely on NVENC/NVDEC.

NVIDIA Container Toolkit Approach

Modern Method (CDI - Container Device Interface)

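# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Confirm the generated device names (nvidia.com/gpu=all should be listed)
nvidia-ctk cdi list

# Use in docker-compose (the Docker Engine must have CDI support enabled)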
services:
  app:
    devices:
      - nvidia.com/gpu=all

Legacy Method (Runtime)

# Configure runtime, then restart Docker so it picks up the change
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all

Compose v3 Method (Deploy)

services:
  app:
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]

Hardware Considerations

High-End Consumer GPUs (RTX 4080/4090)

  • Excellent NVENC/NVDEC performance
  • Multiple concurrent transcoding streams
  • Ample VRAM for high-resolution sources and concurrent streams

Multi-GPU Setups

environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
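To see which indexes are valid for NVIDIA_VISIBLE_DEVICES, the GPU list from `nvidia-smi -L` can be turned into a comma-separated value. This is a minimal sketch, assuming the usual `GPU <index>: <name> (UUID: ...)` line format; `gpu_indexes` is a hypothetical helper name, not part of any NVIDIA tool.

```shell
# gpu_indexes: read `nvidia-smi -L` output on stdin and print the GPU
# indexes as a comma-separated list suitable for NVIDIA_VISIBLE_DEVICES.
# Hypothetical helper; assumes lines look like "GPU 0: <name> (UUID: ...)".
gpu_indexes() {
    sed -n 's/^GPU \([0-9][0-9]*\):.*/\1/p' | paste -sd, -
}

# Typical use on a host with the NVIDIA driver installed:
#   nvidia-smi -L | gpu_indexes
```

Reading from stdin keeps the helper independent of the driver, so it can be checked against saved `nvidia-smi -L` output from another machine.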

Troubleshooting Patterns

Gradual Enablement

  1. Start with CPU-only configuration
  2. Verify container functionality
  3. Add GPU support incrementally
  4. Test with simple workloads first
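The steps above can be sketched as a small staged runner that stops at the first failing stage, which makes it obvious whether the container itself or the GPU layer is the problem. `run_stages` is a hypothetical helper, and the stage commands in the usage comment are only examples.

```shell
# run_stages: run each "name=command" argument in order and stop at the
# first failure. Hypothetical helper for gradual enablement.
run_stages() {
    for stage in "$@"; do
        name=${stage%%=*}   # text before the first '='
        cmd=${stage#*=}     # text after the first '=' (may itself contain '=')
        if sh -c "$cmd" >/dev/null 2>&1; then
            echo "ok: $name"
        else
            echo "FAILED: $name"
            return 1
        fi
    done
}

# Example stages (adjust engine and image to your setup):
#   run_stages \
#     "cpu-only=podman run --rm ubuntu:20.04 true" \
#     "gpu-visible=podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi"
```

Each stage's output is discarded on purpose; once a stage is reported as failed, rerun that command by hand to see the actual error.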

Fallback Strategy

# Expose the NVIDIA GPU plus /dev/dri as an Intel/AMD fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
      - driver: nvidia
        count: all
        capabilities: [gpu]

Common Issues

  • Docker service restart failures after toolkit install
  • CDI vs runtime configuration conflicts
  • Distribution-specific package differences
  • Permission issues with device access
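Two of these issues can be narrowed down quickly: whether Docker has an `nvidia` runtime registered, and whether a CDI spec file exists at all. The helpers below are a hedged sketch; `has_nvidia_runtime` and `cdi_spec_found` are hypothetical names, and the default directories are the standard CDI locations.

```shell
# has_nvidia_runtime: read the JSON from
#   docker info --format '{{json .Runtimes}}'
# on stdin and succeed if a runtime named "nvidia" is registered.
has_nvidia_runtime() {
    grep -q '"nvidia"[[:space:]]*:'
}

# cdi_spec_found: succeed if any CDI spec file exists in the given
# directories (defaults to /etc/cdi and /var/run/cdi).
cdi_spec_found() {
    [ "$#" -gt 0 ] || set -- /etc/cdi /var/run/cdi
    for dir in "$@"; do
        for spec in "$dir"/*.yaml "$dir"/*.json; do
            [ -f "$spec" ] && return 0
        done
    done
    return 1
}

# Typical use:
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime || echo "no nvidia runtime"
#   cdi_spec_found || echo "no CDI spec; run nvidia-ctk cdi generate"
```

If both a runtime and a CDI spec are present, pick one method per container rather than mixing them.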

Critical Fedora/Nobara GPU Issue

Problem: Docker Desktop GPU Integration Failure

On Fedora-family systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop runs containers behind a virtualization layer and does not integrate cleanly with the NVIDIA Container Toolkit. Typical symptoms:

  • CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
  • unknown or invalid runtime name: nvidia
  • Manual device mounting works but CUDA runtime fails
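Before debugging these errors, it helps to confirm that the active Docker context is actually Docker Desktop. A minimal sketch, assuming Docker Desktop reports the string "Docker Desktop" in the `OperatingSystem` field of `docker info`; `is_docker_desktop` is a hypothetical helper name.

```shell
# is_docker_desktop: read the value of
#   docker info --format '{{.OperatingSystem}}'
# on stdin and succeed if it names Docker Desktop.
# Assumption: Docker Desktop reports "Docker Desktop" in this field.
is_docker_desktop() {
    grep -qi 'docker desktop'
}

# Typical use:
#   if docker info --format '{{.OperatingSystem}}' | is_docker_desktop; then
#       echo "Docker Desktop detected: containers run inside a VM" >&2
#   fi
```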

Solution: Use Podman Instead

# Podman works on Fedora once the CDI spec exists (see nvidia-ctk cdi generate)
podman run -d --name container-name \
    --device nvidia.com/gpu=all \
    --restart unless-stopped \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    image:tag

Why Podman Works Better on Fedora

  • Native systemd integration
  • Direct hardware access: containers run on the host kernel, with no VM layer
  • Default container engine for RHEL/Fedora
  • Superior NVIDIA Container Toolkit compatibility

Testing Commands

# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi

Recommendation by OS

  • Fedora/RHEL/CentOS/Nobara: Use Podman
  • Ubuntu/Debian: Use Docker
  • When in doubt: Test both, use what works
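The recommendation can be encoded as a small lookup against `/etc/os-release`. This is a sketch only: `recommend_engine` is a hypothetical helper, and it relies on the `ID` and `ID_LIKE` fields that os-release files normally provide.

```shell
# recommend_engine: print "podman", "docker", or "test-both" based on the
# ID and ID_LIKE fields of an os-release file (default: /etc/os-release).
# Hypothetical helper mirroring the list above.
recommend_engine() {
    os_release=${1:-/etc/os-release}
    # Source in a subshell so ID/ID_LIKE do not leak into the caller.
    (
        ID= ID_LIKE=
        . "$os_release" || exit 1
        case " $ID $ID_LIKE " in
            *" fedora "*|*" rhel "*|*" centos "*|*" nobara "*) echo podman ;;
            *" ubuntu "*|*" debian "*) echo docker ;;
            *) echo test-both ;;
        esac
    )
}

# Typical use:
#   engine=$(recommend_engine)
```

Checking `ID_LIKE` as well as `ID` lets derivatives such as Nobara resolve to the same answer as Fedora.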

Media Transcoding Example (Tdarr)

# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
    --device nvidia.com/gpu=all \
    --restart unless-stopped \
    -e nodeName=workstation-gpu \
    -e serverIP=10.10.0.43 \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v ./media:/media \
    -v ./tmp:/temp \
    ghcr.io/haveagitgat/tdarr_node:latest
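Once the node is running, one way to confirm that hardware encoding is available is to list the NVENC encoders that ffmpeg reports inside the container. This is a sketch under assumptions: it presumes an `ffmpeg` binary is on the container's PATH, and `nvenc_encoders` is a hypothetical helper that only parses the standard `ffmpeg -encoders` listing.

```shell
# nvenc_encoders: read `ffmpeg -hide_banner -encoders` output on stdin and
# print the names of encoders ending in "_nvenc", one per line.
# Hypothetical helper; the encoder name is the second column of each row.
nvenc_encoders() {
    awk '$2 ~ /_nvenc$/ { print $2 }'
}

# Typical use (assumes ffmpeg is on PATH inside the container):
#   podman exec tdarr-node-gpu ffmpeg -hide_banner -encoders | nvenc_encoders
```

An empty result only shows that ffmpeg lists no NVENC encoders; run `nvidia-smi` in the container to check whether the GPU itself is visible.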