# GPU Acceleration in Docker Containers

## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.

## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
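On Docker Engine 25.0 and later, CDI support may also need to be switched on in the daemon configuration before the `nvidia.com/gpu` device names resolve. A minimal `/etc/docker/daemon.json` sketch (check your engine version's documentation, as CDI started out behind a feature flag):

```json
{
  "features": {
    "cdi": true
  }
}
```

Restart the Docker daemon after changing this file.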

### Legacy Method (Runtime)

```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
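For reference, the `nvidia-ctk runtime configure` step writes an entry along these lines into `/etc/docker/daemon.json` (sketch; exact paths can vary by distribution):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

Restart Docker (`sudo systemctl restart docker`) for the runtime to register.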

### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
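The compose specification also accepts `device_ids` in place of `count` to reserve specific GPUs; a sketch (the two options are mutually exclusive):

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']      # reserve GPU 0 only; cannot be combined with count
              capabilities: [gpu]
```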

## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)

- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files

### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
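Per-service pinning follows the same pattern; a sketch splitting two hypothetical worker services across GPUs with the legacy runtime (service names are assumptions):

```yaml
services:
  worker-a:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0   # first GPU only
  worker-b:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=1   # second GPU only
```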

## Troubleshooting Patterns

### Gradual Enablement

1. Start with a CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
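One way to keep these steps reversible is a compose override file, so GPU support can be layered on top of the verified CPU-only base per run (file and service names are assumptions):

```yaml
# docker-compose.gpu.yml -- apply on top of the CPU-only base with:
#   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

Dropping the second `-f` flag falls back to the CPU-only configuration without editing any files.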

### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```

## Common Issues

- Docker service restart failures after toolkit installation
- CDI vs. runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access

## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime fails

### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
    --device nvidia.com/gpu=all \
    --restart unless-stopped \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    image:tag
```

### Why Podman Works Better on Fedora

- Native systemd integration
- Direct hardware access (no VM layer, unlike Docker Desktop)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility

### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```

### Recommendation by OS

- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both and use what works

## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
    --device nvidia.com/gpu=all \
    --restart unless-stopped \
    -e nodeName=workstation-gpu \
    -e serverIP=10.10.0.43 \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v ./media:/media \
    -v ./tmp:/temp \
    ghcr.io/haveagitgat/tdarr_node:latest
```
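On recent Podman versions, the same container can instead be managed as a systemd service via a Quadlet unit, which restarts it across reboots without `--restart`. A sketch (the unit name is an assumption, and volume paths should be made absolute):

```ini
# ~/.config/containers/systemd/tdarr-node-gpu.container
[Container]
Image=ghcr.io/haveagitgat/tdarr_node:latest
AddDevice=nvidia.com/gpu=all
Environment=nodeName=workstation-gpu
Environment=serverIP=10.10.0.43
Environment=NVIDIA_VISIBLE_DEVICES=all
Volume=%h/media:/media
Volume=%h/tmp:/temp

[Install]
WantedBy=default.target
```

After `systemctl --user daemon-reload`, start it with `systemctl --user start tdarr-node-gpu`.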