---
title: GPU Acceleration in Containers
description: Patterns for enabling NVIDIA GPU acceleration in Docker and Podman containers, including CDI, runtime, and Compose methods, plus the critical Fedora/Nobara Docker Desktop GPU failure and Podman workaround.
type: reference
domain: docker
tags:
---
# GPU Acceleration in Docker Containers

## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
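After generating the spec, the toolkit can confirm which CDI device names the engine will accept, and a throwaway container makes a quick smoke test. This assumes the NVIDIA driver is installed and, for Docker, that CDI support is enabled in the daemon (it shipped with Docker 25+ but may need to be switched on):

```bash
# List the CDI device names parsed from /etc/cdi/nvidia.yaml
nvidia-ctk cdi list

# Smoke test: request the CDI device and print the GPU inventory
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```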
### Legacy Method (Runtime)

```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
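Note that `nvidia-ctk runtime configure` only edits `/etc/docker/daemon.json`; the daemon has to be restarted before the `nvidia` runtime exists. A sketch of the follow-up steps:

```bash
# Apply the daemon.json change written by nvidia-ctk
sudo systemctl restart docker

# Confirm the nvidia runtime is now registered with the daemon
docker info --format '{{.Runtimes}}'
```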
### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
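On a multi-GPU host the same `deploy` block can pin a service to a specific card. Per the Compose specification, `device_ids` replaces `count` (the two are mutually exclusive); the service name here is illustrative:

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']   # pin to GPU 0 instead of count: all
              capabilities: [gpu]
```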
## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)

- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
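The indices used by `NVIDIA_VISIBLE_DEVICES` follow the driver's enumeration order, which `nvidia-smi` can print; the variable also accepts GPU UUIDs, which stay stable across reboots and reorderings:

```bash
# List GPUs with their indices and UUIDs
nvidia-smi -L
```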
## Troubleshooting Patterns

### Gradual Enablement

1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
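As a concrete instance of the pattern, a service can ship with the GPU reservation commented out until the CPU-only baseline is verified; the service name and image here are illustrative:

```yaml
services:
  transcoder:
    image: image:tag
    # Uncomment once the CPU-only baseline works:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
```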
### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
### Common Issues

- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
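A few commands that help narrow these down (the `docker.service` unit name is the systemd default and may differ on some distributions):

```bash
# Why did the daemon fail to restart after the toolkit install?
journalctl -u docker.service --since "10 min ago" --no-pager

# Is the daemon configured for the legacy runtime, CDI, or both?
cat /etc/docker/daemon.json

# Can the current user see the device nodes at all?
ls -l /dev/dri /dev/nvidia* 2>/dev/null
```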
## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works but the CUDA runtime fails
### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
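Once the container is up, the GPU can be confirmed from inside it, assuming `nvidia-smi` is present in the image (it is in NVIDIA's CUDA base images):

```bash
# Check the device is visible from inside the running container
podman exec container-name nvidia-smi
```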
### Why Podman Works Better on Fedora

- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility
### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
### Recommendation by OS

- Fedora/RHEL/CentOS/Nobara: Use Podman
- Ubuntu/Debian: Use Docker
- When in doubt: Test both, use what works
## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
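Since `--restart unless-stopped` does not by itself bring a rootless Podman container back after a reboot, the same container can be managed as a systemd service via a Quadlet unit. A sketch under Podman's Quadlet format (file path, volume paths, and `%h` home-directory expansion shown here are illustrative):

```ini
# ~/.config/containers/systemd/tdarr-node-gpu.container
[Container]
Image=ghcr.io/haveagitgat/tdarr_node:latest
AddDevice=nvidia.com/gpu=all
Environment=nodeName=workstation-gpu
Environment=serverIP=10.10.0.43
Environment=NVIDIA_VISIBLE_DEVICES=all
Volume=%h/media:/media
Volume=%h/tmp:/temp

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl --user daemon-reload`, the generated `tdarr-node-gpu.service` can be started and will follow the unit's restart policy across reboots.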