claude-home/docker/examples/gpu-acceleration.md
Cal Corum 4b7eca8a46
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00


---
title: "GPU Acceleration in Containers"
description: "Patterns for enabling NVIDIA GPU acceleration in Docker and Podman containers, including CDI, runtime, and Compose methods, plus the critical Fedora/Nobara Docker Desktop GPU failure and Podman workaround."
type: reference
domain: docker
tags: [gpu, nvidia, docker, podman, cuda, nvenc, fedora, tdarr, cdi]
---
# GPU Acceleration in Docker Containers
## Overview
Patterns for enabling NVIDIA GPU acceleration in Docker and Podman containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach
### Modern Method (CDI - Container Device Interface)
```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
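Before pointing Compose at `nvidia.com/gpu=all`, it can help to confirm that a CDI spec was actually written. The following is a minimal sketch, assuming the spec lives in `/etc/cdi` as in the command above. `cdi_spec_present` is a hypothetical helper name, not part of the toolkit; `nvidia-ctk cdi list` is the toolkit command that prints the device names it can resolve.

```bash
# Hypothetical helper: succeed if a directory holds at least one CDI spec file.
# Defaults to /etc/cdi, the output directory used in the generate command above.
cdi_spec_present() {
    dir="${1:-/etc/cdi}"
    for spec in "$dir"/*.yaml "$dir"/*.json; do
        # An unmatched glob stays literal, so test that the path is a real file.
        [ -f "$spec" ] && return 0
    done
    return 1
}

# Usage (on a host with the toolkit installed):
#   cdi_spec_present && nvidia-ctk cdi list
```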
### Legacy Method (Runtime)
```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker

# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
### Compose v3 Method (Deploy)
```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
## Hardware Considerations
### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups
```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1   # Specific GPUs
  # or, to expose all GPUs instead:
  # - NVIDIA_VISIBLE_DEVICES=all
```
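The indices used in `NVIDIA_VISIBLE_DEVICES` are the ones reported by `nvidia-smi`. As a small sketch, the value can be built from a list of indices; `visible_devices` is a hypothetical helper name introduced here for illustration.

```bash
# Hypothetical helper: join GPU indices into a NVIDIA_VISIBLE_DEVICES value.
# With no arguments it falls back to "all".
visible_devices() {
    if [ "$#" -eq 0 ]; then
        echo "all"
        return 0
    fi
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local.
    (IFS=,; echo "$*")
}

# Usage (on a host with the NVIDIA driver installed):
#   nvidia-smi --query-gpu=index,name --format=csv,noheader   # list indices
#   export NVIDIA_VISIBLE_DEVICES="$(visible_devices 0 1)"
```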
## Troubleshooting Patterns
### Gradual Enablement
1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
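The steps above can be sketched as a staged check that stops at the first failure, so a broken CPU-only setup is not mistaken for a GPU problem. `run_stages` and the `label::command` argument format are hypothetical conventions for this example; the commands in the usage comment reuse the test commands from this document.

```bash
# Run named checks in order and stop at the first failure.
# Each argument has the form "label::command" (an illustrative convention).
run_stages() {
    for stage in "$@"; do
        label="${stage%%::*}"   # text before the first "::"
        cmd="${stage#*::}"      # text after the first "::"
        if sh -c "$cmd" >/dev/null 2>&1; then
            echo "ok: $label"
        else
            echo "failed: $label"
            return 1
        fi
    done
}

# Usage:
#   run_stages \
#     "cpu-only container::docker run --rm ubuntu:20.04 true" \
#     "gpu visible::docker run --rm --gpus all ubuntu:20.04 nvidia-smi"
```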
### Fallback Strategy
```yaml
# Include both NVIDIA GPU access and an Intel/AMD GPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
## Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
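For the runtime-related issues, one quick check is whether Docker has the `nvidia` runtime registered at all. This is a rough sketch: `has_nvidia_runtime` is a hypothetical helper that scans the JSON printed by `docker info --format '{{json .Runtimes}}'` for an `nvidia` key, using a plain text match rather than a real JSON parser.

```bash
# Hypothetical helper: read Docker's runtime map (JSON) on stdin and
# succeed if it contains an "nvidia" entry. Plain text match, not a JSON parse.
has_nvidia_runtime() {
    grep -q '"nvidia"'
}

# Usage (on a host with Docker installed):
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     || echo "nvidia runtime missing: rerun 'sudo nvidia-ctk runtime configure --runtime=docker' and restart Docker"
```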
## Critical Fedora/Nobara GPU Issue
### Problem: Docker Desktop GPU Integration Failure
On Fedora-family systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:
- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works but CUDA runtime fails
### Solution: Use Podman Instead
```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility
### Testing Commands
```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi
# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
### Recommendation by OS
- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both, use what works
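The recommendation above can be expressed as a small lookup on the `ID` field of `/etc/os-release`. This is a sketch under assumptions: `pick_engine` is a hypothetical helper, and the ID strings (in particular `nobara`) are assumed values that should be checked against the actual file on the target system.

```bash
# Hypothetical helper: suggest a container engine from an os-release file.
# The ID values below are assumptions; verify them on the target system.
pick_engine() {
    os_release="${1:-/etc/os-release}"
    # Extract the ID= line only (not ID_LIKE= or VERSION_ID=) and drop quotes.
    id=$(sed -n 's/^ID=//p' "$os_release" 2>/dev/null | tr -d '"')
    case "$id" in
        fedora|rhel|centos|nobara) echo "podman" ;;
        ubuntu|debian)             echo "docker" ;;
        *)                         echo "unknown" ;;   # test both engines
    esac
}

# Usage:
#   engine=$(pick_engine)
```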
## Media Transcoding Example (Tdarr)
```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```