claude-home/patterns/docker/gpu-acceleration.md
Cal Corum d723924bdf CLAUDE: Add complete GPU transcoding solution for Tdarr containers
- Add working Podman-based GPU Tdarr startup script for Fedora systems
- Document critical Docker Desktop GPU issues on Fedora/Nobara systems
- Add comprehensive Tdarr configuration examples (CPU and GPU variants)
- Add GPU acceleration patterns and troubleshooting documentation
- Provide working solution for NVIDIA RTX GPU hardware transcoding

Key insight: Podman works immediately for GPU access on Fedora systems
where Docker Desktop fails due to virtualization layer conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 00:47:12 -05:00


# GPU Acceleration in Docker Containers
## Overview
Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach
### Modern Method (CDI - Container Device Interface)
```bash
# Generate the CDI specification
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
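Before starting containers, it can help to confirm that the generated spec is actually in place. The following is a minimal sketch, assuming the spec lives at the path used in the generate command above; `cdi_spec_ready` is a hypothetical helper name, not part of `nvidia-ctk`.

```bash
# Sketch: succeed only if the CDI spec file exists and is non-empty.
# cdi_spec_ready is a hypothetical helper, not an nvidia-ctk command.
cdi_spec_ready() {
  # Default path matches the --output value used when generating the spec.
  [ -s "${1:-/etc/cdi/nvidia.yaml}" ]
}

# Example use on a GPU host (not run automatically):
#   cdi_spec_ready && nvidia-ctk cdi list
```

On a host with the toolkit installed, `nvidia-ctk cdi list` should then print device names such as `nvidia.com/gpu=all`.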
### Legacy Method (Runtime)
```bash
# Register the nvidia runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
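The runtime only becomes usable once the Docker daemon has been restarted and lists it. As a rough check, the sketch below scans `docker info` output for the `nvidia` runtime name; `has_nvidia_runtime` is a hypothetical helper that reads that output on stdin.

```bash
# Sketch: succeed if the "Runtimes:" line of `docker info` mentions nvidia.
# has_nvidia_runtime is a hypothetical helper; it reads text on stdin.
has_nvidia_runtime() {
  grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Example use (not run automatically):
#   docker info 2>/dev/null | has_nvidia_runtime || echo "nvidia runtime not registered"
```

If the check fails, the `unknown or invalid runtime name: nvidia` error is the likely result when a service sets `runtime: nvidia`.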
### Compose v3 Method (Deploy)
```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
## Hardware Considerations
### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups
```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1    # Specific GPUs, by index
  # or, instead of the line above:
  # - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
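To find the indices to put in `NVIDIA_VISIBLE_DEVICES`, the GPU list can be read from `nvidia-smi`. The sketch below assumes the `index, name` CSV output of `nvidia-smi --query-gpu`; `gpus_matching` is a hypothetical helper that turns it into a comma-separated index list.

```bash
# Sketch: build a comma-separated index list for GPUs whose name contains $1.
# gpus_matching is a hypothetical helper; it reads "index, name" lines on stdin.
gpus_matching() {
  awk -F', ' -v pat="$1" 'index($2, pat) { print $1 }' | paste -sd, -
}

# Example use on a GPU host (not run automatically):
#   nvidia-smi --query-gpu=index,name --format=csv,noheader | gpus_matching "RTX 4090"
```

The result can be passed straight through, for example as `-e NVIDIA_VISIBLE_DEVICES=0,2`.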
## Troubleshooting Patterns
### Gradual Enablement
1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
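The steps above can be sketched as a small staged check that stops at the first failure, so a broken stage is not masked by later ones. `run_stages` is a hypothetical helper, and the commands in the usage comment are placeholders built from the `image:tag` convention used elsewhere in this document.

```bash
# Sketch: run each command string in order and stop at the first failure.
# run_stages is a hypothetical helper; each argument is one shell command.
run_stages() {
  stage=1
  for cmd in "$@"; do
    if ! sh -c "$cmd" >/dev/null 2>&1; then
      echo "stage $stage failed: $cmd"
      return 1
    fi
    stage=$((stage + 1))
  done
  echo "all $# stages passed"
}

# Example use (not run automatically):
#   run_stages \
#     "podman run --rm image:tag true" \
#     "podman run --rm --device nvidia.com/gpu=all image:tag nvidia-smi"
```

Ordering the CPU-only stage first keeps container problems separate from GPU problems.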
### Fallback Strategy
```yaml
# Include both the NVIDIA reservation and the /dev/dri fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
## Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
## Critical Fedora/Nobara GPU Issue
### Problem: Docker Desktop GPU Integration Failure
On Fedora-family systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, apparently caused by conflicts with its virtualization layer. Typical symptoms:
- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works but CUDA runtime fails
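When scanning container or daemon logs, the two error strings above point at different layers. The sketch below maps a log line to a rough category; `classify_gpu_error` and its category labels are hypothetical names chosen for illustration.

```bash
# Sketch: map a log line to a rough failure category.
# classify_gpu_error and the labels it prints are hypothetical.
classify_gpu_error() {
  case "$1" in
    *"unknown or invalid runtime name: nvidia"*)
      echo "runtime-not-registered" ;;  # the daemon does not know the nvidia runtime
    *"CUDA_ERROR_NO_DEVICE"*)
      echo "no-device-visible" ;;       # the container started but CUDA sees no GPU
    *)
      echo "unrecognized" ;;
  esac
}

# Example use (not run automatically):
#   docker logs container-name 2>&1 | while read -r line; do classify_gpu_error "$line"; done
```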
### Solution: Use Podman Instead
```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
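- Better NVIDIA Container Toolkit compatibility: Podman accepts CDI device names such as `nvidia.com/gpu=all` directly once a CDI spec has been generated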
### Testing Commands
```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi
# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
### Recommendation by OS
- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both, use what works
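The recommendation above can be expressed as a small lookup on `/etc/os-release`. This is a sketch only: `pick_engine` is a hypothetical helper, and the distribution IDs it matches are assumptions about what each distribution reports in `ID` and `ID_LIKE`.

```bash
# Sketch: suggest a container engine from the ID and ID_LIKE fields of os-release.
# pick_engine is a hypothetical helper; the ID values matched are assumptions.
pick_engine() {
  os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so its variables do not leak into the caller.
  ids="$( unset ID ID_LIKE; . "$os_release" 2>/dev/null; echo "${ID:-} ${ID_LIKE:-}" )"
  case " $ids " in
    *" fedora "*|*" rhel "*|*" centos "*|*" nobara "*) echo podman ;;
    *" ubuntu "*|*" debian "*)                         echo docker ;;
    *)                                                 echo unknown ;;  # test both
  esac
}

# Example use (not run automatically):
#   engine="$(pick_engine)"
```

An `unknown` result corresponds to the "when in doubt" case: test both engines and keep the one that works.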
## Media Transcoding Example (Tdarr)
```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```