---
title: "GPU Acceleration in Containers"
description: "Patterns for enabling NVIDIA GPU acceleration in Docker and Podman containers, including CDI, runtime, and Compose methods, plus the critical Fedora/Nobara Docker Desktop GPU failure and Podman workaround."
type: reference
domain: docker
tags: [gpu, nvidia, docker, podman, cuda, nvenc, fedora, tdarr, cdi]
---

# GPU Acceleration in Docker Containers

## Overview
Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.

## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)
```bash
# Generate the CDI specification for the installed GPUs
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# docker-compose.yml: request all GPUs by CDI device name
# (the container engine must have CDI support enabled)
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
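
Before pointing a container at `nvidia.com/gpu=all`, it can help to confirm that a CDI spec was actually generated. The following is a minimal sketch, assuming specs live in `/etc/cdi` (the path used above); the helper name `cdi_spec_present` is hypothetical and not part of the toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given directory holds at least one
# CDI spec file (*.yaml or *.json). Defaults to /etc/cdi, as used above.
cdi_spec_present() {
  local dir="${1:-/etc/cdi}"
  local f
  for f in "$dir"/*.yaml "$dir"/*.json; do
    # An unmatched glob stays literal, so -f filters it out.
    [ -f "$f" ] && return 0
  done
  return 1
}

if cdi_spec_present; then
  echo "CDI spec found"
else
  echo "No CDI spec found; run: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml"
fi
```

Once a spec exists, `nvidia-ctk cdi list` prints the device names the toolkit resolved from it.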

### Legacy Method (Runtime)
```bash
# Register the NVIDIA runtime in the Docker daemon config, then restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

```yaml
# docker-compose.yml: select the NVIDIA runtime explicitly
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```

### Compose v3 Method (Deploy)
```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files

### Multi-GPU Setups
```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs, by index
  # or, to expose every GPU, use this entry instead:
  # - NVIDIA_VISIBLE_DEVICES=all
```
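
GPU indices can be listed on the host with `nvidia-smi -L`. As a small sketch of how the variable's value is assembled, the hypothetical helper below joins indices with commas and falls back to `all` when none are given; `image:tag` is a placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join GPU indices into a NVIDIA_VISIBLE_DEVICES value.
# With no arguments it falls back to "all".
visible_devices() {
  if [ "$#" -eq 0 ]; then
    echo "all"
    return
  fi
  local IFS=,
  echo "$*"   # "$*" joins the arguments with the first character of IFS
}

# Print (not run) a command that would expose GPUs 0 and 1.
echo "docker run --rm -e NVIDIA_VISIBLE_DEVICES=$(visible_devices 0 1) image:tag"
```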

## Troubleshooting Patterns

### Gradual Enablement
1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
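
The steps above can be sketched as a small staged script. This is an illustration only: `ENGINE`, `IMAGE`, `DRY_RUN`, and the `run_step` helper are assumptions, and by default the script only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch of the staged rollout above. ENGINE and IMAGE are assumptions;
# set DRY_RUN=0 to execute the commands instead of printing them.
ENGINE="${ENGINE:-docker}"
IMAGE="${IMAGE:-ubuntu:20.04}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print a label, then print or run the given command.
run_step() {
  local label="$1"; shift
  echo "== $label"
  if [ "$DRY_RUN" = "1" ]; then
    echo "   $*"
  else
    "$@" || { echo "   step failed: $label" >&2; return 1; }
  fi
}

# Steps 1-2: CPU-only container first; steps 3-4: add the GPU and run the
# simplest possible workload. The GPU step is skipped if the first one fails.
run_step "CPU-only sanity check" "$ENGINE" run --rm "$IMAGE" true &&
  run_step "GPU smoke test" "$ENGINE" run --rm --gpus all "$IMAGE" nvidia-smi
```

With Podman, the GPU step would use `--device nvidia.com/gpu=all` in place of `--gpus all`, as in the testing commands later in this document.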

### Fallback Strategy
```yaml
# Service-level keys: reserve the NVIDIA GPU and also expose /dev/dri
# so an Intel/AMD GPU can serve as a fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```

## Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access

## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure
On Fedora-family systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit. Typical symptoms:
- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime still fails
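
The `unknown or invalid runtime name: nvidia` symptom can be checked directly by asking the daemon which runtimes it has registered. A minimal sketch, assuming `docker info --format '{{json .Runtimes}}'` returns a JSON object keyed by runtime name; the helper name `has_nvidia_runtime` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the runtime JSON from `docker info` on stdin and
# succeed if a runtime keyed "nvidia" is present.
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:'
}

if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with this Docker daemon"
  else
    echo "nvidia runtime is NOT registered with this Docker daemon"
  fi
fi
```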

### Solution: Use Podman Instead
```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```

### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no VM layer, unlike Docker Desktop)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility

### Testing Commands
```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```

### Recommendation by OS
- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both, use what works
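
The recommendation above can be expressed as a small lookup on the `ID` and `ID_LIKE` fields of `/etc/os-release`. This is a sketch under assumptions: the helper name `pick_engine` is hypothetical, and the exact `ID` strings a given distribution reports (for example, for Nobara) should be checked on the target system.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an os-release ID (plus optional ID_LIKE) to the
# engine recommended above. Anything unrecognised yields "test-both".
pick_engine() {
  local id="$1" id_like="${2:-}"
  case " $id $id_like " in
    *" fedora "*|*" rhel "*|*" centos "*|*" nobara "*) echo "podman" ;;
    *" ubuntu "*|*" debian "*)                         echo "docker" ;;
    *)                                                 echo "test-both" ;;
  esac
}

# Read the fields in a subshell so os-release variables do not leak out.
if [ -r /etc/os-release ]; then
  (
    . /etc/os-release
    echo "Suggested engine: $(pick_engine "${ID:-}" "${ID_LIKE:-}")"
  )
fi
```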

## Media Transcoding Example (Tdarr)
```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```