# GPU Acceleration in Docker Containers

## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate the CDI specification for the installed NVIDIA GPUs
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
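
To confirm the spec was generated and see the device names Docker will resolve, recent toolkit releases include a `cdi list` subcommand:

```bash
# List device names from the generated CDI spec
nvidia-ctk cdi list
# Typical entries:
#   nvidia.com/gpu=0
#   nvidia.com/gpu=all
```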

### Legacy Method (Runtime)

```bash
# Register the NVIDIA runtime in /etc/docker/daemon.json, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
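
The configure command writes the runtime entry into `/etc/docker/daemon.json`; if registration fails, it is worth auditing that file, which typically ends up looking like:

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "args": []
    }
  }
}
```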

### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
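
The `deploy.resources` reservation is roughly the Compose equivalent of the CLI's `--gpus` flag, which gives a quick host-level sanity check before editing Compose files (the CUDA image tag is illustrative; any CUDA base image works):

```bash
# One-off GPU smoke test
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```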

## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files

### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
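
Index-based selection depends on enumeration order, which can change across reboots; GPU UUIDs are stable and are also accepted by `NVIDIA_VISIBLE_DEVICES`:

```bash
# List GPUs with their stable UUIDs
nvidia-smi -L
# GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-xxxxxxxx-xxxx-...)
```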

## Troubleshooting Patterns

### Gradual Enablement
1. Start with a CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first (see the override sketch below)
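
A low-risk way to apply this pattern is a Compose override file: keep the base `docker-compose.yml` CPU-only and layer the GPU settings on top once the container is known-good. A minimal sketch (the service name `app` is illustrative):

```yaml
# docker-compose.override.yml (merged automatically by `docker compose up`)
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Deleting or renaming the override file reverts to the CPU-only base configuration.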

### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
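
When relying on the `/dev/dri` fallback, the container user must be able to open the render node; on most distributions that means matching the `render` (sometimes `video`) group:

```bash
# Check which group owns the render node
ls -l /dev/dri/renderD128
# crw-rw----. 1 root render ... /dev/dri/renderD128
```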

## Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
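
First-pass diagnostics for the issues above:

```bash
# Did the Docker daemon survive the toolkit install?
systemctl status docker
journalctl -u docker --since "10 minutes ago"

# Which toolkit version is installed, and does a CDI spec exist?
nvidia-ctk --version
ls -l /etc/cdi/
```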

## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure
On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:
- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime fails
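
One quick check for whether the CLI is talking to Docker Desktop's VM-backed engine (the failing case above) rather than a native Docker Engine:

```bash
# Docker Desktop reports its VM as the operating system
docker info --format '{{.OperatingSystem}}'
```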

### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
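
Podman's `--device nvidia.com/gpu=...` syntax reads the same CDI spec generated earlier, so run `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml` first if the device is not found. For start-on-boot behavior, Podman can emit a systemd unit (a sketch for a rootless container; newer Podman versions prefer Quadlet files instead):

```bash
# Generate a user-level systemd unit from the running container
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name container-name \
  > ~/.config/systemd/user/container-name.service
systemctl --user daemon-reload
systemctl --user enable --now container-name.service
```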

### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no Docker Desktop VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility

### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```

### Recommendation by OS
- **Fedora/RHEL/CentOS/Nobara**: Use Podman
- **Ubuntu/Debian**: Use Docker
- **When in doubt**: Test both, use what works

## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for a Tdarr node on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
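
To verify the node actually sees the GPU, exec `nvidia-smi` inside the running container; during an active transcode an ffmpeg process should appear under "Processes":

```bash
podman exec tdarr-node-gpu nvidia-smi
```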