# GPU Acceleration in Docker Containers
## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
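Before wiring the compose entry, it can help to confirm which device names the generated spec actually exposes. A minimal sketch (the `cdi_devices` helper is hypothetical; `nvidia-ctk cdi list` reports the same information on a host with the toolkit installed):

```bash
# Hypothetical helper: pull the device names out of a generated CDI spec.
# On a toolkit host, `nvidia-ctk cdi list` is the canonical way to see these.
cdi_devices() {
  sed -n 's/^ *- *name: *//p' "$1"
}

# Usage: cdi_devices /etc/cdi/nvidia.yaml
# Typical names are "0", "1", ... plus "all"
```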
### Legacy Method (Runtime)

```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
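`nvidia-ctk runtime configure` registers the runtime in `/etc/docker/daemon.json`, and Docker must be restarted before `runtime: nvidia` resolves. A quick check, sketched with the config path injectable for testing:

```bash
# Check whether the nvidia runtime is registered in Docker's daemon config.
# nvidia-ctk writes /etc/docker/daemon.json by default.
has_nvidia_runtime() {
  local conf="${1:-/etc/docker/daemon.json}"
  [ -f "$conf" ] && grep -q '"nvidia"' "$conf"
}

# Usage: restart Docker only once the runtime is actually registered
# has_nvidia_runtime && sudo systemctl restart docker
```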
### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
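The same reservation can be exercised from the CLI with the `--gpus` flag (Docker 19.03+). A guarded sketch that degrades to a message on hosts without a configured NVIDIA runtime:

```bash
# CLI equivalent of the deploy/reservations block above.
# Falls back to a message when Docker or the nvidia runtime is unavailable.
if command -v docker >/dev/null 2>&1 && docker info 2>/dev/null | grep -qi nvidia; then
  result=$(docker run --rm --gpus all ubuntu:20.04 nvidia-smi || echo "gpu run failed")
else
  result="docker with nvidia runtime not available"
fi
echo "$result"
```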
## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
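The indices here correspond to the ordering `nvidia-smi` reports. A sketch that lists the mapping and joins a chosen subset into the env value (the `visible_devices` helper is hypothetical):

```bash
# Show the index -> GPU mapping the env variable refers to.
command -v nvidia-smi >/dev/null 2>&1 \
  && nvidia-smi --query-gpu=index,name --format=csv,noheader \
  || echo "nvidia-smi not found"

# Hypothetical helper: join selected indices into the env value,
# e.g. visible_devices 0 1 produces "0,1".
visible_devices() {
  local IFS=,
  echo "$*"
}
```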
## Troubleshooting Patterns

### Gradual Enablement
- Start with CPU-only configuration
- Verify container functionality
- Add GPU support incrementally
- Test with simple workloads first
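One way to keep this incremental is a compose override file that layers GPU support onto a verified CPU-only base (filenames and the `app` service name are illustrative):

```yaml
# docker-compose.gpu.yml -- applied on top of a working CPU-only compose file:
#   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Dropping the second `-f` flag reverts to the already-verified CPU-only configuration, which makes bisecting GPU problems straightforward.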
### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
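At runtime the workload still has to decide which device to use. A minimal detection sketch (the `pick_gpu` helper is hypothetical; the render-node path is the common default, injectable for testing):

```bash
# Pick a transcode backend: NVENC if nvidia-smi works, VAAPI if a DRI
# render node exists, otherwise CPU.
pick_gpu() {
  local dri="${1:-/dev/dri/renderD128}"
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif [ -e "$dri" ]; then
    echo vaapi
  else
    echo cpu
  fi
}
```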
### Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
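A quick triage pass over the usual suspects (paths are the toolkit and Docker defaults):

```bash
# Surface the most common misconfigurations in one pass.
triage() {
  for f in /etc/docker/daemon.json /etc/cdi/nvidia.yaml; do
    [ -f "$f" ] && echo "present: $f" || echo "missing: $f"
  done
  # Permission/visibility of device nodes
  ls -l /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* device nodes visible"
}
triage
```

Both files being present at once is worth a second look: it can indicate the CDI-vs-runtime configuration conflict listed above.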
## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime fails
### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility
### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
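The two probes can be wrapped so scripts pick whichever engine actually reaches the GPU (the `gpu_engine` name is illustrative; the body just chains the commands above):

```bash
# Report which engine can run nvidia-smi in a container: docker, podman, or none.
gpu_engine() {
  if command -v docker >/dev/null 2>&1 \
     && docker run --rm --gpus all ubuntu:20.04 nvidia-smi >/dev/null 2>&1; then
    echo docker
  elif command -v podman >/dev/null 2>&1 \
     && podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi >/dev/null 2>&1; then
    echo podman
  else
    echo none
  fi
}
```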
### Recommendation by OS
- Fedora/RHEL/CentOS/Nobara: Use Podman
- Ubuntu/Debian: Use Docker
- When in doubt: Test both, use what works
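The recommendations above can be encoded as a small heuristic on `/etc/os-release` (a sketch: the file path is injectable for testing, and matching `ID_LIKE` catches derivatives such as Nobara):

```bash
# Default engine choice keyed off /etc/os-release, per the list above.
default_engine() {
  local os_release="${1:-/etc/os-release}"
  if grep -qiE '^ID(_LIKE)?=.*(fedora|rhel|centos)' "$os_release" 2>/dev/null; then
    echo podman
  else
    echo docker
  fi
}
```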
## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
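Once the node is up, confirming it actually sees the GPU is one `podman exec` away. A guarded sketch (the `verify_node` helper is hypothetical and assumes the container above is running):

```bash
# Verify the running node can reach the GPU, then glance at its recent logs.
verify_node() {
  command -v podman >/dev/null 2>&1 || { echo "podman not installed"; return 0; }
  podman exec "$1" nvidia-smi --query-gpu=name --format=csv,noheader
  podman logs --tail 20 "$1"
}

# Usage: verify_node tdarr-node-gpu
```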