Cal Corum d723924bdf CLAUDE: Add complete GPU transcoding solution for Tdarr containers

- Add working Podman-based GPU Tdarr startup script for Fedora systems
- Document critical Docker Desktop GPU issues on Fedora/Nobara systems
- Add comprehensive Tdarr configuration examples (CPU and GPU variants)
- Add GPU acceleration patterns and troubleshooting documentation
- Provide working solution for NVIDIA RTX GPU hardware transcoding

Key insight: Podman works immediately for GPU access on Fedora systems
where Docker Desktop fails due to virtualization layer conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-08-09 00:47:12 -05:00

5.3 KiB

NVIDIA GPU Container Troubleshooting Guide

Key Insights from Fedora/Nobara GPU Container Issues

Problem: Docker Desktop vs Podman GPU Support on Fedora-based Systems

Issue: Docker Desktop on Fedora/Nobara systems does not reliably pass NVIDIA GPUs through to containers via the NVIDIA Container Toolkit, even when the toolkit is properly configured.

Symptoms:

CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
unknown or invalid runtime name: nvidia
Device nodes created but CUDA runtime fails to initialize
Manual device creation (mknod) works but CUDA still fails
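The two error strings above usually distinguish a runtime-registration problem from a device-visibility problem. The following minimal sketch greps captured output for them. The function name triage_gpu_errors is hypothetical, and it only recognizes the two messages quoted in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify captured output by the two error strings
# quoted in the Symptoms list. Reads text on stdin.
triage_gpu_errors() {
  local log found=0
  log=$(cat)
  if printf '%s\n' "$log" | grep -qF 'unknown or invalid runtime name: nvidia'; then
    echo "runtime: the daemon has no 'nvidia' runtime registered"
    found=1
  fi
  if printf '%s\n' "$log" | grep -qF 'CUDA_ERROR_NO_DEVICE'; then
    echo "cuda: the container cannot see a CUDA device"
    found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no known GPU error found"
  fi
}
```

Pipe in whatever output you captured. For example, use `docker run ... 2>&1 | triage_gpu_errors` for start-up errors, or `docker logs container-name 2>&1 | triage_gpu_errors` for errors raised inside the container.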

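Root Cause: Docker Desktop for Linux runs the Docker Engine inside its own virtual machine rather than directly on the host. That virtualization layer sits between containers and the host's NVIDIA driver and /dev/nvidia* devices. As a result, GPU access that works on the host does not reach containers on Fedora-based systems.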

Solution: Use Podman Instead of Docker

Why Podman Works Better on Fedora

Native integration: Tighter integration with systemd and Linux security contexts
Direct hardware access: No VM layer interfering with GPU communication
Same NVIDIA toolkit: Uses the existing nvidia-container-toolkit installation and requests GPUs by their Container Device Interface (CDI) name (nvidia.com/gpu=all)
Built for Fedora: Designed as the default container engine for RHEL/Fedora systems

Verification Commands

# Test basic GPU access with Podman (should work)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi

# Test basic GPU access with Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi
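The two commands above can be wrapped so that a single run reports which engine actually reaches the GPU. This minimal sketch assumes both engines are installed and that ubuntu:20.04 can be pulled. check_engine_gpu and compare_engines are illustrative names and are not part of either tool.

```bash
#!/usr/bin/env bash
# Run nvidia-smi in a throwaway container with the given engine and GPU flags,
# then print whether it succeeded. Output of the container itself is discarded.
check_engine_gpu() {
  local engine=$1
  shift
  if "$engine" run --rm "$@" ubuntu:20.04 nvidia-smi >/dev/null 2>&1; then
    echo "$engine: GPU OK"
  else
    echo "$engine: GPU FAILED"
  fi
}

# Podman requests GPUs by CDI name; Docker uses its --gpus flag.
compare_engines() {
  check_engine_gpu podman --device nvidia.com/gpu=all
  check_engine_gpu docker --gpus all
}
```

On an affected Fedora/Nobara host, the expected pattern is that Podman reports OK and Docker Desktop reports FAILED.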

Complete GPU Container Setup for Fedora/Nobara

Prerequisites

NVIDIA drivers installed and working (nvidia-smi functional)
nvidia-container-toolkit installed via DNF
Podman installed (dnf install podman)

NVIDIA Container Toolkit Installation

# Install NVIDIA container toolkit
sudo dnf install nvidia-container-toolkit

# Register the nvidia runtime with Docker (writes /etc/docker/daemon.json;
# restart Docker afterwards; often has no effect with Docker Desktop)
sudo nvidia-ctk runtime configure --runtime=docker

# The key insight: Podman works without additional configuration!
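The `--device nvidia.com/gpu=all` flag used throughout this guide is a Container Device Interface (CDI) name, which Podman resolves from a CDI spec on the host. Depending on toolkit version and packaging, that spec may already be present. If Podman reports the device as unresolvable, the spec can be generated once with `nvidia-ctk cdi generate`. The sketch below checks `nvidia-ctk cdi list` for the name and prints the generate command if the name is missing. The function names are hypothetical, and the /etc/cdi/nvidia.yaml path is the one used in NVIDIA's toolkit install documentation.

```bash
#!/usr/bin/env bash
# Succeeds if the CDI registry exposes the device name Podman is given here.
cdi_has_all_gpus() {
  nvidia-ctk cdi list 2>&1 | grep -qF 'nvidia.com/gpu=all'
}

# Report whether a CDI spec is present; if not, print how to create one.
ensure_cdi_spec() {
  if cdi_has_all_gpus; then
    echo "CDI spec found"
  else
    echo "CDI spec missing; generate it with:"
    echo "  sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml"
  fi
}
```

The sketch only prints the command rather than running it, because generating the spec needs root and overwrites any existing file at that path.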

Working Podman Command Template

podman run -d --name container-name \
    --device nvidia.com/gpu=all \
    --restart unless-stopped \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    [other options] \
    image:tag

Troubleshooting Steps (In Order)

1. Verify Host GPU Access

nvidia-smi                   # Should show GPU info
lsmod | grep nvidia          # Should show nvidia modules loaded
ls -la /dev/nvidia*          # Should show device files
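The three host checks above can be chained so that the first failure stops the run, following the "in order" approach of this section. In this minimal sketch, run_check and host_preflight are hypothetical helper names. The lsmod test assumes each line of lsmod output starts with the module name, which is how lsmod prints it.

```bash
#!/usr/bin/env bash
# Run a command quietly and print PASS/FAIL with a label.
# Returns non-zero on failure so callers can chain checks with &&.
run_check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# Host-level GPU checks, stopping at the first failure.
host_preflight() {
  run_check "nvidia-smi reports a GPU" nvidia-smi &&
    run_check "nvidia kernel modules loaded" sh -c 'lsmod | grep -q "^nvidia"' &&
    run_check "/dev/nvidia* device files present" sh -c 'ls /dev/nvidia* >/dev/null 2>&1'
}
```

If host_preflight does not print three PASS lines, the problem is on the host, and no container runtime change will fix it.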

2. Test Container Runtime

# Try Podman first (recommended for Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi

# If Podman works but Docker doesn't, use Podman for production

3. Check NVIDIA Container Toolkit

rpm -qa | grep nvidia-container-toolkit
nvidia-ctk --version
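To compare the installed toolkit with the version listed as tested in the Nobara/Fedora 42 notes (1.17.8), the version check can be scripted. This sketch assumes that `nvidia-ctk --version` prints a line such as "NVIDIA Container Toolkit CLI version 1.17.8" and that GNU `sort -V` is available. Adjust the parsing if your output differs. All function names are hypothetical.

```bash
#!/usr/bin/env bash
TESTED_TOOLKIT_VERSION=1.17.8   # version this guide lists as tested working

# Print the first X.Y.Z version number found in text on stdin.
first_semver() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed if version $1 >= version $2, using GNU sort's version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_toolkit_version() {
  local installed
  installed=$(nvidia-ctk --version 2>/dev/null | first_semver)
  if [ -z "$installed" ]; then
    echo "nvidia-ctk not found"
    return 1
  fi
  if version_ge "$installed" "$TESTED_TOOLKIT_VERSION"; then
    echo "toolkit $installed (at or above tested $TESTED_TOOLKIT_VERSION)"
  else
    echo "toolkit $installed (older than tested $TESTED_TOOLKIT_VERSION)"
  fi
}
```

A version below the tested one does not necessarily mean GPU access will fail. It only means this guide has not confirmed that combination.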

4. Verify CUDA Library Locations

# Find CUDA libraries
rpm -ql nvidia-driver-cuda-libs | grep libcuda
ldconfig -p | grep cuda

# Common locations:
# /usr/lib64/libcuda.so*
# /usr/lib64/libnvidia-encode.so*
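The library lookup above can be reduced to a pass/fail check for the two libraries hardware transcoding needs: libcuda for CUDA and libnvidia-encode for NVENC. This minimal sketch reads `ldconfig -p` output on stdin, so it can also be run against saved output. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print each required NVIDIA library missing from the ldconfig cache text
# on stdin. Returns non-zero if any library is missing.
missing_nvidia_libs() {
  local cache lib status=0
  cache=$(cat)
  for lib in libcuda.so libnvidia-encode.so; do
    if ! printf '%s\n' "$cache" | grep -qF "$lib"; then
      echo "missing: $lib"
      status=1
    fi
  done
  return "$status"
}
```

Usage on the host: `ldconfig -p | missing_nvidia_libs && echo "CUDA and NVENC libraries found"`.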

Common Misconceptions

❌ Docker Should Always Work

Wrong: Docker Desktop documents GPU support only for Windows with the WSL2 backend. On Linux, and especially on Fedora-based systems, GPU access from Docker Desktop containers is unreliable.

❌ More Privileges = Better GPU Access

Wrong: Adding privileged: true or manual device mounting doesn't solve Docker Desktop's fundamental GPU integration issues.

❌ The NVIDIA Container Toolkit Is the Problem

Wrong: The toolkit works fine; the issue is Docker Desktop's compatibility with it on Fedora systems.

Best Practices

For Fedora/RHEL/CentOS Systems

Use Podman by default for GPU containers
Test Docker as fallback, but expect issues
Podman Compose works for orchestration
No special configuration needed beyond nvidia-container-toolkit

For Production Deployments

Test both Docker and Podman in your environment
Use whichever works reliably (often Podman on Fedora)
Document which container runtime is used
Pin the runtime explicitly in deployment scripts instead of assuming Docker
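Because Podman and Docker request GPUs with different flags, pinning the runtime in a deployment script also means choosing the matching GPU flag. This is a sketch built on the command template in this guide. CONTAINER_RUNTIME, DRY_RUN and run_gpu_container are hypothetical names and are not Tdarr or Podman settings.

```bash
#!/usr/bin/env bash
CONTAINER_RUNTIME=${CONTAINER_RUNTIME:-podman}   # pinned default for this host

# run_gpu_container NAME IMAGE [extra run options...]
run_gpu_container() {
  if [ $# -lt 2 ]; then
    echo "usage: run_gpu_container NAME IMAGE [options...]" >&2
    return 2
  fi
  local name=$1 image=$2
  shift 2
  local -a gpu cmd
  case "$CONTAINER_RUNTIME" in
    podman) gpu=(--device nvidia.com/gpu=all) ;;   # CDI device name
    docker) gpu=(--gpus all) ;;                    # Docker's GPU flag
    *) echo "unsupported runtime: $CONTAINER_RUNTIME" >&2; return 1 ;;
  esac
  cmd=("$CONTAINER_RUNTIME" run -d --name "$name" "${gpu[@]}"
       --restart unless-stopped
       -e NVIDIA_DRIVER_CAPABILITIES=all
       -e NVIDIA_VISIBLE_DEVICES=all
       "$@" "$image")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"   # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

Running with DRY_RUN=1 prints the exact command, which you can paste into deployment notes to record which runtime and flags were used.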

Success Indicators

GPU Container Working Correctly

nvidia-smi runs inside container
NVENC/CUDA applications detect GPU
No "CUDA_ERROR_NO_DEVICE" errors
Hardware encoder shows as available in applications

Example: Successful Tdarr Node

# Container logs should show:
# h264_nvenc-true-true,hevc_nvenc-true-true,av1_nvenc-true-true

# FFmpeg test should succeed (-y overwrites /tmp/test.mp4, so reruns don't stop at an overwrite prompt):
podman exec container-name ffmpeg -y -f lavfi -i testsrc2=duration=1:size=320x240:rate=1 -c:v h264_nvenc -t 1 /tmp/test.mp4
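The encoder line from the Tdarr logs can be turned into a list of usable encoders. This sketch rests on an assumption this guide does not confirm: each entry has the form NAME-FLAG-FLAG, and an encoder counts as usable only when both flags are "true". The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the names of encoders whose two flags are both "true", reading a
# comma-separated encoder line such as the one shown above on stdin.
usable_encoders() {
  local entry name rest
  tr ',' '\n' | while IFS= read -r entry || [ -n "$entry" ]; do
    [ -n "$entry" ] || continue
    name=${entry%%-*}    # text before the first hyphen, e.g. h264_nvenc
    rest=${entry#*-}     # remaining flags, e.g. true-true
    if [ "$rest" = "true-true" ]; then
      echo "$name"
    fi
  done
}
```

For example, `echo 'h264_nvenc-true-true,hevc_nvenc-true-true,av1_nvenc-true-true' | usable_encoders` lists all three NVENC encoders.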

System-Specific Notes

Nobara/Fedora 42

Docker Desktop: ❌ GPU support problematic
Podman: ✅ GPU support works out of the box
NVIDIA Driver version: 570.169 (tested working)
Container Toolkit version: 1.17.8 (tested working)

Key Files and Locations

GPU devices: /dev/nvidia* (auto-created)
CUDA libraries: /usr/lib64/libcuda.so* (via nvidia-driver-cuda-libs package)
Container toolkit: nvidia-ctk command available
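Docker daemon config: /etc/docker/daemon.json (read by the native Docker Engine; Docker Desktop's VM-hosted engine likely ignores it, so editing it may not help)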

Future Reference

When encountering GPU container issues on Fedora-based systems:

Try Podman first; it will likely work immediately
Don't waste time troubleshooting Docker Desktop GPU issues
Use the same container images and configurations
Podman commands are nearly identical to Docker commands

This approach saves hours of debugging Docker Desktop GPU integration issues on Fedora systems.
