---
title: GPU Acceleration in Containers
description: Patterns for enabling NVIDIA GPU acceleration in Docker and Podman containers, including CDI, runtime, and Compose methods, plus the critical Fedora/Nobara Docker Desktop GPU failure and Podman workaround.
type: reference
domain: docker
tags:
---
# GPU Acceleration in Docker Containers

## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
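After generating the spec, the toolkit can confirm which CDI device names the engine will accept, and a throwaway container makes a quick smoke test. This assumes the NVIDIA driver is installed and, for Docker, that CDI support is enabled in the daemon (it shipped with Docker 25+ but may need to be switched on):

```bash
# List the CDI device names parsed from /etc/cdi/nvidia.yaml
nvidia-ctk cdi list

# Smoke test: request the CDI device and print the GPU inventory
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```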
### Legacy Method (Runtime)

```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
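Note that `nvidia-ctk runtime configure` only edits `/etc/docker/daemon.json`; the daemon has to be restarted before the `nvidia` runtime exists. A sketch of the follow-up steps:

```bash
# Apply the daemon.json change written by nvidia-ctk
sudo systemctl restart docker

# Confirm the nvidia runtime is now registered with the daemon
docker info --format '{{.Runtimes}}'
```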
### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
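On a multi-GPU host the same `deploy` block can pin a service to a specific card. Per the Compose specification, `device_ids` replaces `count` (the two are mutually exclusive); the service name here is illustrative:

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']   # pin to GPU 0 instead of count: all
              capabilities: [gpu]
```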
## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)

- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
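The indices used by `NVIDIA_VISIBLE_DEVICES` follow the driver's enumeration order, which `nvidia-smi` can print; the variable also accepts GPU UUIDs, which stay stable across reboots and reorderings:

```bash
# List GPUs with their indices and UUIDs
nvidia-smi -L
```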
## Troubleshooting Patterns

### Gradual Enablement

1. Start with CPU-only configuration
2. Verify container functionality
3. Add GPU support incrementally
4. Test with simple workloads first
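As a concrete instance of the pattern, a service can ship with the GPU reservation commented out until the CPU-only baseline is verified; the service name and image here are illustrative:

```yaml
services:
  transcoder:
    image: image:tag
    # Uncomment once the CPU-only baseline works:
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]
```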
### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
### Common Issues

- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
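A few commands that help narrow these down (the `docker.service` unit name is the systemd default and may differ on some distributions):

```bash
# Why did the daemon fail to restart after the toolkit install?
journalctl -u docker.service --since "10 min ago" --no-pager

# Is the daemon configured for the legacy runtime, CDI, or both?
cat /etc/docker/daemon.json

# Can the current user see the device nodes at all?
ls -l /dev/dri /dev/nvidia* 2>/dev/null
```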
## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works but the CUDA runtime fails
### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
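Once the container is up, the GPU can be confirmed from inside it, assuming `nvidia-smi` is present in the image (it is in NVIDIA's CUDA base images):

```bash
# Check the device is visible from inside the running container
podman exec container-name nvidia-smi
```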
### Why Podman Works Better on Fedora

- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility
### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
### Recommendation by OS

- Fedora/RHEL/CentOS/Nobara: Use Podman
- Ubuntu/Debian: Use Docker
- When in doubt: Test both, use what works
## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
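Since `--restart unless-stopped` does not by itself bring a rootless Podman container back after a reboot, the same container can be managed as a systemd service via a Quadlet unit. A sketch under Podman's Quadlet format (file path, volume paths, and `%h` home-directory expansion shown here are illustrative):

```ini
# ~/.config/containers/systemd/tdarr-node-gpu.container
[Container]
Image=ghcr.io/haveagitgat/tdarr_node:latest
AddDevice=nvidia.com/gpu=all
Environment=nodeName=workstation-gpu
Environment=serverIP=10.10.0.43
Environment=NVIDIA_VISIBLE_DEVICES=all
Volume=%h/media:/media
Volume=%h/tmp:/temp

[Service]
Restart=always

[Install]
WantedBy=default.target
```

After a `systemctl --user daemon-reload`, the generated `tdarr-node-gpu.service` can be started and will follow the unit's restart policy across reboots.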