# GPU Acceleration in Docker Containers
## Overview

Patterns for enabling GPU acceleration in Docker containers, particularly for media transcoding workloads.
## NVIDIA Container Toolkit Approach

### Modern Method (CDI - Container Device Interface)

```bash
# Generate CDI configuration
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in docker-compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
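Before wiring the compose entry, it can help to confirm which device names the generated spec actually exposes. A minimal sketch (the `cdi_devices` helper is hypothetical; `nvidia-ctk cdi list` reports the same information on a host with the toolkit installed):

```bash
# Hypothetical helper: pull the device names out of a generated CDI spec.
# On a toolkit host, `nvidia-ctk cdi list` is the canonical way to see these.
cdi_devices() {
  sed -n 's/^ *- *name: *//p' "$1"
}

# Usage: cdi_devices /etc/cdi/nvidia.yaml
# Typical names are "0", "1", ... plus "all"
```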
### Legacy Method (Runtime)

```bash
# Configure runtime
sudo nvidia-ctk runtime configure --runtime=docker
```

```yaml
# Use in docker-compose
services:
  app:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
```
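`nvidia-ctk runtime configure` registers the runtime in `/etc/docker/daemon.json`, and Docker must be restarted before `runtime: nvidia` resolves. A quick check, sketched with the config path injectable for testing:

```bash
# Check whether the nvidia runtime is registered in Docker's daemon config.
# nvidia-ctk writes /etc/docker/daemon.json by default.
has_nvidia_runtime() {
  local conf="${1:-/etc/docker/daemon.json}"
  [ -f "$conf" ] && grep -q '"nvidia"' "$conf"
}

# Usage: restart Docker only once the runtime is actually registered
# has_nvidia_runtime && sudo systemctl restart docker
```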
### Compose v3 Method (Deploy)

```yaml
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```
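The same reservation can be exercised from the CLI with the `--gpus` flag (Docker 19.03+). A guarded sketch that degrades to a message on hosts without a configured NVIDIA runtime:

```bash
# CLI equivalent of the deploy/reservations block above.
# Falls back to a message when Docker or the nvidia runtime is unavailable.
if command -v docker >/dev/null 2>&1 && docker info 2>/dev/null | grep -qi nvidia; then
  result=$(docker run --rm --gpus all ubuntu:20.04 nvidia-smi || echo "gpu run failed")
else
  result="docker with nvidia runtime not available"
fi
echo "$result"
```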
## Hardware Considerations

### High-End Consumer GPUs (RTX 4080/4090)
- Excellent NVENC/NVDEC performance
- Multiple concurrent transcoding streams
- High VRAM for large files
### Multi-GPU Setups

```yaml
environment:
  - NVIDIA_VISIBLE_DEVICES=0,1  # Specific GPUs
  # or
  - NVIDIA_VISIBLE_DEVICES=all  # All GPUs
```
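The indices here correspond to the ordering `nvidia-smi` reports. A sketch that lists the mapping and joins a chosen subset into the env value (the `visible_devices` helper is hypothetical):

```bash
# Show the index -> GPU mapping the env variable refers to.
command -v nvidia-smi >/dev/null 2>&1 \
  && nvidia-smi --query-gpu=index,name --format=csv,noheader \
  || echo "nvidia-smi not found"

# Hypothetical helper: join selected indices into the env value,
# e.g. visible_devices 0 1 produces "0,1".
visible_devices() {
  local IFS=,
  echo "$*"
}
```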
## Troubleshooting Patterns

### Gradual Enablement
- Start with CPU-only configuration
- Verify container functionality
- Add GPU support incrementally
- Test with simple workloads first
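One way to keep this incremental is a compose override file that layers GPU support onto a verified CPU-only base (filenames and the `app` service name are illustrative):

```yaml
# docker-compose.gpu.yml -- applied on top of a working CPU-only compose file:
#   docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
services:
  app:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Dropping the second `-f` flag reverts to the already-verified CPU-only configuration, which makes bisecting GPU problems straightforward.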
### Fallback Strategy

```yaml
# Include both GPU and CPU fallback
devices:
  - /dev/dri:/dev/dri  # Intel/AMD GPU fallback
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```
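At runtime the workload still has to decide which device to use. A minimal detection sketch (the `pick_gpu` helper is hypothetical; the render-node path is the common default, injectable for testing):

```bash
# Pick a transcode backend: NVENC if nvidia-smi works, VAAPI if a DRI
# render node exists, otherwise CPU.
pick_gpu() {
  local dri="${1:-/dev/dri/renderD128}"
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo nvidia
  elif [ -e "$dri" ]; then
    echo vaapi
  else
    echo cpu
  fi
}
```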
### Common Issues
- Docker service restart failures after toolkit install
- CDI vs runtime configuration conflicts
- Distribution-specific package differences
- Permission issues with device access
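A quick triage pass over the usual suspects (paths are the toolkit and Docker defaults):

```bash
# Surface the most common misconfigurations in one pass.
triage() {
  for f in /etc/docker/daemon.json /etc/cdi/nvidia.yaml; do
    [ -f "$f" ] && echo "present: $f" || echo "missing: $f"
  done
  # Permission/visibility of device nodes
  ls -l /dev/nvidia* 2>/dev/null || echo "no /dev/nvidia* device nodes visible"
}
triage
```

Both files being present at once is worth a second look: it can indicate the CDI-vs-runtime configuration conflict listed above.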
## Critical Fedora/Nobara GPU Issue

### Problem: Docker Desktop GPU Integration Failure

On Fedora-based systems (Fedora, RHEL, CentOS, Nobara), Docker Desktop has significant compatibility issues with the NVIDIA Container Toolkit, resulting in:

- `CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected`
- `unknown or invalid runtime name: nvidia`
- Manual device mounting works, but the CUDA runtime fails
### Solution: Use Podman Instead

```bash
# Podman works immediately on Fedora systems
podman run -d --name container-name \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES=all \
  image:tag
```
### Why Podman Works Better on Fedora
- Native systemd integration
- Direct hardware access (no VM layer)
- Default container engine for RHEL/Fedora
- Superior NVIDIA Container Toolkit compatibility
### Testing Commands

```bash
# Test Docker (often fails on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi

# Test Podman (works on Fedora)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
```
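The two probes can be wrapped so scripts pick whichever engine actually reaches the GPU (the `gpu_engine` name is illustrative; the body just chains the commands above):

```bash
# Report which engine can run nvidia-smi in a container: docker, podman, or none.
gpu_engine() {
  if command -v docker >/dev/null 2>&1 \
     && docker run --rm --gpus all ubuntu:20.04 nvidia-smi >/dev/null 2>&1; then
    echo docker
  elif command -v podman >/dev/null 2>&1 \
     && podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi >/dev/null 2>&1; then
    echo podman
  else
    echo none
  fi
}
```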
### Recommendation by OS
- Fedora/RHEL/CentOS/Nobara: Use Podman
- Ubuntu/Debian: Use Docker
- When in doubt: Test both, use what works
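The recommendations above can be encoded as a small heuristic on `/etc/os-release` (a sketch: the file path is injectable for testing, and matching `ID_LIKE` catches derivatives such as Nobara):

```bash
# Default engine choice keyed off /etc/os-release, per the list above.
default_engine() {
  local os_release="${1:-/etc/os-release}"
  if grep -qiE '^ID(_LIKE)?=.*(fedora|rhel|centos)' "$os_release" 2>/dev/null; then
    echo podman
  else
    echo docker
  fi
}
```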
## Media Transcoding Example (Tdarr)

```bash
# Working Podman command for Tdarr on Fedora
podman run -d --name tdarr-node-gpu \
  --device nvidia.com/gpu=all \
  --restart unless-stopped \
  -e nodeName=workstation-gpu \
  -e serverIP=10.10.0.43 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v ./media:/media \
  -v ./tmp:/temp \
  ghcr.io/haveagitgat/tdarr_node:latest
```
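Once the node is up, confirming it actually sees the GPU is one `podman exec` away. A guarded sketch (the `verify_node` helper is hypothetical and assumes the container above is running):

```bash
# Verify the running node can reach the GPU, then glance at its recent logs.
verify_node() {
  command -v podman >/dev/null 2>&1 || { echo "podman not installed"; return 0; }
  podman exec "$1" nvidia-smi --query-gpu=name --format=csv,noheader
  podman logs --tail 20 "$1"
}

# Usage: verify_node tdarr-node-gpu
```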