claude-home/docker/examples/nvidia-troubleshooting.md
Cal Corum 4b7eca8a46
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00


title: NVIDIA Container Toolkit Setup
description: Installation and troubleshooting reference for nvidia-container-toolkit on Fedora/DNF and Ubuntu/APT, covering daemon.json configuration, CDI method, and GPU detection issues.
type: reference
domain: docker
tags:
  - nvidia
  - container-toolkit
  - gpu
  - docker
  - fedora
  - ubuntu
  - installation
  - daemon-json

NVIDIA Container Toolkit Troubleshooting

Installation by Distribution

Fedora/Nobara (DNF)

# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit

# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install toolkit
sudo dnf install -y nvidia-container-toolkit

# Configure Docker, then restart the daemon so it picks up the new runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Ubuntu/Debian (APT)

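# Add the repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the repository; \$(ARCH) is escaped so the literal string $(ARCH)
# lands in the list file, where APT substitutes the host architecture
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
  https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker, then restart the daemon so it picks up the new runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker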

Common Issues

Docker Service Won't Start

# Check daemon logs
sudo journalctl -xeu docker.service

# Common fix: restart the socket unit, then start the service
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker

# Or reset configuration
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker
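
Before moving daemon.json aside, it can be worth checking whether the file parses at all, since a stray comma is enough to break it. The following is a minimal sketch rather than part of the procedure above: `validate_daemon_json` is a hypothetical helper name, and it assumes either `jq` or `python3` is installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given file exists and holds
# parseable JSON. Assumes jq or python3 is available on the host.
validate_daemon_json() {
    local file="${1:-/etc/docker/daemon.json}"

    # Nothing to parse: the file is missing or empty.
    [ -s "$file" ] || return 1

    if command -v jq >/dev/null 2>&1; then
        jq empty "$file" >/dev/null 2>&1
    elif command -v python3 >/dev/null 2>&1; then
        python3 -m json.tool "$file" >/dev/null 2>&1
    else
        echo "neither jq nor python3 found" >&2
        return 2
    fi
}
```

For example, `validate_daemon_json && sudo systemctl restart docker` restarts the daemon only when the default config file parses.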

GPU Not Detected

# Verify nvidia-smi works
nvidia-smi

# Check runtime registration
docker info | grep -i runtime

# Test with simple container (CUDA image tags use the full x.y.z version)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
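
The runtime registration check above can be wrapped so a script can branch on the result instead of a person reading grep output. This is a sketch under assumptions: `has_nvidia_runtime` is a hypothetical name, and it relies on the plain-text `Runtimes:` line that `docker info` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `docker info` output on stdin and succeed
# if the "Runtimes:" line mentions nvidia.
# Assumes the plain-text layout of `docker info`, for example:
#   Runtimes: io.containerd.runc.v2 nvidia runc
has_nvidia_runtime() {
    grep -i '^[[:space:]]*runtimes:' | grep -qiw 'nvidia'
}
```

A possible use is `docker info 2>/dev/null | has_nvidia_runtime || echo "nvidia runtime not registered"`, which points back to the `nvidia-ctk runtime configure` step when the check fails.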

CDI Method (Alternative)

# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Use in compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
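
After generating the spec, it may help to confirm that the device name referenced in the Compose file is actually defined. `nvidia-ctk cdi list` prints the known device names; the sketch below filters that output. `has_cdi_device` is a hypothetical helper name, and the one-name-per-line output format is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the CDI device name given as $1
# appears as a complete line in the text on stdin.
# Expected input: output of `nvidia-ctk cdi list` (assumed one name per line).
has_cdi_device() {
    local device="${1:-nvidia.com/gpu=all}"
    # Trim surrounding whitespace, then require an exact fixed-string match.
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -qxF -- "$device"
}
```

For example, `nvidia-ctk cdi list 2>/dev/null | has_cdi_device nvidia.com/gpu=all` fails when the spec has not been generated or does not define that device.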

Configuration Patterns

daemon.json Structure

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Testing GPU Access

# Test with Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi

# Expected output: GPU information table

Fallback Strategies

  1. Start with CPU-only configuration
  2. Verify container functionality first
  3. Add GPU support incrementally
  4. Keep Intel/AMD GPU fallback enabled
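
The fallback order above can be sketched as a small probe that a start script consults before choosing a configuration. Everything here is illustrative: `select_gpu_mode` and the mode names are hypothetical, and treating any `renderD*` node under `/dev/dri` as an Intel/AMD device is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which acceleration mode to start with.
#   $1: command used to probe the NVIDIA driver (default: nvidia-smi)
#   $2: directory holding DRM render nodes (default: /dev/dri)
# Prints "nvidia", "dri" (a render node exists), or "cpu".
select_gpu_mode() {
    local probe="${1:-nvidia-smi}"
    local dri_dir="${2:-/dev/dri}"
    local node

    # NVIDIA path: the probe command must exist and exit successfully.
    if command -v "$probe" >/dev/null 2>&1 && "$probe" >/dev/null 2>&1; then
        echo "nvidia"
        return 0
    fi

    # Assumption: a renderD* entry here is an Intel/AMD render node.
    for node in "$dri_dir"/renderD*; do
        if [ -e "$node" ]; then
            echo "dri"
            return 0
        fi
    done

    echo "cpu"
}
```

Calling `select_gpu_mode` with no arguments probes the real host; passing a different probe command and directory makes the logic easy to exercise on a machine without a GPU.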