claude-home/docker/examples/nvidia-troubleshooting.md
Cal Corum 4b7eca8a46
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00


---
title: "NVIDIA Container Toolkit Setup"
description: "Installation and troubleshooting reference for nvidia-container-toolkit on Fedora/DNF and Ubuntu/APT, covering daemon.json configuration, CDI method, and GPU detection issues."
type: reference
domain: docker
tags: [nvidia, container-toolkit, gpu, docker, fedora, ubuntu, installation, daemon-json]
---
# NVIDIA Container Toolkit Troubleshooting
## Installation by Distribution
### Fedora/Nobara (DNF)
```bash
# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit
# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
    sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install toolkit
sudo dnf install -y nvidia-container-toolkit
# Configure Docker, then restart it so the new runtime is loaded
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
### Ubuntu/Debian (APT)
```bash
# Add the repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
# Add the repository; \$(ARCH) is written literally so APT can substitute the architecture
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Configure Docker, then restart it so the new runtime is loaded
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
## Common Issues
### Docker Service Won't Start
```bash
# Check daemon logs
sudo journalctl -xeu docker.service
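# Common fix: restart the socket unit, then start the service
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker
# Or move daemon.json aside to rule out a bad configuration
# (the backup keeps the original so it can be restored after fixing it)
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker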
```
### GPU Not Detected
```bash
# Verify nvidia-smi works
nvidia-smi
# Check runtime registration
docker info | grep -i runtime
# Test with simple container (CUDA image tags use the full three-part version)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```
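The runtime registration check above can be made scriptable. The following is a minimal sketch, assuming `docker info` prints the registered runtimes on a single `Runtimes:` line; the helper name `has_nvidia_runtime` is hypothetical and not part of any toolkit.

```bash
# has_nvidia_runtime: reads `docker info` text on stdin and succeeds only if
# "nvidia" appears on the "Runtimes:" line. (Hypothetical helper name.)
has_nvidia_runtime() {
    grep -i 'runtimes:' | grep -qi 'nvidia'
}

# Run the real check only where the docker CLI is available.
if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime missing: re-run 'sudo nvidia-ctk runtime configure --runtime=docker' and restart Docker"
    fi
fi
```

Matching on `Runtimes:` rather than the bare word `runtime` avoids a false positive or negative from the separate `Default Runtime:` line.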
### CDI Method (Alternative)
```bash
# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```
```yaml
# Use in compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```
## Configuration Patterns
### daemon.json Structure
```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```
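A malformed `daemon.json` is a common reason for Docker failing to start after a manual edit, so it can help to check the syntax before restarting the service. This is a minimal sketch, assuming `python3` is installed (its standard-library `json.tool` module does the parsing); the function name `validate_json` and the `DAEMON_JSON` override variable are hypothetical.

```bash
# validate_json FILE: succeeds only if FILE parses as JSON.
# Assumes python3 is available; json.tool ships with its standard library.
validate_json() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# DAEMON_JSON is a hypothetical override for testing against another path.
config="${DAEMON_JSON:-/etc/docker/daemon.json}"
if [ -f "$config" ]; then
    if validate_json "$config"; then
        echo "$config: valid JSON"
    else
        echo "$config: invalid JSON, fix it before restarting Docker"
    fi
fi
```

This only checks syntax, not whether Docker accepts the keys. Recent Docker Engine releases also offer `dockerd --validate --config-file=/etc/docker/daemon.json`, which may be the better check where your version supports it.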
### Testing GPU Access
```bash
# Test with Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi
# Expected output: GPU information table
```
## Fallback Strategies
1. Start with CPU-only configuration
2. Verify container functionality first
3. Add GPU support incrementally
4. Keep Intel/AMD GPU fallback enabled
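
The fallback order above could be sketched as a small flag selector. This is an illustration under assumptions, not a prescribed setup: it assumes the Intel/AMD fallback means passing the host's `/dev/dri` render devices into the container, and the function name `pick_gpu_args` and the `<image>` placeholder are hypothetical.

```bash
# pick_gpu_args HAS_NVIDIA HAS_DRI: prints the docker run flags to use.
# Prefers NVIDIA, falls back to /dev/dri (Intel/AMD), else CPU-only (no flags).
pick_gpu_args() {
    if [ "$1" = "yes" ]; then
        echo "--gpus all"
    elif [ "$2" = "yes" ]; then
        echo "--device /dev/dri:/dev/dri"
    else
        echo ""
    fi
}

# Detect what the host offers.
has_nvidia=no
has_dri=no
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    has_nvidia=yes
fi
if [ -e /dev/dri ]; then
    has_dri=yes
fi

# Print the command rather than running it; <image> is a placeholder.
echo "docker run --rm $(pick_gpu_args "$has_nvidia" "$has_dri") <image>"
```

Keeping the decision in a function that takes plain `yes`/`no` arguments makes it easy to dry-run each branch without the corresponding hardware.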