| title | description | type | domain | tags |
|---|---|---|---|---|
| NVIDIA Container Toolkit Setup | Installation and troubleshooting reference for nvidia-container-toolkit on Fedora/DNF and Ubuntu/APT, covering daemon.json configuration, CDI method, and GPU detection issues. | reference | docker | |
NVIDIA Container Toolkit Troubleshooting
Installation by Distribution
Fedora/Nobara (DNF)
# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit
# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# Install toolkit
sudo dnf install -y nvidia-container-toolkit
# Configure Docker, then restart the daemon so it loads the new runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Ubuntu/Debian (APT)
# Add repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
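# Configure Docker, then restart the daemon so it loads the new runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker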
Common Issues
Docker Service Won't Start
# Check daemon logs
sudo journalctl -xeu docker.service
# Common fixes:
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker
# Or reset configuration
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker
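Before moving daemon.json aside, it can help to confirm whether the file is actually malformed, since a JSON syntax error is a common reason the daemon refuses to start. The following is a minimal sketch, not part of the toolkit: `check_daemon_json` is a hypothetical helper name, and it assumes that either `jq` or `python3` is installed.

```shell
# Hypothetical helper (not part of Docker or nvidia-ctk).
# Returns 0 if the file parses as JSON, 2 if it is missing,
# and another non-zero status if it fails to parse.
check_daemon_json() {
    _file="${1:-/etc/docker/daemon.json}"
    if [ ! -f "$_file" ]; then
        echo "missing: $_file" >&2
        return 2
    fi
    if command -v jq >/dev/null 2>&1; then
        # `jq empty` parses the input and prints nothing
        jq empty "$_file" >/dev/null 2>&1
    else
        # Assumption: python3 is available as a fallback parser
        python3 -m json.tool "$_file" >/dev/null 2>&1
    fi
}
```

Calling `check_daemon_json || echo "daemon.json is missing or invalid"` before the reset step shows whether the configuration file is the likely culprit.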
GPU Not Detected
# Verify nvidia-smi works
nvidia-smi
# Check runtime registration
docker info | grep -i runtime
# Test with simple container
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
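The runtime registration check above can be turned into a scriptable test. This is a rough sketch: `has_nvidia_runtime` is a hypothetical helper, and it assumes `docker info` prints a `Runtimes:` line listing the registered runtime names, which may differ between Docker versions.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# only if the "Runtimes:" line mentions nvidia as a whole word.
has_nvidia_runtime() {
    grep -i '^ *Runtimes:' | grep -qiw 'nvidia'
}
```

A possible use is `docker info 2>/dev/null | has_nvidia_runtime || echo "nvidia runtime not registered"`. If the check fails while `nvidia-smi` works on the host, re-running `nvidia-ctk runtime configure` and restarting Docker is the usual next step.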
CDI Method (Alternative)
# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Use in compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
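A quick sanity check on the generated spec can catch an empty or truncated file before Compose tries to resolve `nvidia.com/gpu=all`. The sketch below uses a hypothetical helper, `cdi_spec_has_gpu_kind`, and assumes the generated YAML contains a top-level `kind: nvidia.com/gpu` line. `nvidia-ctk cdi list` should also print the device names the toolkit can see.

```shell
# Hypothetical helper: succeeds if the CDI spec file is non-empty and
# declares the nvidia.com/gpu kind (assumed top-level key in the spec).
cdi_spec_has_gpu_kind() {
    _spec="${1:-/etc/cdi/nvidia.yaml}"
    [ -s "$_spec" ] && grep -q '^kind: *nvidia\.com/gpu' "$_spec"
}
```

Running `cdi_spec_has_gpu_kind || echo "CDI spec missing or incomplete"` after the generate step gives a simple pass/fail signal.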
Configuration Patterns
daemon.json Structure
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
Testing GPU Access
# Test with Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi
# Expected output: GPU information table
Fallback Strategies
- Start with CPU-only configuration
- Verify container functionality first
- Add GPU support incrementally
- Keep Intel/AMD GPU fallback enabled
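
The incremental approach in the list above can be sketched as a small wrapper that only requests the GPU when the host can actually see one. This is an illustration under assumptions: `gpu_flags` and the `GPU_PROBE` variable are hypothetical names, and the probe defaults to running `nvidia-smi` on the host.

```shell
# Hypothetical: command used to probe for a working NVIDIA GPU on the host.
GPU_PROBE="${GPU_PROBE:-nvidia-smi}"

# Prints "--gpus all" when the probe succeeds, nothing otherwise, so the
# container falls back to a CPU-only run when no GPU is usable.
gpu_flags() {
    if $GPU_PROBE >/dev/null 2>&1; then
        echo "--gpus all"
    fi
}
```

Used as `docker run --rm $(gpu_flags) ghcr.io/haveagitgat/tdarr_node:latest`, with the substitution left unquoted so it expands to either two arguments or none, the same command works on hosts with and without a working NVIDIA setup.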