---
title: "NVIDIA Container Toolkit Setup"
description: "Installation and troubleshooting reference for nvidia-container-toolkit on Fedora/DNF and Ubuntu/APT, covering daemon.json configuration, CDI method, and GPU detection issues."
type: reference
domain: docker
tags: [nvidia, container-toolkit, gpu, docker, fedora, ubuntu, installation, daemon-json]
---

# NVIDIA Container Toolkit Troubleshooting

## Installation by Distribution

### Fedora/Nobara (DNF)
```bash
# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit

# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install toolkit
sudo dnf install -y nvidia-container-toolkit

# Configure Docker, then restart it so the new runtime is loaded
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Ubuntu/Debian (APT)
```bash
# Add repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add repository; \$(ARCH) is escaped so the literal text reaches the list file (APT expands it)
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit, configure Docker, then restart it so the new runtime is loaded
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

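When the same setup script has to run on both families above, the choice between the DNF and APT steps can be made from `/etc/os-release`. The following is a minimal sketch, not part of the toolkit: `pkg_family` is a hypothetical helper, and it assumes that Nobara reports `fedora` in `ID_LIKE` and that Ubuntu reports `debian` there.

```bash
# pkg_family (hypothetical helper): print "dnf", "apt", or "unknown" for the
# os-release file given as $1 (defaults to /etc/os-release).
# The function body is a subshell so sourced variables do not leak out.
pkg_family() (
    os_release="${1:-/etc/os-release}"
    [ -r "$os_release" ] || return 2
    ID="" ID_LIKE=""
    # os-release is a list of shell-compatible KEY=value assignments
    . "$os_release"
    case " $ID $ID_LIKE " in
        *" fedora "*|*" rhel "*)   echo dnf ;;
        *" debian "*|*" ubuntu "*) echo apt ;;
        *) echo unknown; return 1 ;;
    esac
)
```

A wrapper could then branch on `"$(pkg_family)"` and run the matching block from this section.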
## Common Issues

### Docker Service Won't Start
```bash
# Check daemon logs
sudo journalctl -xeu docker.service

# Common fixes:
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker

# Or reset configuration
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker
```

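The reset step above always moves the file to the same `.backup` name, so a second reset replaces the first backup. A minimal sketch of a timestamped variant follows; `backup_config` is a hypothetical helper, and it assumes it is run with enough privileges to move the file (for example from a root shell).

```bash
# backup_config (hypothetical helper): move a config file aside under a
# timestamped name and print the new path. Timestamps have one-second
# resolution, so two calls within the same second would still collide.
backup_config() {
    local src="$1" dest
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    dest="${src}.backup.$(date +%Y%m%d-%H%M%S)"
    mv -- "$src" "$dest" && echo "$dest"
}
```

Calling `backup_config /etc/docker/daemon.json` followed by `systemctl restart docker` mirrors the reset shown above while keeping every earlier copy.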
### GPU Not Detected
```bash
# Verify nvidia-smi works on the host
nvidia-smi

# Check runtime registration
docker info | grep -i runtime

# Test with simple container (CUDA image tags use the full three-part version)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```

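The runtime registration check above can be turned into a pass/fail test for scripts. This is a sketch under one assumption: that `docker info` prints the registered runtimes on a single `Runtimes:` line. `has_nvidia_runtime` is a hypothetical helper name.

```bash
# has_nvidia_runtime (hypothetical helper): read `docker info` text on stdin
# and succeed only if the "Runtimes:" line lists a runtime named nvidia.
# The "Default Runtime:" line is ignored on purpose.
has_nvidia_runtime() {
    grep -i '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}
```

Typical use would be `docker info 2>/dev/null | has_nvidia_runtime || echo "nvidia runtime not registered"`.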
### CDI Method (Alternative)
```bash
# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

Use in compose:

```yaml
services:
  app:
    devices:
      - nvidia.com/gpu=all
```

## Configuration Patterns

### daemon.json Structure
```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```

### Testing GPU Access
```bash
# Test with Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi

# Expected output: GPU information table
```

## Fallback Strategies
1. Start with CPU-only configuration
2. Verify container functionality first
3. Add GPU support incrementally
4. Keep Intel/AMD GPU fallback enabled
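
Steps 1 and 3 above can be sketched as a launcher that adds the GPU flag only when the host driver responds, and otherwise starts the container CPU-only. `gpu_run_args` is a hypothetical helper, and the `NVIDIA_SMI` variable is an assumed override hook added here for testing; neither comes from the toolkit.

```bash
# gpu_run_args (hypothetical helper): print "--gpus all" when nvidia-smi
# succeeds on the host, and print nothing otherwise so the caller falls
# back to a CPU-only container.
# NVIDIA_SMI is an assumed override; it defaults to the real nvidia-smi.
gpu_run_args() {
    if "${NVIDIA_SMI:-nvidia-smi}" >/dev/null 2>&1; then
        echo "--gpus all"
    fi
}
```

A call such as `docker run -d $(gpu_run_args) ghcr.io/haveagitgat/tdarr_node:latest` leaves the substitution unquoted on purpose, so the two words become separate arguments and an empty result adds no argument at all.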