# NVIDIA Container Toolkit Troubleshooting

## Installation by Distribution

### Fedora/Nobara (DNF)

```bash
# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit

# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install toolkit
sudo dnf install -y nvidia-container-toolkit

# Configure Docker, then restart it so the new runtime is loaded
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Ubuntu/Debian (APT)

```bash
# Add repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add repository ($(ARCH) is escaped so APT, not the shell, expands it)
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker, then restart it so the new runtime is loaded
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

## Common Issues

### Docker Service Won't Start

```bash
# Check daemon logs
sudo journalctl -xeu docker.service

# Common fix: restart the socket, then the service
# (reset-failed clears systemd's start-rate limit after repeated failures)
sudo systemctl reset-failed docker.service
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker

# Or reset the configuration
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker
# Once Docker starts, re-register the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### GPU Not Detected

```bash
# Verify the host driver works
nvidia-smi

# Check runtime registration (look for "nvidia" in the Runtimes line)
docker info | grep -i runtime

# Test with a minimal CUDA container
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```

### CDI Method (Alternative)

```bash
# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the device names the spec exposes
nvidia-ctk cdi list
```

Reference the CDI device in Compose (requires a container engine with CDI support enabled):

```yaml
services:
  app:
    devices:
      - nvidia.com/gpu=all
```

## Configuration Patterns

### daemon.json Structure

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```

### Testing GPU Access

```bash
# Test with the Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi
# Expected output: GPU information table
```

## Fallback Strategies

1. Start with a CPU-only configuration
2. Confirm the container works without GPU access
3. Add GPU support incrementally
4. Keep the Intel/AMD GPU fallback enabled
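The fallback order above can be sketched as a small helper that probes once and prints the matching `docker run` flag. This is a minimal sketch, not part of Tdarr or the NVIDIA toolkit: `accel_flag_for` and the `DRI_DIR` override are hypothetical names, and it assumes Intel/AMD acceleration is exposed through the `/dev/dri` render nodes.

```bash
# Hypothetical helper: choose an acceleration flag for `docker run`.
# Order: NVIDIA GPU -> Intel/AMD render node -> CPU-only (no flag).
# Usage: accel_flag_for <probe command...>
accel_flag_for() {
  # DRI_DIR is an assumed override (handy for testing); defaults to /dev/dri.
  dri=${DRI_DIR:-/dev/dri}
  if "$@" >/dev/null 2>&1; then
    # Probe succeeded: the NVIDIA runtime can expose the GPU.
    echo "--gpus=all"
  elif [ -d "$dri" ]; then
    # No NVIDIA GPU in containers, but a DRI device directory exists.
    echo "--device=$dri"
  fi
  # Otherwise print nothing, which means CPU-only.
}
```

Example: `flag=$(accel_flag_for docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi)`, then `docker run -d ${flag:+"$flag"} ghcr.io/haveagitgat/tdarr_node:latest` (other Tdarr options omitted). The `${flag:+"$flag"}` expansion drops the argument entirely in CPU-only mode instead of passing an empty string.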
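Before restarting Docker after restoring or editing `daemon.json` (see "Docker Service Won't Start" and "daemon.json Structure"), a quick check can confirm the `nvidia` runtime entry is still present. This is a rough sketch: `has_nvidia_runtime` is a hypothetical name, and it is a plain text match, not a JSON validator, so it will not catch syntax errors.

```bash
# Hypothetical pre-restart check: does daemon.json still declare a
# "nvidia" key? Text match only; it does not parse or validate JSON.
has_nvidia_runtime() {
  file=${1:-/etc/docker/daemon.json}
  # Fail if the file is missing/unreadable or has no "nvidia": key.
  [ -r "$file" ] && grep -Eq '"nvidia"[[:space:]]*:' "$file"
}
```

Example: `has_nvidia_runtime || sudo nvidia-ctk runtime configure --runtime=docker`.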
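The "GPU Not Detected" checks are ordered from host to container, so the first failure points at the layer to fix. A minimal sketch that runs them in that order and stops early; `run_check` and `diagnose_gpu` are hypothetical names, and the default image is the CUDA base image used above.

```bash
# Hypothetical wrapper: run one check quietly and report PASS/FAIL.
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

# Run the checks host -> Docker -> container; && stops at the first failure.
diagnose_gpu() {
  image=${1:-nvidia/cuda:11.8.0-base-ubuntu20.04}
  run_check "host driver responds (nvidia-smi)" nvidia-smi &&
    run_check "nvidia runtime registered with Docker" \
      sh -c 'docker info 2>/dev/null | grep -qi nvidia' &&
    run_check "GPU visible inside a container" \
      docker run --rm --gpus all "$image" nvidia-smi
}
```

Example: `diagnose_gpu`, or `diagnose_gpu ghcr.io/haveagitgat/tdarr_node:latest` to test the Tdarr image instead.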