Complete restructure from patterns/examples/reference to technology-focused directories: • Created technology-specific directories with comprehensive documentation: - /tdarr/ - Transcoding automation with gaming-aware scheduling - /docker/ - Container management with GPU acceleration patterns - /vm-management/ - Virtual machine automation and cloud-init - /networking/ - SSH infrastructure, reverse proxy, and security - /monitoring/ - System health checks and Discord notifications - /databases/ - Database patterns and troubleshooting - /development/ - Programming language patterns (bash, nodejs, python, vuejs) • Enhanced CLAUDE.md with intelligent context loading: - Technology-first loading rules for automatic context provision - Troubleshooting keyword triggers for emergency scenarios - Documentation maintenance protocols with automated reminders - Context window management for optimal documentation updates • Preserved valuable content from .claude/tmp/: - SSH security improvements and server inventory - Tdarr CIFS troubleshooting and Docker iptables solutions - Operational scripts with proper technology classification • Benefits achieved: - Self-contained technology directories with complete context - Automatic loading of relevant documentation based on keywords - Emergency-ready troubleshooting with comprehensive guides - Scalable structure for future technology additions - Eliminated context bloat through targeted loading 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
102 lines
2.3 KiB
Markdown
102 lines
2.3 KiB
Markdown
# NVIDIA Container Toolkit Troubleshooting
|
|
|
|
## Installation by Distribution
|
|
|
|
### Fedora/Nobara (DNF)
|
|
```bash
|
|
# Remove conflicting packages
|
|
sudo dnf remove golang-github-nvidia-container-toolkit
|
|
|
|
# Add official repository
|
|
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
|
|
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
|
|
|
|
# Install toolkit
|
|
sudo dnf install -y nvidia-container-toolkit
|
|
|
|
# Configure Docker
|
|
sudo nvidia-ctk runtime configure --runtime=docker
|
|
```
|
|
|
|
### Ubuntu/Debian (APT)
|
|
```bash
|
|
# Add repository
|
|
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
|
|
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
|
|
|
|
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
|
|
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
|
|
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
|
|
|
|
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
|
|
sudo nvidia-ctk runtime configure --runtime=docker
|
|
```
|
|
|
|
## Common Issues
|
|
|
|
### Docker Service Won't Start
|
|
```bash
|
|
# Check daemon logs
|
|
sudo journalctl -xeu docker.service
|
|
|
|
# Common fixes:
|
|
sudo systemctl stop docker.socket
|
|
sudo systemctl start docker.socket
|
|
sudo systemctl start docker
|
|
|
|
# Or reset configuration
|
|
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
|
|
sudo systemctl restart docker
|
|
```
|
|
|
|
### GPU Not Detected
|
|
```bash
|
|
# Verify nvidia-smi works
|
|
nvidia-smi
|
|
|
|
# Check runtime registration
|
|
docker info | grep -i runtime
|
|
|
|
# Test with simple container
|
|
docker run --rm --gpus all nvidia/cuda:11.8-base-ubuntu20.04 nvidia-smi
|
|
```
|
|
|
|
### CDI Method (Alternative)
|
|
```bash
|
|
# Generate CDI spec
|
|
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
|
|
|
|
# Use in compose
|
|
services:
|
|
app:
|
|
devices:
|
|
- nvidia.com/gpu=all
|
|
```
|
|
|
|
## Configuration Patterns
|
|
|
|
### daemon.json Structure
|
|
```json
|
|
{
|
|
"runtimes": {
|
|
"nvidia": {
|
|
"args": [],
|
|
"path": "nvidia-container-runtime"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Testing GPU Access
|
|
```bash
|
|
# Test with Tdarr node image
|
|
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi
|
|
|
|
# Expected output: GPU information table
|
|
```
|
|
|
|
## Fallback Strategies
|
|
1. Start with CPU-only configuration
|
|
2. Verify container functionality first
|
|
3. Add GPU support incrementally
|
|
4. Keep Intel/AMD GPU fallback enabled |