# NVIDIA Container Toolkit Troubleshooting

## Installation by Distribution

### Fedora/Nobara (DNF)

```bash
# Remove conflicting packages
sudo dnf remove golang-github-nvidia-container-toolkit

# Add official repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install toolkit
sudo dnf install -y nvidia-container-toolkit

# Configure Docker, then restart it so the new runtime is picked up
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

### Ubuntu/Debian (APT)

```bash
# Add the repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Add the repository; \$(ARCH) is written literally and expanded by APT
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] \
https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install toolkit
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

# Configure Docker, then restart it so the new runtime is picked up
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
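
When one setup script has to cover both distribution families, the choice between the two sections above can be made from `/etc/os-release`. The following is a minimal sketch, not part of the original guide: the function name `pick_install_section` and the list of `ID` values it matches are assumptions and may need extending for other distributions.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print "dnf" or "apt" depending on the
# ID / ID_LIKE fields of an os-release file. Runs in a subshell so that
# sourcing the file does not leak variables into the caller.
pick_install_section() (
  os_release="${1:-/etc/os-release}"
  [ -r "$os_release" ] || exit 2

  ID=""
  ID_LIKE=""
  # os-release is a list of shell-compatible KEY=value assignments.
  # shellcheck disable=SC1090
  . "$os_release"

  # Pad with spaces so each distribution name is matched as a whole word.
  case " $ID $ID_LIKE " in
    *" fedora "*|*" nobara "*|*" rhel "*) echo "dnf" ;;
    *" ubuntu "*|*" debian "*)            echo "apt" ;;
    *)                                    echo "unknown"; exit 1 ;;
  esac
)
```

Calling `pick_install_section` with no argument reads the host's own `/etc/os-release`; passing a path is mainly useful for testing.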

## Common Issues

### Docker Service Won't Start

```bash
# Check daemon logs
sudo journalctl -xeu docker.service

# Common fixes:
sudo systemctl stop docker.socket
sudo systemctl start docker.socket
sudo systemctl start docker

# Or reset configuration
sudo mv /etc/docker/daemon.json /etc/docker/daemon.json.backup
sudo systemctl restart docker
```
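
The reset step above always writes to the same backup name, so a second reset would overwrite the first backup. A minimal sketch of a timestamped variant follows; the function name `reset_daemon_json` and the backup naming scheme are assumptions, not something the original guide prescribes.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): move daemon.json aside under a timestamped
# name and print the backup path, so earlier backups are never overwritten.
reset_daemon_json() {
  local config="${1:-/etc/docker/daemon.json}"
  local backup

  [ -f "$config" ] || { echo "no config at $config" >&2; return 1; }

  backup="${config}.backup.$(date +%Y%m%d-%H%M%S)"
  mv -- "$config" "$backup" && echo "$backup"
}
```

On a real host the function needs root privileges for `/etc/docker`, and it is followed by `sudo systemctl restart docker` as in the block above.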

### GPU Not Detected

```bash
# Verify nvidia-smi works on the host
nvidia-smi

# Check runtime registration
docker info | grep -i runtime

# Test with simple container (CUDA image tags use the full x.y.z version)
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```
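
The runtime registration check above can be turned into a yes/no test for use in scripts. This is a minimal sketch under the assumption that `docker info` prints a `Runtimes:` line listing the registered runtime names; the function name `has_nvidia_runtime` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `docker info` text on stdin and succeed
# only if "nvidia" appears as a whole word on the "Runtimes:" line.
# The "Default Runtime:" line is deliberately ignored.
has_nvidia_runtime() {
  grep -i '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}
```

A typical call would be `docker info 2>/dev/null | has_nvidia_runtime || echo "nvidia runtime not registered"`; when it fails, rerunning `sudo nvidia-ctk runtime configure --runtime=docker` and restarting Docker is the usual next step.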

### CDI Method (Alternative)

```bash
# Generate CDI spec
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
```

```yaml
# Use in compose
services:
  app:
    devices:
      - nvidia.com/gpu=all
```

## Configuration Patterns

### daemon.json Structure

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```
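
A malformed `daemon.json` (a trailing comma, for example) is enough to stop Docker from starting, so it can help to check the file before restarting the service. The sketch below is an assumption-laden helper, not part of the toolkit: the name `check_daemon_json` and its exit codes are hypothetical, and it relies on either `jq` or `python3` being installed for the JSON parse.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): validate a daemon.json file.
# Exit codes (arbitrary choice for this sketch):
#   0 = valid JSON with an "nvidia" entry, 1 = invalid JSON,
#   2 = file unreadable or no JSON parser found, 3 = no "nvidia" entry.
check_daemon_json() {
  local config="${1:-/etc/docker/daemon.json}"

  [ -r "$config" ] || { echo "unreadable: $config" >&2; return 2; }

  # Use whichever JSON parser is available on the host.
  if command -v jq >/dev/null 2>&1; then
    jq empty "$config" >/dev/null 2>&1 || { echo "invalid JSON: $config" >&2; return 1; }
  elif command -v python3 >/dev/null 2>&1; then
    python3 -m json.tool "$config" >/dev/null 2>&1 || { echo "invalid JSON: $config" >&2; return 1; }
  else
    echo "need jq or python3 to validate JSON" >&2
    return 2
  fi

  # A plain text match is enough to spot a missing runtime entry.
  grep -q '"nvidia"' "$config" || { echo "no nvidia runtime entry: $config" >&2; return 3; }
}
```

Running `check_daemon_json && sudo systemctl restart docker` restarts the daemon only when the file parses and still contains the runtime entry.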

### Testing GPU Access

```bash
# Test with Tdarr node image
docker run --rm --gpus all ghcr.io/haveagitgat/tdarr_node:latest nvidia-smi

# Expected output: GPU information table
```

## Fallback Strategies

1. Start with CPU-only configuration
2. Verify container functionality first
3. Add GPU support incrementally
4. Keep Intel/AMD GPU fallback enabled
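
The incremental approach in the list above can be sketched as a small flag picker that degrades from NVIDIA to Intel/AMD to CPU-only. Everything here is illustrative: the function name `pick_gpu_flags` is hypothetical, and the sketch assumes that Intel/AMD hardware acceleration is exposed to the container through the `/dev/dri` device directory.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): choose `docker run` GPU flags.
#   $1 = text output of `docker info`
#   $2 = DRI device directory (defaults to /dev/dri; overridable for testing)
# Prints "--gpus all" for NVIDIA, a --device mapping for Intel/AMD,
# or nothing at all for a CPU-only run.
pick_gpu_flags() {
  local docker_info="$1"
  local dri_path="${2:-/dev/dri}"

  if printf '%s\n' "$docker_info" | grep -i 'Runtimes:' | grep -qw 'nvidia'; then
    echo "--gpus all"
  elif [ -d "$dri_path" ]; then
    echo "--device ${dri_path}:${dri_path}"
  else
    echo ""
  fi
}
```

The output is meant to be word-split onto a `docker run` command line, for example by storing `pick_gpu_flags "$(docker info 2>/dev/null)"` in a variable and expanding it unquoted. An empty result corresponds to step 1, the CPU-only configuration.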