Docker Container Troubleshooting Guide
Container Startup Issues
Container Won't Start
Check container logs first:
# Docker
docker logs <container_name>
docker logs --tail 50 -f <container_name>
# Podman
podman logs <container_name>
podman logs --tail 50 -f <container_name>
Common Startup Failures
Port Conflicts
Symptoms: bind: address already in use error
Solution:
# Find conflicting process
sudo netstat -tulpn | grep <port>
docker ps | grep <port>
# Change port mapping
docker run -p 8081:8080 myapp # Use different host port
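When the conflicting process cannot be stopped, one option is to pick the next unused host port automatically. The sketch below is illustrative only: `port_in_use` and `first_free_port` are hypothetical helper names, and it assumes `ss` (from iproute2) is available in place of `netstat`.

```shell
# Hypothetical helper: succeeds if a TCP listener is bound to port $1.
port_in_use() {
    # -l listening, -t TCP, -n numeric, -H no header; field 4 is "address:port"
    ss -ltnH 2>/dev/null |
        awk -v suffix=":$1" '$4 ~ (suffix "$") { found = 1 } END { exit !found }'
}

# Hypothetical helper: print the first unused port at or above $1.
first_free_port() {
    candidate="$1"
    while port_in_use "$candidate"; do
        candidate=$((candidate + 1))
    done
    echo "$candidate"
}

# Example (assumes the app listens on 8080 inside the container):
#   docker run -p "$(first_free_port 8080)":8080 myapp
```

The check only sees listeners that exist at the moment it runs, so another process can still claim the port before the container starts.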
Permission Errors
Symptoms: permission denied when accessing files/volumes
Solutions:
# Check file ownership
ls -la /host/volume/path
# Fix ownership (match container user)
sudo chown -R 1000:1000 /host/volume/path
# Use correct UID/GID in container
docker run -e PUID=1000 -e PGID=1000 myapp
Missing Environment Variables
Symptoms: Application fails with configuration errors
Diagnostic:
# Check container environment
docker exec -it <container> env
docker exec -it <container> printenv
# Verify required variables are set
docker inspect <container> | grep -A 20 "Env"
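To avoid scanning the environment by eye, the check can be scripted. This is a minimal sketch, assuming the required variable names are known in advance; `missing_env_vars` is a hypothetical helper, and it reads the configured environment from the `.Config.Env` field of `docker inspect`.

```shell
# Hypothetical helper: print each required variable that the container lacks.
# Usage: missing_env_vars <container> VAR1 VAR2 ...
# Returns 0 if all are set, 1 if any are missing, 2 if inspect fails.
missing_env_vars() {
    container="$1"
    shift
    env_dump="$(docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$container")" || return 2
    status=0
    for name in "$@"; do
        if ! printf '%s\n' "$env_dump" | grep -q "^${name}="; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# Example (PUID, PGID and TZ are placeholder names):
#   missing_env_vars myapp PUID PGID TZ
```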
Resource Constraints
Symptoms: Container killed or OOM errors
Solutions:
# Check resource usage
docker stats <container>
# Increase memory limit
docker run -m 4g myapp
# Check system resources
free -h
df -h
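Before raising the memory limit, it helps to confirm that the kernel OOM killer actually ended the container. A small sketch, assuming the container still exists (stopped is fine); `was_oom_killed` is a hypothetical helper built on the `.State.OOMKilled` field of `docker inspect`.

```shell
# Hypothetical helper: succeeds if the container was terminated by the OOM killer.
was_oom_killed() {
    [ "$(docker inspect --format '{{.State.OOMKilled}}' "$1" 2>/dev/null)" = "true" ]
}

# Example:
#   if was_oom_killed myapp; then echo "raise -m or reduce memory usage"; fi
```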
Debug Running Containers
# Access container shell
docker exec -it <container> /bin/bash
docker exec -it <container> /bin/sh # if bash not available
# Check container processes
docker exec <container> ps aux
# Check container filesystem
docker exec <container> ls -la /app
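The bash-then-sh fallback above can be automated. The sketch below is an assumption-laden example: `pick_shell` is a hypothetical helper, and it relies on the image shipping a `test` binary, which minimal or distroless images may not.

```shell
# Hypothetical helper: print the first shell that exists in the container.
pick_shell() {
    for candidate in /bin/bash /bin/sh; do
        if docker exec "$1" test -x "$candidate" 2>/dev/null; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example:
#   docker exec -it myapp "$(pick_shell myapp)"
```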
Build Issues
Build Failures
Clear build cache when encountering issues:
# Docker
docker system prune -a
docker builder prune
# Podman
podman system prune -a
podman image prune -a
Verbose Build Output
# Docker
docker build --progress=plain --no-cache .
# Podman
podman build --layers=false .
Common Build Problems
COPY/ADD Errors
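Issue: Files not found during build
Solutions:
# Check .dockerignore file
# Verify file paths relative to build context
# ✅ Correct: source path is relative to the build context
COPY ./src /app/src
# ❌ Wrong: host absolute paths are not reachable; sources always resolve inside the build context
COPY /absolute/path /app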
Package Installation Failures
Issue: apt/yum/dnf package installation fails
Solutions:
# Update package lists first
RUN apt-get update && apt-get install -y package-name
# Combine RUN commands to reduce layers
RUN apt-get update && \
apt-get install -y package1 package2 && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
Network Issues During Build
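Issue: Cannot reach package repositories
Solutions:
# Build with the host's network stack (bypasses the bridge network and its DNS settings)
docker build --network host .
# Use custom DNS (Podman only; docker build has no --dns flag)
podman build --dns 8.8.8.8 .
# For Docker, DNS is set at the daemon level ("dns" key in /etc/docker/daemon.json)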
GPU Container Issues
NVIDIA GPU Support Problems
Docker Desktop vs Podman on Fedora/Nobara
Issue: Docker Desktop has GPU compatibility issues on Fedora-based systems
Symptoms:
- CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
- unknown or invalid runtime name: nvidia
- Device nodes exist but CUDA fails to initialize
Solution: Use Podman instead of Docker on Fedora systems
# Verify host GPU works
nvidia-smi
# Test with Podman (recommended)
podman run --rm --device nvidia.com/gpu=all ubuntu:20.04 nvidia-smi
# Test with Docker (may fail on Fedora)
docker run --rm --gpus all ubuntu:20.04 nvidia-smi
GPU Container Configuration
Working Podman GPU template:
podman run -d --name gpu-container \
--device nvidia.com/gpu=all \
--restart unless-stopped \
-e NVIDIA_DRIVER_CAPABILITIES=all \
-e NVIDIA_VISIBLE_DEVICES=all \
myapp:latest
Working Docker GPU template:
docker run -d --name gpu-container \
--gpus all \
--restart unless-stopped \
-e NVIDIA_DRIVER_CAPABILITIES=all \
-e NVIDIA_VISIBLE_DEVICES=all \
myapp:latest
GPU Troubleshooting Steps
- Verify Host GPU Access:
nvidia-smi # Should show GPU info
lsmod | grep nvidia # Should show nvidia modules
ls -la /dev/nvidia* # Should show device files
- Check NVIDIA Container Toolkit:
rpm -qa | grep nvidia-container-toolkit # Fedora/RHEL
dpkg -l | grep nvidia-container-toolkit # Ubuntu/Debian
nvidia-ctk --version
- Test GPU in Container:
# Should show GPU information
podman exec gpu-container nvidia-smi
# List each GPU visible inside the container
podman exec gpu-container nvidia-smi -L
Platform-Specific GPU Notes
Fedora/Nobara/RHEL:
- ✅ Podman: Works out-of-the-box with GPU support
- ❌ Docker Desktop: Known GPU integration issues
- Solution: Use Podman for GPU workloads
Ubuntu/Debian:
- ✅ Docker: Generally works well with proper NVIDIA toolkit setup
- ✅ Podman: Also works well
- Solution: Either runtime typically works
Performance Issues
Resource Monitoring
Real-time resource usage:
# Overall container stats
docker stats
podman stats
# Inside container analysis
docker exec <container> top
docker exec <container> free -h
docker exec <container> df -h
# Network usage
docker exec <container> netstat -i
Image Size Optimization
Analyze image layers:
# Check image sizes
docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
# Analyze layer history
docker history <image>
# Find large files in container
docker exec <container> sh -c 'du -sh /*' | sort -hr
Optimization strategies:
# Use multi-stage builds
FROM node:18 AS builder
# ... build steps ...
FROM node:18-alpine AS production
COPY --from=builder /app/dist /app
# Smaller final image
# Combine RUN commands
RUN apt-get update && \
apt-get install -y package && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use .dockerignore
# .dockerignore
node_modules
.git
*.log
Storage Performance Issues
Slow volume performance:
# Test volume I/O performance
docker exec <container> dd if=/dev/zero of=/volume/test bs=1M count=1000
# Remove the 1 GB test file afterwards
docker exec <container> rm /volume/test
# Check volume mount options
docker inspect <container> | grep -A 10 "Mounts"
# Consider using tmpfs for temporary data
docker run --tmpfs /tmp myapp
Network Debugging
Network Connectivity Issues
Inspect network configuration:
# List networks
docker network ls
podman network ls
# Inspect specific network
docker network inspect <network_name>
# Check container networking
docker exec <container> ip addr show
docker exec <container> ip route show
Service Discovery Problems
Test connectivity between containers:
# Test by container name (same network)
docker exec container1 ping container2
# Test by IP address
docker exec container1 ping 172.17.0.3
# Check DNS resolution
docker exec container1 nslookup container2
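Name-based lookups only succeed when both containers are attached to the same user-defined network; Docker's default bridge provides no name resolution between containers. The sketch below checks for a common network. `container_networks` and `shared_networks` are hypothetical helpers that read `.NetworkSettings.Networks` from `docker inspect`.

```shell
# Hypothetical helper: list the networks a container is attached to, one per line.
container_networks() {
    docker inspect \
        --format '{{range $name, $conf := .NetworkSettings.Networks}}{{println $name}}{{end}}' \
        "$1"
}

# Hypothetical helper: print every network that both containers share.
shared_networks() {
    container_networks "$1" | grep -v '^$' | while read -r net; do
        if container_networks "$2" | grep -Fxq "$net"; then
            echo "$net"
        fi
    done
}

# Example:
#   shared_networks container1 container2
```

Empty output means the two containers have no network in common, and attaching both to one user-defined network is the usual fix.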
Port Binding Issues
Verify port mappings:
# Check exposed ports
docker port <container>
# Test external connectivity
curl localhost:8080
# Check if port is bound to all interfaces
netstat -tulpn | grep :8080
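A frequent cause of "works locally, unreachable from the LAN" is a port published on loopback only. As a rough sketch, the hypothetical helper `loopback_only_ports` filters `docker port` output, assuming its usual `PORT/PROTO -> ADDRESS:PORT` line format.

```shell
# Hypothetical helper: print mappings that are published on 127.0.0.1 or [::1] only.
loopback_only_ports() {
    docker port "$1" |
        awk '$3 ~ /^(127\.0\.0\.1|\[::1\]):/ { print $1, "->", $3 }'
}

# Example:
#   loopback_only_ports myapp
```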
Emergency Recovery
Complete Container Reset
Remove all containers and start fresh:
# Stop all containers
docker stop $(docker ps -q)
podman stop --all
# Remove all containers
docker container prune -f
podman container prune -f
# Remove all images
docker image prune -a -f
podman image prune -a -f
# Remove all volumes (CAUTION: data loss)
docker volume prune -f
podman volume prune -f
# Complete system cleanup
docker system prune -a --volumes -f
podman system prune -a --volumes -f
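Because these commands are destructive, it may be worth counting what exists before running them. A minimal sketch: `reset_preview` is a hypothetical helper that only lists resources and deletes nothing. The image count is per listed entry, so an image with several tags can be counted more than once.

```shell
# Hypothetical helper: report how many resources a full reset would touch.
# $1 is the runtime to query (docker by default; podman accepts the same subcommands).
reset_preview() {
    runtime="${1:-docker}"
    echo "containers: $("$runtime" ps -aq | wc -l | tr -d ' ')"
    echo "images: $("$runtime" images -q | wc -l | tr -d ' ')"
    echo "volumes: $("$runtime" volume ls -q | wc -l | tr -d ' ')"
}

# Example:
#   reset_preview          # Docker
#   reset_preview podman   # Podman
```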
Container Recovery
Recover from corrupted container:
# Create backup of container data
docker cp <container>:/important/data ./backup/
# Export container filesystem
docker export <container> > container-backup.tar
# Import and restart (docker import drops CMD/ENTRYPOINT metadata, so pass the command explicitly)
docker import container-backup.tar new-image:latest
docker run -d --name new-container new-image:latest <command>
Data Recovery
Recover data from volumes:
# List volumes
docker volume ls
# Inspect volume location
docker volume inspect <volume_name>
# Access volume data directly
sudo ls -la /var/lib/docker/volumes/<volume_name>/_data
# Mount volume to temporary container
docker run --rm -v <volume_name>:/data alpine ls -la /data
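The temporary-container approach also works for taking an archive of a volume before attempting repairs. This is a sketch under stated assumptions: `backup_volume` is a hypothetical helper, the `alpine` image is used only because it ships `tar`, and the destination directory must already exist on the host.

```shell
# Hypothetical helper: archive a named volume to <dest_dir>/<volume>.tar.gz.
# $1 volume name, $2 destination directory on the host (defaults to the current directory).
backup_volume() {
    volume="$1"
    dest_dir="${2:-$PWD}"
    # Mount the volume read-only so the backup cannot modify it.
    docker run --rm \
        -v "$volume":/data:ro \
        -v "$dest_dir":/backup \
        alpine tar czf "/backup/$volume.tar.gz" -C /data .
}

# Example:
#   backup_volume <volume_name> ./backup
```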
Health Check Issues
Container Health Checks
Implement health checks:
# Dockerfile health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1
Debug health check failures:
# Check health status
docker inspect <container> | grep -A 10 Health
# Manual health check test
docker exec <container> curl -f http://localhost:3000/health
# Check health check logs
docker events --filter container=<container>
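For scripted checks, the health state can be read directly and polled. The helpers below are hypothetical (`health_status`, `wait_healthy`), and the retry defaults are arbitrary assumptions. The template guards against containers that define no HEALTHCHECK, where `.State.Health` is empty.

```shell
# Hypothetical helper: print the health status, or "none" if no HEALTHCHECK is defined.
health_status() {
    docker inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
        "$1"
}

# Hypothetical helper: poll until the container reports healthy.
# $1 container, $2 max attempts (default 30), $3 seconds between attempts (default 2).
# Fails immediately on "unhealthy" or "none", and after the last attempt otherwise.
wait_healthy() {
    attempts="${2:-30}"
    interval="${3:-2}"
    while [ "$attempts" -gt 0 ]; do
        case "$(health_status "$1")" in
            healthy) return 0 ;;
            unhealthy | none) return 1 ;;
        esac
        attempts=$((attempts - 1))
        sleep "$interval"
    done
    return 1
}

# Example:
#   wait_healthy myapp 15 2 && echo "ready"
```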
Log Analysis
Log Management
View and manage container logs:
# View recent logs
docker logs --tail 100 <container>
# Follow logs in real-time
docker logs -f <container>
# Logs with timestamps
docker logs -t <container>
# Search logs for errors
docker logs <container> 2>&1 | grep ERROR
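A quick severity tally can show whether errors are isolated or systemic. This sketch assumes the application writes the literal keywords ERROR, WARN, or FATAL; `log_level_counts` is a hypothetical helper, and it counts keyword occurrences rather than distinct log lines.

```shell
# Hypothetical helper: count severity keywords in the last N log lines.
# $1 container, $2 number of lines to scan (default 1000).
log_level_counts() {
    docker logs --tail "${2:-1000}" "$1" 2>&1 |
        grep -oE 'ERROR|WARN|FATAL' |
        sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Example:
#   log_level_counts myapp 500
```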
Log Rotation Issues
Configure log rotation to prevent disk filling:
# Run with log size limits
docker run --log-opt max-size=10m --log-opt max-file=3 myapp
# Check log file sizes
sudo du -sh /var/lib/docker/containers/*/
Platform-Specific Issues
Fedora/Nobara/RHEL Systems
- GPU Support: Use Podman instead of Docker Desktop
- SELinux: Bind mounts may need relabeling with a container context (:Z or :z volume suffix)
- Firewall: Configure firewalld for container networking
Ubuntu/Debian Systems
- AppArmor: May restrict container operations
- Snap Docker: May have permission issues vs native package
General Linux Issues
- cgroups v2: Some older containers need cgroups v1
- User namespaces: May cause UID/GID mapping issues
- systemd: Integration differences between Docker/Podman
Prevention Best Practices
- Resource Limits: Always set memory and CPU limits
- Health Checks: Implement application health monitoring
- Log Rotation: Configure to prevent disk space issues
- Security Scanning: Regular vulnerability scans
- Backup Strategy: Regular data and configuration backups
- Testing: Test containers in staging before production
- Documentation: Document container configurations and dependencies
This troubleshooting guide covers the most common Docker and Podman container issues encountered in home lab and production environments.