claude-home/docker/examples/docker-iptables-troubleshooting-session.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00


# Docker iptables/nftables Backend Troubleshooting Session
## Session Context
- **Date**: August 8, 2025
- **System**: Nobara PC (Fedora-based gaming distro)
- **User**: cal
- **Working Directory**: `/mnt/NV2/Development/claude-home`
- **Goal**: Get Docker working to run Tdarr Node container
## System Information
```bash
# OS Details
uname -a
# Linux nobara-pc 6.15.5-200.nobara.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Jul 6 11:56:20 UTC 2025 x86_64 GNU/Linux
# Hardware
# AMD Ryzen 7 7800X3D 8-Core Processor
# 62GB RAM
# NVIDIA GeForce RTX 4080 SUPER
# Distribution
# Nobara (Fedora 42-based)
```
## Problem Summary
Docker daemon fails to start with persistent error:
```
failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: failed to create NAT chain DOCKER: COMMAND_FAILED: INVALID_IPV: 'ipv4' is not a valid backend or is unavailable
```
## Root Cause Analysis
### Initial Discovery
1. **Missing iptables**: Docker couldn't find `iptables` command in PATH
2. **Backend conflict**: System using nftables but Docker expects iptables-legacy
3. **Package inconsistency**: `iptables-nft` package installed but binary missing initially
### Key Findings
- `dnf list installed | grep -i iptables` initially returned nothing
- `firewalld` and `nftables` services were both inactive
- `iptables-nft` package was installed but `/usr/bin/iptables` didn't exist
- After reinstall, iptables worked but used nftables backend
- NAT table incompatible: `iptables v1.8.11 (nf_tables): table 'nat' is incompatible, use 'nft' tool.`
## Troubleshooting Steps Performed
### Step 1: Package Investigation
```bash
# Check installed iptables packages
dnf list installed | grep -i iptables
# Result: No matching packages (surprising!)
# Check service status
systemctl status nftables # inactive (dead)
firewall-cmd --get-backend-type # firewalld not running
# Check if iptables binary exists
which iptables # not found
/usr/bin/iptables --version # No such file or directory
```
### Step 2: Package Reinstallation
```bash
# Reinstall iptables-nft package
sudo dnf reinstall -y iptables-nft
# Verify installation
rpm -ql iptables-nft | grep bin
# Shows /usr/bin/iptables should exist
# Test after reinstall
iptables --version
# Result: iptables v1.8.11 (nf_tables) - SUCCESS!
```
### Step 3: Backend Compatibility Testing
```bash
# Test NAT table access
sudo iptables -t nat -L
# Error: iptables v1.8.11 (nf_tables): table `nat' is incompatible, use 'nft' tool.
```
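The backend is named in parentheses in the `iptables --version` output recorded above, so it can be checked mechanically. This is a minimal sketch; `iptables_backend` is a hypothetical helper name, and it only inspects the string it is handed.

```bash
# Classify the backend named in the output of `iptables --version`.
# iptables_backend is a hypothetical helper; it never calls iptables itself.
iptables_backend() {
    case "$1" in
        *"(nf_tables)"*) echo "nft" ;;
        *"(legacy)"*)    echo "legacy" ;;
        *)               echo "unknown" ;;
    esac
}

# Example using the version string recorded in Step 2:
iptables_backend "iptables v1.8.11 (nf_tables)"
```

On the live system the same helper can be fed real output with `iptables_backend "$(iptables --version)"`.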
### Step 4: Legacy Backend Installation
```bash
# Install iptables-legacy
sudo dnf install -y iptables-legacy iptables-legacy-libs
# Set up alternatives system
sudo alternatives --install /usr/bin/iptables iptables /usr/bin/iptables-legacy 10
sudo alternatives --install /usr/bin/ip6tables ip6tables /usr/bin/ip6tables-legacy 10
# Test NAT table with legacy backend
sudo iptables -t nat -L
# SUCCESS: Shows empty NAT chains
```
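After registering alternatives, it helps to confirm which binary the `iptables` name actually resolves to. The sketch below follows the symlink chain with `readlink -f`; `resolve_target` is a hypothetical helper name, and the expectation that the final path mentions `legacy` or `nft` is an assumption about how the distribution names its binaries.

```bash
# Print the final target of a path after following every symlink.
# resolve_target is a hypothetical helper name.
resolve_target() {
    readlink -f "$1"
}

# Only query the real command if it exists on this machine.
if command -v iptables >/dev/null 2>&1; then
    resolve_target "$(command -v iptables)"
fi
```

If the printed path still points at an nft binary, the alternatives entry did not take effect and `alternatives --display iptables` is the next thing to inspect.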
### Step 5: Docker Restart Attempts
```bash
# Remove NVIDIA daemon.json config (potential conflict)
sudo rm -f /etc/docker/daemon.json
# Load NAT kernel module explicitly
sudo modprobe iptable_nat
# Try starting firewalld (in case Docker needs it)
sudo systemctl enable --now firewalld
# Multiple restart attempts
sudo systemctl start docker
# ALL FAILED with same NAT chain error
```
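Step 5 deleted `daemon.json` outright, which also discards the NVIDIA runtime settings it held. A hedged alternative is to move the file aside so it can be restored later; `set_aside` is a hypothetical helper, not something run during this session.

```bash
# Move a file aside with a timestamp suffix instead of deleting it.
# Prints the backup path; does nothing if the file is absent.
# set_aside is a hypothetical helper name.
set_aside() {
    [ -e "$1" ] || return 0
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$1" "$backup" && echo "$backup"
}
```

For a root-owned file such as `/etc/docker/daemon.json`, the function would need to run in a root shell; a plain `sudo mv` to a `.bak` name achieves the same result.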
## Current State
- ✅ iptables-legacy installed and configured
- ✅ NAT table accessible via `iptables -t nat -L`
- ✅ `iptable_nat` module loaded via `modprobe` (other required modules not yet verified)
- ❌ Docker still fails with NAT chain creation error
- ❌ Same error persists despite backend switch
## Analysis of Persistent Issue
### Potential Causes
1. **Kernel State Contamination**: nftables rules/chains may still be active in kernel memory
2. **Module Loading Order**: iptables vs nftables modules loaded in conflicting order
3. **Docker Caching**: Docker may be caching the old backend detection
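4. **Firewall Integration**: Docker + firewalld interaction on Fedora/Nobara; the `COMMAND_FAILED: INVALID_IPV` wording resembles firewalld's error format, so Docker may be creating its chains through firewalld rather than calling `iptables` directly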
5. **System-Level Backend Selection**: Some system-wide iptables backend lock
### Evidence Supporting Kernel State Theory
- Error message is identical across all restart attempts
- iptables command works fine manually
- NAT table shows properly but Docker can't create chains
- Issue persists despite configuration changes
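
The kernel-state and module-order theories can be partly checked by looking at which backend modules are loaded. The sketch below reads `lsmod` output on stdin; `loaded_backends` is a hypothetical helper, and a result of `both` is only a hint worth investigating, not proof of a conflict.

```bash
# Read `lsmod` output on stdin and report which firewall backends
# have kernel modules loaded. loaded_backends is a hypothetical helper.
loaded_backends() {
    awk '
        $1 == "nf_tables"                       { nft = 1 }
        $1 == "ip_tables" || $1 ~ /^iptable_/   { legacy = 1 }
        END {
            if (nft && legacy) print "both"
            else if (nft)      print "nft"
            else if (legacy)   print "legacy"
            else               print "none"
        }
    '
}
```

Typical use is `lsmod | loaded_backends`, run once before and once after the reboot to see whether the loaded set changes.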
## Next Session Action Plan
### Immediate Steps After System Reboot
1. **Verify Backend Status**:
```bash
iptables --version # Should show legacy
sudo iptables -t nat -L # Should show clean NAT table
```
2. **Check Kernel Modules**:
```bash
lsmod | grep -E "(iptable|nf_|ip_tables)"
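# modprobe -l is not supported by current kmod; list module files directly
find "/lib/modules/$(uname -r)" -name '*.ko*' | grep -E "(iptable|nf_table)"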
```
3. **Test Docker Start**:
```bash
sudo systemctl start docker
sudo docker info # queries the daemon; docker --version only checks the client
```
### If Issue Persists After Reboot
#### Alternative Approach 1: Docker Configuration Override
```bash
# Create daemon.json to disable iptables management
# (Docker then sets up no default bridge and no NAT rules for containers)
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
"iptables": false,
"bridge": "none"
}
EOF
sudo systemctl start docker
```
#### Alternative Approach 2: Podman as Docker Alternative
```bash
# Install podman as Docker drop-in replacement
sudo dnf install -y podman podman-docker
# Test with Tdarr container
podman run --rm ghcr.io/haveagitgat/tdarr_node:latest --help
```
#### Alternative Approach 3: Docker Desktop
```bash
# Consider Docker Desktop for Linux (handles networking differently)
# May bypass system iptables issues entirely
```
#### Alternative Approach 4: Deep System Cleanup
```bash
# Nuclear option: Remove all networking packages and reinstall
# Quote the glob so the shell passes it to dnf unexpanded
sudo dnf remove -y 'iptables*' nftables firewalld
sudo dnf install -y iptables-legacy iptables-nft firewalld
sudo dnf reinstall -y docker-ce
```
### Diagnostic Commands for Next Session
```bash
# Full network state capture
ip addr show
ip route show
sudo iptables-save > /tmp/iptables-state.txt
sudo nft list ruleset > /tmp/nft-state.txt
# Docker troubleshooting
sudo timeout 30 dockerd --debug --log-level=debug > /tmp/docker-debug.log 2>&1
# timeout stops dockerd after 30 seconds; examine the log afterwards
# System journal deep dive
journalctl -u docker.service --since="1 hour ago" -o verbose > /tmp/docker-journal.log
```
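Once the debug log and journal dump exist, a quick pass can show whether the same failure signatures appear. This is a sketch only; the function names are hypothetical, and the signature strings are taken from the errors quoted earlier in this log.

```bash
# Count lines in a log file that contain a fixed-string signature.
# $1 = signature, $2 = log file. Hypothetical helper names throughout.
count_signature() {
    grep -cF -- "$1" "$2"
}

# Print one "signature: count" line per known failure signature.
summarize_docker_log() {
    for sig in "failed to create NAT chain" "INVALID_IPV" "is incompatible"; do
        printf '%s: %s\n' "$sig" "$(count_signature "$sig" "$1")"
    done
}
```

For example, `summarize_docker_log /tmp/docker-debug.log` after the timed `dockerd` run, or the same call against `/tmp/docker-journal.log`.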
## Known Working Configuration Target
### Expected Working State
- **iptables**: Legacy backend active
- **Docker**: Running with NAT chain creation successful
- **Network**: Docker bridge network functional
- **Containers**: Can start and access network
### Tdarr Node Test Command
```bash
cd ~/docker/tdarr-node
# Update IP in compose file first:
# serverIP=<TDARR_SERVER_IP>
docker-compose -f tdarr-node-basic.yml up -d
```
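The "update IP first" step can be scripted. The sketch below assumes the compose file contains a `serverIP=<value>` entry whose value runs to the end of the line, which is inferred from the comment above rather than from the actual file; `set_server_ip` is a hypothetical helper.

```bash
# Rewrite the serverIP=... value in a compose file.
# $1 = compose file, $2 = new address. Hypothetical helper name.
# Assumes the value runs to the end of the line (no trailing quote or comment).
set_server_ip() {
    tmp="$(mktemp)" || return 1
    if sed "s|serverIP=.*|serverIP=$2|" "$1" > "$tmp"; then
        cat "$tmp" > "$1"   # overwrite in place, keeping the file's permissions
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Usage would look like `set_server_ip tdarr-node-basic.yml <TDARR_SERVER_IP>`, with the placeholder replaced by the real address.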
## Related Documentation Created
- `/patterns/docker/gpu-acceleration.md` - GPU troubleshooting patterns
- `/reference/docker/nvidia-troubleshooting.md` - NVIDIA container toolkit
- `/examples/docker/tdarr-node-local/` - Working configurations
## System Context Notes
- This is a gaming-focused Nobara distribution
- May have different default networking than standard Fedora
- NVIDIA drivers already working (nvidia-smi functional)
- System has been used for other Docker containers successfully in the past
- Recent NVIDIA container toolkit installation may have triggered the issue
## Success Criteria for Next Session
1. ✅ Docker service starts without errors
2. ✅ `docker ps` command works
3. ✅ Simple container can run: `docker run --rm hello-world`
4. ✅ Tdarr node container can start (even if it can't connect to the server yet)
5. ✅ Network connectivity from containers works
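
The criteria above can be turned into a repeatable pass/fail run. This is a minimal sketch; `run_check` is a hypothetical helper, and the commented commands are one possible mapping of the criteria, not a verified test suite.

```bash
# Run a command quietly and print PASS/FAIL with a label.
# $1 = label, remaining arguments = command to run. Hypothetical helper.
run_check() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Possible mapping of the success criteria (uncomment on the target system):
# run_check "docker service active" systemctl is-active --quiet docker
# run_check "docker ps works"       docker ps
# run_check "hello-world runs"      docker run --rm hello-world
```

Each call returns non-zero on failure, so the checks can also be chained with `&&` to stop at the first problem.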
## Escalation Options
If standard troubleshooting fails:
1. **Nobara Community**: Check Nobara Discord/forums for similar issues
2. **Docker Desktop**: Use different Docker implementation
3. **Podman Migration**: Switch to podman as Docker replacement
4. **System Reinstall**: Fresh OS install (nuclear option)
5. **Container Alternatives**: LXC/systemd containers instead of Docker
## Files to Check Next Session
- `/etc/docker/daemon.json` - Docker configuration
- `/var/log/docker.log` - Docker service logs (may not exist on systemd hosts; `journalctl -u docker.service` is the usual source)
- `~/.docker/config.json` - User Docker config
- `/proc/sys/net/ipv4/ip_forward` - IP forwarding state (`1` means enabled)
- `/etc/systemd/system/docker.service.d/` - Service overrides
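
A quick way to see which of these paths exist before opening them is sketched below; `describe_path` is a hypothetical helper, and the path list simply mirrors the one above.

```bash
# Report whether a path is a directory, a file, or missing.
# describe_path is a hypothetical helper name.
describe_path() {
    if [ -d "$1" ]; then
        echo "directory: $1"
    elif [ -f "$1" ]; then
        echo "file: $1"
    else
        echo "missing: $1"
    fi
}

for p in /etc/docker/daemon.json /var/log/docker.log \
         "${HOME:-}/.docker/config.json" /proc/sys/net/ipv4/ip_forward \
         /etc/systemd/system/docker.service.d; do
    describe_path "$p"
done
```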
---
*End of troubleshooting session log*