claude-home/docker/examples/docker-iptables-troubleshooting-session.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00

Docker iptables/nftables Backend Troubleshooting Session

Session Context

  • Date: August 8, 2025
  • System: Nobara PC (Fedora-based gaming distro)
  • User: cal
  • Working Directory: /mnt/NV2/Development/claude-home
  • Goal: Get Docker working to run Tdarr Node container

System Information

# OS Details
uname -a
# Linux nobara-pc 6.15.5-200.nobara.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Jul  6 11:56:20 UTC 2025 x86_64 GNU/Linux

# Hardware
# AMD Ryzen 7 7800X3D 8-Core Processor
# 62GB RAM  
# NVIDIA GeForce RTX 4080 SUPER

# Distribution
# Nobara (Fedora 42-based)

Problem Summary

Docker daemon fails to start with persistent error:

failed to start daemon: Error initializing network controller: error obtaining controller instance: failed to register "bridge" driver: failed to create NAT chain DOCKER: COMMAND_FAILED: INVALID_IPV: 'ipv4' is not a valid backend or is unavailable

Root Cause Analysis

Initial Discovery

  1. Missing iptables: Docker could not find the iptables command in PATH
  2. Backend conflict: the system uses the nftables backend, while Docker appeared to expect iptables-legacy
  3. Package inconsistency: the iptables-nft package was installed, but its binary was initially missing

Key Findings

  • dnf list installed | grep -i iptables initially returned nothing
  • firewalld and nftables services were both inactive
  • iptables-nft package was installed but /usr/bin/iptables didn't exist
  • After reinstall, iptables worked but used nftables backend
  • NAT table access then failed with: iptables v1.8.11 (nf_tables): table `nat' is incompatible, use 'nft' tool.
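
The backend named in parentheses by iptables --version is what separates the two states seen in this session. A minimal sketch of classifying it follows; detect_backend is a hypothetical helper name, not part of iptables.

```shell
# Hypothetical helper: classify the backend from `iptables --version` output.
# iptables 1.8.x prints "(nf_tables)" or "(legacy)" after the version number.
detect_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nft" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;
  esac
}

# Example with the version string recorded in this session:
detect_backend "iptables v1.8.11 (nf_tables)"   # prints: nft
```

On a live system the argument would be "$(iptables --version 2>/dev/null)"; an empty string (binary missing) falls through to "unknown".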

Troubleshooting Steps Performed

Step 1: Package Investigation

# Check installed iptables packages
dnf list installed | grep -i iptables
# Result: No matching packages (surprising!)

# Check service status
systemctl status nftables    # inactive (dead)
firewall-cmd --get-backend-type  # firewalld not running

# Check if iptables binary exists
which iptables  # not found
/usr/bin/iptables --version  # No such file or directory

Step 2: Package Reinstallation

# Reinstall iptables-nft package
sudo dnf reinstall -y iptables-nft

# Verify installation
rpm -ql iptables-nft | grep bin
# Shows /usr/bin/iptables should exist

# Test after reinstall
iptables --version
# Result: iptables v1.8.11 (nf_tables) - SUCCESS!
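
The "package installed but binary missing" state can be checked directly by comparing the package's file list against the filesystem. A small sketch, assuming the list arrives on stdin one path per line; missing_files is a hypothetical helper name.

```shell
# Hypothetical helper: read file paths from stdin and print the ones that
# do not exist. A dangling symlink also counts as missing, because -e
# follows symlinks.
missing_files() {
  while IFS= read -r file_path; do
    [ -e "$file_path" ] || printf '%s\n' "$file_path"
  done
}

# Intended use on this system (not executed here):
#   rpm -ql iptables-nft | missing_files
```

Any path it prints is one the package claims to own but that is absent on disk. Some listed paths may be managed by the alternatives system rather than shipped directly, so treat a hit as a lead and not as proof of a broken package.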

Step 3: Backend Compatibility Testing

# Test NAT table access
sudo iptables -t nat -L
# Error: iptables v1.8.11 (nf_tables): table `nat' is incompatible, use 'nft' tool.

Step 4: Legacy Backend Installation

# Install iptables-legacy
sudo dnf install -y iptables-legacy iptables-legacy-libs

# Set up alternatives system
sudo alternatives --install /usr/bin/iptables iptables /usr/bin/iptables-legacy 10
sudo alternatives --install /usr/bin/ip6tables ip6tables /usr/bin/ip6tables-legacy 10

# Test NAT table with legacy backend
sudo iptables -t nat -L
# SUCCESS: Shows empty NAT chains
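
After changing alternatives it helps to confirm which binary the iptables command actually resolves to. A sketch under the assumption that GNU readlink is available; resolve_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: print the fully resolved path behind a command name,
# following every symlink, or "missing" if the command is not in PATH.
resolve_cmd() {
  local cmd_path
  cmd_path=$(command -v "$1" 2>/dev/null) || { echo "missing"; return 1; }
  readlink -f "$cmd_path"
}

# Intended use on this system (not executed here):
#   resolve_cmd iptables
#   resolve_cmd ip6tables
```

On Fedora-family systems the resolved name typically ends in xtables-legacy-multi or xtables-nft-multi, which shows which backend is active regardless of what the symlink chain looks like.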

Step 5: Docker Restart Attempts

# Remove NVIDIA daemon.json config (potential conflict)
sudo rm -f /etc/docker/daemon.json

# Load NAT kernel module explicitly
sudo modprobe iptable_nat

# Try starting firewalld (in case Docker needs it)
sudo systemctl enable --now firewalld

# Multiple restart attempts
sudo systemctl start docker
# ALL FAILED with same NAT chain error
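
Step 5 deleted /etc/docker/daemon.json outright, which also discards the NVIDIA runtime settings it held. A safer variant is to copy the file aside first. The sketch below assumes it runs as root for files under /etc; backup_file is a hypothetical helper name.

```shell
# Hypothetical helper: copy a file to "<name>.bak.<timestamp>" and print the
# backup path. Does nothing (and prints nothing) if the file does not exist.
backup_file() {
  local src="$1" dest
  [ -e "$src" ] || return 0
  dest="$src.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$src" "$dest" && echo "$dest"
}

# Intended use before removing the config (not executed here):
#   backup_file /etc/docker/daemon.json && rm -f /etc/docker/daemon.json
```

Restoring the NVIDIA configuration later is then a matter of copying the printed backup path back into place.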

Current State

  • iptables-legacy installed and configured
  • NAT table accessible via iptables -t nat -L
  • iptable_nat was loaded manually; the other required kernel modules are assumed to be available but were not verified
  • Docker still fails with NAT chain creation error
  • Same error persists despite backend switch

Analysis of Persistent Issue

Potential Causes

  1. Kernel State Contamination: nftables rules/chains may still be active in kernel memory
  2. Module Loading Order: iptables vs nftables modules loaded in conflicting order
  3. Docker Caching: Docker may be caching the old backend detection
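  4. Firewall Integration: Docker + firewalld interaction on Fedora/Nobara (the COMMAND_FAILED: INVALID_IPV wording resembles a firewalld error, so the failure may originate in Docker's firewalld integration and not in iptables itself)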
  5. System-Level Backend Selection: Some system-wide iptables backend lock

Evidence Supporting Kernel State Theory

  • Error message is identical across all restart attempts
  • iptables command works fine manually
  • NAT table shows properly but Docker can't create chains
  • Issue persists despite configuration changes
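
One way to narrow these theories is to sort each error line by the component that most likely produced it. The sketch below rests on an assumption that this session did not confirm: that COMMAND_FAILED and INVALID_IPV are firewalld error codes relayed by Docker. classify_daemon_error is a hypothetical helper name.

```shell
# Hypothetical helper: guess which component produced an error line.
# ASSUMPTION: COMMAND_FAILED / INVALID_IPV come from firewalld, not iptables.
classify_daemon_error() {
  case "$1" in
    *COMMAND_FAILED*|*INVALID_IPV*)       echo "firewalld" ;;
    *"is incompatible, use 'nft' tool"*)  echo "iptables-nft" ;;
    *)                                    echo "other" ;;
  esac
}

# Example with the tail of the daemon error recorded in this session:
classify_daemon_error "failed to create NAT chain DOCKER: COMMAND_FAILED: INVALID_IPV"
```

If the assumption holds, the persistent Docker error and the earlier iptables NAT error come from different components, which would explain why switching the iptables backend did not change the Docker message.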

Next Session Action Plan

Immediate Steps After System Reboot

  1. Verify Backend Status:

    iptables --version  # Should show legacy
    sudo iptables -t nat -L  # Should show clean NAT table
    
  2. Check Kernel Modules:

    lsmod | grep -E "(iptable|nf_|ip_tables)"
    find "/lib/modules/$(uname -r)" -type f | grep -E "(iptable|nf_table)"  # modprobe -l is no longer supported by current kmod
    
  3. Test Docker Start:

    sudo systemctl start docker
    docker --version
    

If Issue Persists After Reboot

Alternative Approach 1: Docker Configuration Override

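# Create daemon.json to disable iptables management
# Trade-off: Docker then adds no NAT or port-publishing rules, and "bridge": "none"
# skips the default docker0 bridge, so containers will likely need --network host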
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "iptables": false,
  "bridge": "none"
}
EOF

sudo systemctl start docker

Alternative Approach 2: Podman as Docker Alternative

# Install podman as Docker drop-in replacement
sudo dnf install -y podman podman-docker

# Test with Tdarr container
podman run --rm ghcr.io/haveagitgat/tdarr_node:latest --help

Alternative Approach 3: Docker Desktop

# Consider Docker Desktop for Linux (handles networking differently)
# May bypass system iptables issues entirely

Alternative Approach 4: Deep System Cleanup

# Nuclear option: Remove all networking packages and reinstall
sudo dnf remove -y 'iptables*' nftables firewalld  # quoted so the shell does not expand the glob
sudo dnf install -y iptables-legacy iptables-nft firewalld
sudo dnf reinstall -y docker-ce

Diagnostic Commands for Next Session

# Full network state capture
ip addr show
ip route show
sudo iptables-save > /tmp/iptables-state.txt
sudo nft list ruleset > /tmp/nft-state.txt

# Docker troubleshooting: run the daemon in the foreground for 30 seconds, then examine the log
sudo timeout 30 dockerd --debug > /tmp/docker-debug.log 2>&1

# System journal deep dive
journalctl -u docker.service --since="1 hour ago" -o verbose > /tmp/docker-journal.log
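
The captures above can be gathered into one directory so that a single failing or missing tool does not stop the rest. A sketch, assuming it runs as root; capture and collect_docker_diagnostics are hypothetical helper names, and the output directory name is illustrative.

```shell
# Hypothetical helper: run a command and save its combined output to
# "$outdir/$name.txt"; write a note instead if the command is not installed.
capture() {
  local outdir="$1" name="$2"
  shift 2
  mkdir -p "$outdir"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" > "$outdir/$name.txt" 2>&1 || echo "exit status: $?" >> "$outdir/$name.txt"
  else
    echo "skipped: $1 not found" > "$outdir/$name.txt"
  fi
}

# Collect the network and Docker state listed above, then print the directory.
collect_docker_diagnostics() {
  local outdir="${1:-/tmp/docker-diag-$(date +%Y%m%d-%H%M%S)}"
  capture "$outdir" ip-addr        ip addr show
  capture "$outdir" ip-route       ip route show
  capture "$outdir" iptables-state iptables-save
  capture "$outdir" nft-state      nft list ruleset
  capture "$outdir" docker-journal journalctl -u docker.service --since "1 hour ago" -o verbose
  echo "$outdir"
}
```

Calling collect_docker_diagnostics with no argument writes to a timestamped directory under /tmp, so captures from before and after a reboot can be compared with diff -r.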

Known Working Configuration Target

Expected Working State

  • iptables: Legacy backend active
  • Docker: Running with NAT chain creation successful
  • Network: Docker bridge network functional
  • Containers: Can start and access network

Tdarr Node Test Command

cd ~/docker/tdarr-node
# Update IP in compose file first:
# serverIP=<TDARR_SERVER_IP>
docker-compose -f tdarr-node-basic.yml up -d

Related Documentation

  • /patterns/docker/gpu-acceleration.md - GPU troubleshooting patterns
  • /reference/docker/nvidia-troubleshooting.md - NVIDIA container toolkit
  • /examples/docker/tdarr-node-local/ - Working configurations

System Context Notes

  • This is a gaming-focused Nobara distribution
  • May have different default networking than standard Fedora
  • NVIDIA drivers already working (nvidia-smi functional)
  • System has run other Docker containers successfully in the past
  • Recent NVIDIA container toolkit installation may have triggered the issue

Success Criteria for Next Session

  1. Docker service starts without errors
  2. docker ps command works
  3. Simple container can run: docker run --rm hello-world
  4. Tdarr node container can start (even if it cannot reach the server yet)
  5. Network connectivity from containers works
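
These criteria lend themselves to a scripted pass/fail run. A sketch follows; check and run_success_criteria are hypothetical helper names, and the busybox image and the 1.1.1.1 ping target are assumptions chosen only for illustration.

```shell
# Hypothetical helper: run a command quietly and report PASS or FAIL.
check() {
  local desc="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $desc"
  else
    echo "FAIL: $desc"
    return 1
  fi
}

# Walk the success criteria in order; returns non-zero if any check failed.
run_success_criteria() {
  local failed=0
  check "docker service active"  systemctl is-active --quiet docker || failed=1
  check "docker ps works"        docker ps || failed=1
  check "hello-world runs"       docker run --rm hello-world || failed=1
  check "tdarr node starts"      docker-compose -f "$HOME/docker/tdarr-node/tdarr-node-basic.yml" up -d || failed=1
  # ASSUMPTION: busybox image and 1.1.1.1 are placeholders for a network test.
  check "container can reach network" docker run --rm busybox ping -c 1 1.1.1.1 || failed=1
  return "$failed"
}
```

Run run_success_criteria after the reboot; the first FAIL line shows where to resume troubleshooting.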

Escalation Options

If standard troubleshooting fails:

  1. Nobara Community: Check Nobara Discord/forums for similar issues
  2. Docker Desktop: Use different Docker implementation
  3. Podman Migration: Switch to podman as Docker replacement
  4. System Reinstall: Fresh OS install (nuclear option)
  5. Container Alternatives: LXC/systemd containers instead of Docker

Files to Check Next Session

  • /etc/docker/daemon.json - Docker configuration
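  • /var/log/docker.log - Docker service logs (may not exist on systemd systems; journalctl -u docker.service is the usual source)
  • ~/.docker/config.json - User Docker config
  • /proc/sys/net/ipv4/ip_forward - Should contain 1 if IP forwarding is enabled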
  • /etc/systemd/system/docker.service.d/ - Service overrides
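
Of these files, ip_forward is the one with a simple expected value: 1 means IPv4 forwarding is on, which Docker's bridge networking relies on. A sketch of reading it; forwarding_state is a hypothetical helper name, and the optional path argument exists only so the function can be tried against a test file.

```shell
# Hypothetical helper: report the IPv4 forwarding state from a sysctl file.
# Defaults to the real kernel path; pass another path to test the logic.
forwarding_state() {
  local state_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ -r "$state_file" ] || { echo "unknown"; return 0; }
  if [ "$(cat "$state_file")" = "1" ]; then
    echo "enabled"
  else
    echo "disabled"
  fi
}

# Intended use on this system (not executed here):
#   forwarding_state
```

A result of "disabled" while the Docker daemon is running would be worth noting, since the daemon normally turns forwarding on itself unless configured otherwise.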

End of troubleshooting session log