--- title: "Networking Troubleshooting Guide" description: "Comprehensive troubleshooting for SSH, DNS, reverse proxy, SSL, CIFS/NFS mounts, Pi-hole HA, iOS DNS bypass, UniFi firewall rules, and emergency recovery procedures." type: troubleshooting domain: networking tags: [ssh, dns, pihole, ssl, cifs, nfs, firewall, unifi, ios, nginx, troubleshooting] --- # Networking Infrastructure Troubleshooting Guide ## SSH Connection Issues ### SSH Authentication Failures **Symptoms**: Permission denied, connection refused, timeout **Diagnosis**: ```bash # Verbose SSH debugging ssh -vvv user@host # Test different authentication methods ssh -o PasswordAuthentication=no user@host ssh -o PubkeyAuthentication=yes user@host # Check local key files ls -la ~/.ssh/ ssh-keygen -lf ~/.ssh/homelab_rsa.pub ``` **Solutions**: ```bash # Re-deploy SSH keys ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host # Fix key permissions chmod 600 ~/.ssh/homelab_rsa chmod 644 ~/.ssh/homelab_rsa.pub chmod 700 ~/.ssh # Verify remote authorized_keys ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys' ``` ### SSH Service Issues **Symptoms**: Connection refused, service not running **Diagnosis**: ```bash # Check SSH service status systemctl status sshd ss -tlnp | grep :22 # Test port connectivity nc -zv host 22 nmap -p 22 host ``` **Solutions**: ```bash # Restart SSH service sudo systemctl restart sshd sudo systemctl enable sshd # Check firewall sudo ufw status sudo ufw allow ssh # Verify SSH configuration sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)" ``` ## Network Connectivity Problems ### Basic Network Troubleshooting **Symptoms**: Cannot reach hosts, timeouts, routing issues **Diagnosis**: ```bash # Basic connectivity tests ping host traceroute host mtr host # Check local network configuration ip addr show ip route show cat /etc/resolv.conf ``` **Solutions**: ```bash # Restart networking sudo systemctl restart 
networking sudo netplan apply # Ubuntu # Reset network interface sudo ip link set eth0 down sudo ip link set eth0 up # Check default gateway sudo ip route add default via 10.10.0.1 ``` ### DNS Resolution Issues **Symptoms**: Cannot resolve hostnames, slow resolution **Diagnosis**: ```bash # Test DNS resolution nslookup google.com dig google.com host google.com # Check DNS servers systemd-resolve --status cat /etc/resolv.conf ``` **Solutions**: ```bash # Temporary DNS fix echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf # Restart DNS services sudo systemctl restart systemd-resolved # Flush DNS cache sudo systemd-resolve --flush-caches ``` ### UniFi Firewall Blocking DNS to New Networks **Symptoms**: New network/VLAN has "no internet access" - devices connect to WiFi but cannot browse or resolve domain names. Ping to IP addresses (8.8.8.8) works, but DNS resolution fails. **Root Cause**: Firewall rules blocking traffic from DNS servers (Pi-holes in "Servers" network group) to new networks. Rules like "Servers to WiFi" or "Servers to Home" with DROP action block ALL traffic including DNS responses on port 53. **Diagnosis**: ```bash # From affected device on new network: # Test if routing works (should succeed) ping 8.8.8.8 traceroute 8.8.8.8 # Test if DNS resolution works (will fail) nslookup google.com # Test DNS servers directly (will timeout or fail) nslookup google.com 10.10.0.16 nslookup google.com 10.10.0.226 # Test public DNS (should work) nslookup google.com 8.8.8.8 # Check DHCP-assigned DNS servers # Windows: ipconfig /all | findstr DNS # Linux/macOS: cat /etc/resolv.conf ``` **If routing works but DNS fails**, the issue is firewall blocking DNS traffic, not network configuration. 
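When routing succeeds but resolution fails, the decision logic above is simple enough to script. A minimal sketch under this guide's assumptions (the `classify` helper is illustrative, and 10.10.0.16 is the primary Pi-hole from the examples):

```bash
#!/bin/bash
# classify: turn the two probe results into a likely diagnosis
# (helper name is illustrative, not part of any tool)
classify() {
  local routing=$1 dns=$2
  if [ "$routing" = ok ] && [ "$dns" = fail ]; then
    echo "firewall blocking DNS"   # routing fine, port 53 answers dropped
  elif [ "$routing" = fail ]; then
    echo "routing problem"         # cannot even reach 8.8.8.8 by IP
  else
    echo "DNS OK"
  fi
}

# Live probes: only run on the affected device when invoked with --probe
if [ "${1:-}" = "--probe" ]; then
  routing=fail; ping -c1 -W2 8.8.8.8 >/dev/null 2>&1 && routing=ok
  dns=fail; nslookup google.com 10.10.0.16 >/dev/null 2>&1 && dns=ok
  echo "Diagnosis: $(classify "$routing" "$dns")"
fi
```

Run it with `--probe` on the affected device; without the flag it only defines the helper, so the probes never fire accidentally.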
**Solutions**: **Step 1: Identify Blocking Rules** - In UniFi: Settings → Firewall & Security → Traffic Rules → LAN In - Look for DROP rules with: - Source: Servers (or network group containing Pi-holes) - Destination: Your new network (e.g., "Home WiFi", "Home Network") - Examples: "Servers to WiFi", "Servers to Home" **Step 2: Create DNS Allow Rules (BEFORE Drop Rules)** Create new rules positioned ABOVE the drop rules: ``` Name: Allow DNS - Servers to [Network Name] Action: Accept Rule Applied: Before Predefined Rules Type: LAN In Protocol: TCP and UDP Source: - Network/Group: Servers (or specific Pi-hole IPs: 10.10.0.16, 10.10.0.226) - Port: Any Destination: - Network: [Your new network - e.g., Home WiFi] - Port: 53 (DNS) ``` Repeat for each network that needs DNS access from servers. **Step 3: Verify Rule Order** **CRITICAL**: Firewall rules process top-to-bottom, first match wins! Correct order: ``` ✅ Allow DNS - Servers to Home Network (Accept, Port 53) ✅ Allow DNS - Servers to Home WiFi (Accept, Port 53) ❌ Servers to Home (Drop, All ports) ❌ Servers to WiFi (Drop, All ports) ``` **Step 4: Re-enable Drop Rules** Once DNS allow rules are in place and positioned correctly, re-enable the drop rules. 
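Since evaluation is strictly first match wins, the intended ordering can be sanity-checked offline before touching the controller. A small simulation (the rule table and field layout are made up for illustration, mirroring the order above):

```bash
#!/bin/bash
# First-match-wins evaluation, as UniFi applies LAN In rules top to bottom.
# Rule format: "action|source|destination|port" ("any" is a wildcard).
rules=(
  "accept|servers|home-wifi|53"   # Allow DNS - Servers to Home WiFi
  "drop|servers|home-wifi|any"    # Servers to WiFi (drop everything else)
)

# match <src> <dst> <port>: print the action of the first matching rule
match() {
  local src=$1 dst=$2 port=$3 rule action rsrc rdst rport
  for rule in "${rules[@]}"; do
    IFS='|' read -r action rsrc rdst rport <<< "$rule"
    [ "$rsrc" = "$src" ] || [ "$rsrc" = any ] || continue
    [ "$rdst" = "$dst" ] || [ "$rdst" = any ] || continue
    [ "$rport" = "$port" ] || [ "$rport" = any ] || continue
    echo "$action"; return 0
  done
  echo "accept"   # default policy when nothing matches
}

match servers home-wifi 53   # prints: accept (DNS replies get through)
match servers home-wifi 22   # prints: drop   (other server traffic stays blocked)
```

Swapping the two array entries makes port 53 hit the drop rule first, which is exactly the failure mode this section describes.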
**Verification**: ```bash # From device on new network: # DNS should work nslookup google.com # Browsing should work ping google.com # Other server traffic should still be blocked (expected) ping 10.10.0.16 # Should fail or timeout ssh 10.10.0.16 # Should be blocked ``` **Real-World Example**: New "Home WiFi" network (10.1.0.0/24, VLAN 2) - **Problem**: Devices connected but couldn't browse web - **Diagnosis**: `traceroute 8.8.8.8` worked (16ms), but `nslookup google.com` failed - **Cause**: Firewall rule "Servers to WiFi" (rule 20004) blocked Pi-hole DNS responses - **Solution**: Added "Allow DNS - Servers to Home WiFi" rule (Accept, port 53) above drop rule - **Result**: DNS resolution works, other server traffic remains properly blocked ## Reverse Proxy and Load Balancer Issues ### Nginx Configuration Problems **Symptoms**: 502 Bad Gateway, 503 Service Unavailable, SSL errors **Diagnosis**: ```bash # Check Nginx status and logs systemctl status nginx sudo tail -f /var/log/nginx/error.log sudo tail -f /var/log/nginx/access.log # Test Nginx configuration sudo nginx -t sudo nginx -T # Show full configuration ``` **Solutions**: ```bash # Reload Nginx configuration sudo nginx -s reload # Check upstream servers curl -I http://backend-server:port telnet backend-server port # Fix common configuration issues sudo nano /etc/nginx/sites-available/default # Check proxy_pass URLs, upstream definitions ``` ### SSL/TLS Certificate Issues **Symptoms**: Certificate warnings, expired certificates, connection errors **Diagnosis**: ```bash # Check certificate validity openssl s_client -connect host:443 -servername host openssl x509 -in /etc/ssl/certs/cert.pem -text -noout # Check certificate expiry openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates ``` **Solutions**: ```bash # Renew Let's Encrypt certificates sudo certbot renew --dry-run sudo certbot renew --force-renewal # Generate self-signed certificate sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \ -keyout 
/etc/ssl/private/selfsigned.key \ -out /etc/ssl/certs/selfsigned.crt ``` ### Intermittent SSL Errors (ERR_SSL_UNRECOGNIZED_NAME_ALERT) **Symptoms**: SSL errors that work sometimes but fail other times, `ERR_SSL_UNRECOGNIZED_NAME_ALERT` in browser, connection works from internal network intermittently **Root Cause**: IPv6/IPv4 DNS conflicts where public DNS returns Cloudflare IPv6 addresses while local DNS (Pi-hole) only overrides IPv4. Modern systems prefer IPv6, causing intermittent failures when IPv6 connection attempts fail. **Diagnosis**: ```bash # Check for multiple DNS records (IPv4 + IPv6) nslookup domain.example.com 10.10.0.16 dig domain.example.com @10.10.0.16 # Compare with public DNS host domain.example.com 8.8.8.8 # Test IPv6 vs IPv4 connectivity curl -6 -I https://domain.example.com # IPv6 (may fail) curl -4 -I https://domain.example.com # IPv4 (should work) # Check if system has IPv6 connectivity ip -6 addr show | grep global ``` **Example Problem**: ```bash # Local Pi-hole returns: domain.example.com → 10.10.0.16 (IPv4 internal NPM) # Public DNS also returns: domain.example.com → 2606:4700:... 
(Cloudflare IPv6) # System tries IPv6 first → fails # Sometimes falls back to IPv4 → works # Result: Intermittent SSL errors ``` **Solutions**: **Option 1: Add IPv6 Local DNS Override** (Recommended) ```bash # Add non-routable IPv6 address to Pi-hole custom.list ssh pihole "docker exec pihole bash -c 'echo \"fe80::1 domain.example.com\" >> /etc/pihole/custom.list'" # Restart Pi-hole DNS ssh pihole "docker exec pihole pihole restartdns" # Verify fix nslookup domain.example.com 10.10.0.16 # Should show: 10.10.0.16 (IPv4) and fe80::1 (IPv6 link-local) ``` **Option 2: Remove Cloudflare DNS Records** (If public access not needed) ```bash # In Cloudflare dashboard: # - Turn off orange cloud (proxy) for the domain # - Or delete A/AAAA records entirely # This removes Cloudflare IPs from public DNS ``` **Option 3: Disable IPv6 on Client** (Temporary testing) ```bash # Disable IPv6 temporarily to confirm diagnosis sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 # Test domain - should work consistently now # Re-enable when done testing sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0 ``` **Verification**: ```bash # After applying fix, verify consistent resolution for i in {1..10}; do echo "Test $i:" curl -I https://domain.example.com 2>&1 | grep -E "(HTTP|SSL|certificate)" sleep 1 done # All attempts should succeed consistently ``` **Real-World Example**: git.manticorum.com - **Problem**: Intermittent SSL errors from internal network (10.0.0.0/24) - **Diagnosis**: Pi-hole had IPv4 override (10.10.0.16) but public DNS returned Cloudflare IPv6 - **Solution**: Added `fe80::1 git.manticorum.com` to Pi-hole custom.list - **Result**: Consistent successful connections, always routes to internal NPM ### iOS DNS Bypass Issues (Encrypted DNS) **Symptoms**: iOS device gets 403 errors when accessing internal services, NPM logs show external public IP as source instead of local 10.x.x.x IP, even with correct Pi-hole DNS configuration **Root Cause**: iOS devices can use encrypted DNS 
(DNS-over-HTTPS or DNS-over-TLS) that bypasses traditional DNS servers, even when correctly configured. This causes the device to resolve to public/Cloudflare IPs instead of local overrides, routing traffic through the public internet and triggering ACL denials. **Diagnosis**: ```bash # Check NPM access logs for the service ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403" # Look for external IPs in logs instead of local 10.x.x.x: # BAD: [Client 73.36.102.55] - - 403 (external IP, blocked by ACL) # GOOD: [Client 10.0.0.207] - 200 200 (local IP, allowed) # Verify iOS device is on local network # On iOS: Settings → Wi-Fi → (i) → IP Address # Should show 10.0.0.x or 10.10.0.x # Verify Pi-hole DNS is configured # On iOS: Settings → Wi-Fi → (i) → DNS # Should show 10.10.0.16 # Test if DNS is actually being used nslookup domain.example.com 10.10.0.16 # Shows what Pi-hole returns # Then check what iOS actually resolves (if possible via network sniffer) ``` **Example Problem**: ```bash # iOS device configuration: IP Address: 10.0.0.207 (correct, on local network) DNS: 10.10.0.16 (correct, Pi-hole configured) Cellular Data: OFF # But NPM logs show: [Client 73.36.102.55] - - 403 # Coming from ISP public IP! 
# Why: iOS is using encrypted DNS, bypassing Pi-hole # Result: Resolves to Cloudflare IP, routes through public internet, # NPM sees external IP, ACL blocks with 403 ``` **Solutions**: **Option 1: Add Public IP to NPM Access Rules** (Quickest, recommended for mobile devices) ```bash # Find which config file contains your domain ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'grep -l domain.example.com /data/nginx/proxy_host/*.conf'" # Example output: /data/nginx/proxy_host/19.conf # Add public IP to access rules (replace YOUR_PUBLIC_IP and config number) ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sed -i '/allow 10.10.0.0\/24;/a \ \n allow YOUR_PUBLIC_IP;' /data/nginx/proxy_host/19.conf" # Verify the change ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 cat /data/nginx/proxy_host/19.conf" | grep -A 8 "Access Rules" # Test and reload nginx ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -t" ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -s reload" ``` **Option 2: Reset iOS Network Settings** (Nuclear option, clears DNS cache/profiles) ``` iOS: Settings → General → Transfer or Reset iPhone → Reset → Reset Network Settings WARNING: This removes all saved WiFi passwords and network configurations ``` **Option 3: Check for DNS Configuration Profiles** ``` iOS: Settings → General → VPN & Device Management - Look for any DNS or Configuration Profiles - Remove any third-party DNS profiles (AdGuard, NextDNS, etc.) 
``` **Option 4: Disable Private Relay and IP Tracking** (Usually already tried) ``` iOS: Settings → [Your Name] → iCloud → Private Relay → OFF iOS: Settings → Wi-Fi → (i) → Limit IP Address Tracking → OFF ``` **Option 5: Check Browser DNS Settings** (If using Brave or Firefox) ``` Brave: Settings → Brave Shields & Privacy → Use secure DNS → OFF Firefox: Settings → DNS over HTTPS → OFF ``` **Verification**: ```bash # After applying fix, check NPM logs while accessing from iOS ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log" # With Option 1 (added public IP): Should see 200 status with external IP # With Option 2-5 (fixed DNS): Should see 200 status with local 10.x.x.x IP ``` **Important Notes**: - **Option 1 is recommended for mobile devices** as iOS encrypted DNS behavior is inconsistent - Public IP workaround requires updating if ISP changes your IP (rare for residential) - Manual nginx config changes (Option 1) will be **overwritten if you edit the proxy host in NPM UI** - To make permanent, either use NPM UI to add the IP, or re-apply after UI changes - This issue can affect any iOS device (iPhone, iPad) and some Android devices with encrypted DNS **Real-World Example**: git.manticorum.com iOS Access - **Problem**: iPhone showing 403 errors, desktop working fine on same network - **iOS Config**: IP 10.0.0.207, DNS 10.10.0.16, Cellular OFF (all correct) - **NPM Logs**: iPhone requests showing as [Client 73.36.102.55] (ISP public IP) - **Diagnosis**: iOS using encrypted DNS, bypassing Pi-hole, routing through Cloudflare - **Solution**: Added `allow 73.36.102.55;` to NPM proxy_host/19.conf ACL rules - **Result**: Immediate access, user able to log in to Gitea successfully ## Network Storage Issues ### CIFS/SMB Mount Problems **Symptoms**: Mount failures, connection timeouts, permission errors **Diagnosis**: ```bash # Test SMB connectivity smbclient -L //nas-server -U username testparm # Test Samba configuration # 
Check mount status mount | grep cifs df -h | grep cifs ``` **Solutions**: ```bash # Remount with verbose logging sudo mount -t cifs //server/share /mnt/point -o username=user,password=pass,vers=3.0 # Fix mount options in /etc/fstab //server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0 # Test credentials sudo cat /etc/cifs/credentials # Should contain: username=, password=, domain= ``` ### NFS Mount Issues **Symptoms**: Stale file handles, mount hangs, permission denied **Diagnosis**: ```bash # Check NFS services systemctl status nfs-client.target showmount -e nfs-server # Test NFS connectivity rpcinfo -p nfs-server ``` **Solutions**: ```bash # Restart NFS services sudo systemctl restart nfs-client.target # Remount NFS shares sudo umount /mnt/nfs-share sudo mount -t nfs server:/path /mnt/nfs-share # Fix stale file handles sudo umount -f /mnt/nfs-share sudo mount /mnt/nfs-share ``` ## Firewall and Security Issues ### Port Access Problems **Symptoms**: Connection refused, filtered ports, blocked services **Diagnosis**: ```bash # Check firewall status sudo ufw status verbose sudo iptables -L -n -v # Test port accessibility nc -zv host port nmap -p port host ``` **Solutions**: ```bash # Open required ports sudo ufw allow ssh sudo ufw allow 80/tcp sudo ufw allow 443/tcp sudo ufw allow from 10.10.0.0/24 # Reset firewall if needed sudo ufw --force reset sudo ufw enable ``` ### Network Security Issues **Symptoms**: Unauthorized access, suspicious traffic, security alerts **Diagnosis**: ```bash # Check active connections ss -tuln netstat -tuln # Review logs for security events sudo tail -f /var/log/auth.log sudo tail -f /var/log/syslog | grep -i security ``` **Solutions**: ```bash # Block suspicious IPs sudo ufw deny from suspicious-ip # Update SSH security sudo nano /etc/ssh/sshd_config # Set: PasswordAuthentication no, PermitRootLogin no sudo systemctl restart sshd ``` ## 
Pi-hole High Availability Troubleshooting ### Pi-hole Not Responding to DNS Queries **Symptoms**: DNS resolution failures, clients cannot resolve domains, Pi-hole web UI inaccessible **Diagnosis**: ```bash # Test DNS response from both Pi-holes dig @10.10.0.16 google.com dig @10.10.0.226 google.com # Check Pi-hole container status ssh npm-pihole "docker ps | grep pihole" ssh ubuntu-manticore "docker ps | grep pihole" # Check Pi-hole logs ssh npm-pihole "docker logs pihole --tail 50" ssh ubuntu-manticore "docker logs pihole --tail 50" # Test port 53 is listening ssh ubuntu-manticore "netstat -tulpn | grep :53" ssh ubuntu-manticore "ss -tulpn | grep :53" ``` **Solutions**: ```bash # Restart Pi-hole containers ssh npm-pihole "docker restart pihole" ssh ubuntu-manticore "cd ~/docker/pihole && docker compose restart" # Check for port conflicts ssh ubuntu-manticore "lsof -i :53" # If systemd-resolved is conflicting, disable it ssh ubuntu-manticore "sudo systemctl stop systemd-resolved" ssh ubuntu-manticore "sudo systemctl disable systemd-resolved" # Rebuild Pi-hole container ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d" ``` ### DNS Failover Not Working **Symptoms**: DNS stops working when primary Pi-hole fails, clients not using secondary DNS **Diagnosis**: ```bash # Check UniFi DHCP DNS configuration # Via UniFi UI: Settings → Networks → LAN → DHCP # DNS Server 1: 10.10.0.16 # DNS Server 2: 10.10.0.226 # Check client DNS configuration # Windows: ipconfig /all | findstr /i "DNS" # Linux/macOS: cat /etc/resolv.conf # Check if secondary Pi-hole is reachable ping -c 4 10.10.0.226 dig @10.10.0.226 google.com # Test failover manually ssh npm-pihole "docker stop pihole" dig google.com # Should still work via secondary ssh npm-pihole "docker start pihole" ``` **Solutions**: ```bash # Force DHCP lease renewal to get updated DNS servers # Windows: ipconfig /release && ipconfig /renew # Linux: sudo dhclient -r && sudo dhclient # 
macOS/iOS: # Disconnect and reconnect to WiFi # Verify UniFi DHCP settings are correct # Both DNS servers must be configured in UniFi controller # Check client respects both DNS servers # Some clients may cache failed DNS responses # Flush DNS cache: # Windows: ipconfig /flushdns # macOS: sudo dscacheutil -flushcache # Linux: sudo systemd-resolve --flush-caches ``` ### Orbital Sync Not Syncing **Symptoms**: Blocklists/whitelists differ between Pi-holes, custom DNS entries missing on secondary **Diagnosis**: ```bash # Check Orbital Sync container status ssh ubuntu-manticore "docker ps | grep orbital-sync" # Check Orbital Sync logs ssh ubuntu-manticore "docker logs orbital-sync --tail 100" # Look for sync errors in logs ssh ubuntu-manticore "docker logs orbital-sync 2>&1 | grep -i error" # Verify API tokens are correct ssh ubuntu-manticore "cat ~/docker/orbital-sync/.env" # Test API access manually (the token is shown in the web UI under Settings → API; it is passed as the auth query parameter) curl "http://10.10.0.16/admin/api.php?status&auth=YOUR_TOKEN" # If the token is unknown, reset the web password (note: this also regenerates the API token) ssh npm-pihole "docker exec pihole pihole -a -p" # Compare gravity domain counts between Pi-holes ssh npm-pihole "docker exec pihole sqlite3 /etc/pihole/gravity.db 'SELECT COUNT(*) FROM gravity;'" ssh ubuntu-manticore "docker exec pihole sqlite3 /etc/pihole/gravity.db 'SELECT COUNT(*) FROM gravity;'" ``` **Solutions**: ```bash # Regenerate API tokens # Primary Pi-hole: http://10.10.0.16/admin → Settings → API → Generate New Token # Secondary Pi-hole: http://10.10.0.226:8053/admin → Settings → API → Generate New Token # Update Orbital Sync .env file ssh ubuntu-manticore "nano ~/docker/orbital-sync/.env" # Update PRIMARY_HOST_PASSWORD and SECONDARY_HOST_PASSWORD # Restart Orbital Sync ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose restart" # Force immediate sync by restarting ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose down && docker compose up -d" # Monitor sync in real-time ssh ubuntu-manticore "docker logs orbital-sync -f" # If all else fails, manually sync via Teleporter # Primary: Settings → Teleporter → Backup # Secondary: Settings → 
Teleporter → Restore (upload backup file) ``` ### NPM DNS Sync Failing **Symptoms**: NPM proxy hosts missing from Pi-hole custom.list, new domains not resolving **Diagnosis**: ```bash # Check NPM sync script status ssh npm-pihole "cat /var/log/cron.log | grep npm-pihole-sync" # Run sync script manually to see errors ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh" # Check script can access both Pi-holes ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com" ssh npm-pihole "ssh ubuntu-manticore 'docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com'" # Verify SSH connectivity to ubuntu-manticore ssh npm-pihole "ssh ubuntu-manticore 'echo SSH OK'" ``` **Solutions**: ```bash # Fix SSH key authentication (if needed) ssh npm-pihole "ssh-copy-id ubuntu-manticore" # Test script with dry-run ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh --dry-run" # Run script manually to sync immediately ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh" # Verify cron job is configured ssh npm-pihole "crontab -l | grep npm-pihole-sync" # If cron job missing, add it ssh npm-pihole "crontab -e" # Add: 0 * * * * /home/cal/scripts/npm-pihole-sync.sh >> /var/log/npm-pihole-sync.log 2>&1 # Check script logs ssh npm-pihole "tail -50 /var/log/npm-pihole-sync.log" ``` ### Secondary Pi-hole Performance Issues **Symptoms**: ubuntu-manticore slow, high CPU/RAM usage, Pi-hole affecting Jellyfin/Tdarr **Diagnosis**: ```bash # Check resource usage ssh ubuntu-manticore "docker stats --no-stream" # Pi-hole should use <1% CPU and ~150MB RAM # If higher, investigate: ssh ubuntu-manticore "docker logs pihole --tail 100" # Check for excessive queries ssh ubuntu-manticore "docker exec pihole pihole -c -e" # Check for DNS loops or misconfiguration ssh ubuntu-manticore "docker exec pihole pihole -t" # Tail pihole.log ``` **Solutions**: ```bash # Restart Pi-hole if resource usage is high ssh ubuntu-manticore "docker restart pihole" # Check for 
DNS query loops # Look for same domain being queried repeatedly ssh ubuntu-manticore "docker exec pihole pihole -t | grep -A 5 'query\[A\]'" # Adjust Pi-hole cache settings if needed ssh ubuntu-manticore "docker exec pihole bash -c 'echo \"cache-size=10000\" >> /etc/dnsmasq.d/99-custom.conf'" ssh ubuntu-manticore "docker restart pihole" # If Jellyfin/Tdarr are affected, verify Pi-hole is using minimal resources # Resource limits can be added to docker-compose.yml: ssh ubuntu-manticore "nano ~/docker/pihole/docker-compose.yml" # Add under pihole service: # deploy: # resources: # limits: # cpus: '0.5' # memory: 256M ``` ### iOS Devices Still Getting 403 Errors (Post-HA Deployment) **Symptoms**: After deploying dual Pi-hole setup, iOS devices still bypass DNS and get 403 errors on internal services **Diagnosis**: ```bash # Verify UniFi DHCP has BOTH Pi-holes configured, NO public DNS # UniFi UI: Settings → Networks → LAN → DHCP → Name Server # DNS1: 10.10.0.16 # DNS2: 10.10.0.226 # Public DNS (1.1.1.1, 8.8.8.8): REMOVED # Check iOS DNS settings # iOS: Settings → WiFi → (i) → DNS # Should show: 10.10.0.16 # Force iOS DHCP renewal # iOS: Settings → WiFi → Forget Network → Reconnect # Check NPM logs for request source ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403" # Verify both Pi-holes have custom DNS entries ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com" ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com" ``` **Solutions**: ```bash # Solution 1: Verify public DNS is removed from UniFi DHCP # If public DNS (1.1.1.1) is still configured, iOS will prefer it # Remove ALL public DNS servers from UniFi DHCP configuration # Solution 2: Force iOS to renew DHCP lease # iOS: Settings → WiFi → Forget Network # Then reconnect to WiFi # This forces device to get new DNS servers from DHCP # Solution 3: Disable iOS encrypted DNS if 
still active # iOS: Settings → [Your Name] → iCloud → Private Relay → OFF # iOS: Check for DNS profiles: Settings → General → VPN & Device Management # Solution 4: If encrypted DNS persists, add public IP to NPM ACL (fallback) # See "iOS DNS Bypass Issues" section above for detailed steps # Solution 5: Test with different iOS device to isolate issue # If other iOS devices work, issue is device-specific configuration # Verification after fix ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log" # Access git.manticorum.com from iOS # Should see: [Client 10.0.0.x] - - 200 (local IP) ``` ### Both Pi-holes Failing Simultaneously **Symptoms**: Complete DNS failure across network, all devices cannot resolve domains **Diagnosis**: ```bash # Check both Pi-hole containers ssh npm-pihole "docker ps -a | grep pihole" ssh ubuntu-manticore "docker ps -a | grep pihole" # Check both hosts are reachable ping -c 4 10.10.0.16 ping -c 4 10.10.0.226 # Check Docker daemon on both hosts ssh npm-pihole "systemctl status docker" ssh ubuntu-manticore "systemctl status docker" # Test emergency DNS (bypassing Pi-hole) dig @8.8.8.8 google.com ``` **Solutions**: ```bash # Emergency: Temporarily use public DNS # UniFi UI: Settings → Networks → LAN → DHCP → Name Server # DNS1: 8.8.8.8 (Google DNS - temporary) # DNS2: 1.1.1.1 (Cloudflare - temporary) # Restart both Pi-holes ssh npm-pihole "docker restart pihole" ssh ubuntu-manticore "docker restart pihole" # If Docker daemon issues: ssh npm-pihole "sudo systemctl restart docker" ssh ubuntu-manticore "sudo systemctl restart docker" # Rebuild both Pi-holes if corruption suspected ssh npm-pihole "cd ~/pihole && docker compose down && docker compose up -d" ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d" # After Pi-holes are restored, revert UniFi DHCP to Pi-holes # UniFi UI: Settings → Networks → LAN → DHCP → Name Server # DNS1: 10.10.0.16 # DNS2: 10.10.0.226 ``` ### 
Query Load Not Balanced Between Pi-holes **Symptoms**: Primary Pi-hole getting most queries, secondary rarely used **Diagnosis**: ```bash # Check query counts on both Pi-holes # Primary: http://10.10.0.16/admin → Dashboard → Total Queries # Secondary: http://10.10.0.226:8053/admin → Dashboard → Total Queries # This is NORMAL behavior - clients prefer DNS1 by default # Secondary is for failover, not load balancing # To verify failover works: ssh npm-pihole "docker stop pihole" # Wait 30 seconds # Check secondary query count - should increase ssh npm-pihole "docker start pihole" ``` **Solutions**: ```bash # No action needed - this is expected behavior # DNS failover is for redundancy, not load distribution # If you want true load balancing (advanced): # Option 1: Configure some devices to prefer DNS2 # Manually set DNS on specific devices to 10.10.0.226, 10.10.0.16 # Option 2: Implement DNS round-robin (requires custom DHCP) # Not recommended for homelab - adds complexity # Option 3: Accept default behavior (recommended) # Primary handles most traffic, secondary provides failover # This is industry standard DNS HA behavior ``` ## Pi-hole Blocklist Blocking Legitimate Apps ### Facebook Blocklist Breaking Messenger Kids (2026-03-05) **Symptoms**: iPad could not connect to Facebook Messenger Kids. App would not load or send/receive messages. Disconnecting iPad from WiFi (using cellular) restored functionality. **Root Cause**: The `anudeepND/blacklist/master/facebook.txt` blocklist was subscribed in Pi-hole, which blocked all core Facebook domains needed by Messenger Kids. 
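The per-device impact can be reconstructed by correlating the client's `query[...]` lines with `gravity blocked` lines in pihole.log. A sketch of that correlation (the function name is illustrative and the sample log is synthetic, shaped like Pi-hole's dnsmasq-style log; in this setup the real log lives inside the container at /var/log/pihole/pihole.log):

```bash
#!/bin/bash
# blocked_for_client: count gravity-blocked domains among one client's queries
blocked_for_client() {  # usage: blocked_for_client <client-ip> <logfile>
  awk -v ip="$1" '
    /query\[/ && $0 ~ ("from " ip)       { queried[$6] = 1 }   # $6 = queried domain
    /gravity blocked/ && ($7 in queried) { blocked[$7]++ }     # $7 = blocked domain
    END { for (d in blocked) print blocked[d], d }
  ' "$2" | sort -rn
}

# Synthetic sample log (shape only, not real traffic)
cat > /tmp/pihole-sample.log <<'EOF'
Mar  5 09:14:02 dnsmasq[273]: query[A] edge-mqtt.facebook.com from 10.0.0.207
Mar  5 09:14:02 dnsmasq[273]: gravity blocked edge-mqtt.facebook.com is 0.0.0.0
Mar  5 09:14:05 dnsmasq[273]: query[A] dgw.c10r.facebook.com from 10.0.0.207
Mar  5 09:14:05 dnsmasq[273]: reply dgw.c10r.facebook.com is 157.240.22.19
EOF

blocked_for_client 10.0.0.207 /tmp/pihole-sample.log
# prints: 1 edge-mqtt.facebook.com
```

Note the simplification: a `gravity blocked` line carries no client IP, so the correlation attributes a block to any listed client that queried that domain.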
**Blocked Domains (from pihole.log)**: | Domain | Purpose | |--------|---------| | `edge-mqtt.facebook.com` | MQTT real-time message transport | | `graph.facebook.com` | Facebook Graph API (login, contacts, profiles) | | `graph-fallback.facebook.com` | Graph API fallback (blocked via CNAME chain) | | `www.facebook.com` | Core Facebook domain | **Allowed Domains** (not on the blocklist, resolved fine): - `dgw.c10r.facebook.com` - Data gateway - `mqtt.fallback.c10r.facebook.com` - MQTT fallback - `chat-e2ee.c10r.facebook.com` - E2E encrypted chat **Diagnosis**: ```bash # List the client's queries (query[...] lines carry the client IP) ssh pihole "docker exec pihole grep 'from CLIENT_IP' /var/log/pihole/pihole.log" # List gravity-blocked domains ('gravity blocked' lines do not carry the client IP, so grep them separately) ssh pihole "docker exec pihole grep 'gravity blocked' /var/log/pihole/pihole.log" # Check which blocklist contains a domain ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com" # Output: https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt (block) ``` **Resolution**: Removed the Facebook blocklist from the primary Pi-hole (the secondary didn't have it). The blocklist contained ~3,997 Facebook domains. 
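Before calling the deletion endpoint described next, the list address has to be percent-encoded for use as a path parameter. If `jq` or `python3` are not handy, a pure-bash encoder works (the `urlencode` name is illustrative):

```bash
#!/bin/bash
# urlencode: percent-encode everything outside RFC 3986 unreserved characters
urlencode() {
  local s=$1 out='' c i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;                 # unreserved: copy through
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;   # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt'
# prints: https%3A%2F%2Fraw.githubusercontent.com%2FanudeepND%2Fblacklist%2Fmaster%2Ffacebook.txt
```

The encoded string is what goes into `DELETE /api/lists/<encoded-address>?type=block`.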
**Pi-hole v6 API - Deleting a Blocklist**: ```bash # Authenticate and get session ID SID=$(curl -s -X POST 'http://PIHOLE_IP:PORT/api/auth' \ -H 'Content-Type: application/json' \ -d '{"password":"APP_PASSWORD"}' \ | python3 -c 'import sys,json; print(json.load(sys.stdin)["session"]["sid"])') # DELETE uses the URL-encoded list ADDRESS as path parameter (NOT numeric ID) # The ?type=block parameter is REQUIRED curl -s -X DELETE \ "http://PIHOLE_IP:PORT/api/lists/URL_ENCODED_LIST_ADDRESS?type=block" \ -H "X-FTL-SID: $SID" # Success returns HTTP 204 No Content # Update gravity after removal ssh pihole "docker exec pihole pihole -g" # Verify domain is no longer blocked ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com" ``` **Important Pi-hole v6 API Notes**: - List endpoints use the URL-encoded blocklist address as path param, not numeric IDs - `?type=block` query parameter is mandatory for DELETE operations - Numeric ID DELETE returns 200 with `{"took": ...}` but DOES NOT actually delete (silent failure) - Successful address-based DELETE returns HTTP 204 (no body) - Must run `pihole -g` (gravity update) after deletion for changes to take effect **Future Improvement (TODO)**: Implement Pi-hole v6 group/client-based approach: - Create a group for the iPad that bypasses the Facebook blocklist - Re-add the Facebook blocklist assigned to the default group only - Assign the iPad's IP to a "Kids Devices" client group that excludes the Facebook list - This would maintain Facebook blocking for other devices while allowing Messenger Kids - See: Pi-hole v6 Admin -> Groups/Clients for per-device blocklist management ## Service Discovery and DNS Issues ### Local DNS Problems **Symptoms**: Services unreachable by hostname, DNS timeouts **Diagnosis**: ```bash # Test local DNS resolution nslookup service.homelab.local dig @10.10.0.16 service.homelab.local # Check DNS server status systemctl status bind9 # or named ``` **Solutions**: ```bash # Add to /etc/hosts as 
temporary fix echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts # Restart DNS services sudo systemctl restart bind9 sudo systemctl restart systemd-resolved ``` ### Container Networking Issues **Symptoms**: Containers cannot communicate, service discovery fails **Diagnosis**: ```bash # Check Docker networks docker network ls docker network inspect bridge # Test container connectivity docker exec container1 ping container2 docker exec container1 nslookup container2 ``` **Solutions**: ```bash # Create custom network docker network create --driver bridge app-network docker run --network app-network container # Fix DNS in containers docker run --dns 8.8.8.8 container ``` ## Performance Issues ### Network Latency Problems **Symptoms**: Slow response times, timeouts, poor performance **Diagnosis**: ```bash # Measure network latency ping -c 100 host mtr --report host # Check network interface stats ip -s link show cat /proc/net/dev ``` **Solutions**: ```bash # Optimize network settings echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf sudo sysctl -p # Check for network congestion iftop nethogs ``` ### Bandwidth Issues **Symptoms**: Slow transfers, network congestion, dropped packets **Diagnosis**: ```bash # Test bandwidth iperf3 -s # Server iperf3 -c server-ip # Client # Check interface utilization vnstat -i eth0 ``` **Solutions**: ```bash # Implement QoS if needed sudo tc qdisc add dev eth0 root fq_codel # Optimize buffer sizes sudo ethtool -G eth0 rx 4096 tx 4096 ``` ## Emergency Recovery Procedures ### Network Emergency Recovery **Complete network failure recovery**: ```bash # Reset all network configuration sudo systemctl stop networking sudo ip addr flush eth0 sudo ip route flush table main sudo systemctl start networking # Manual network configuration sudo ip addr add 10.10.0.100/24 dev eth0 sudo ip route add default via 10.10.0.1 echo "nameserver 8.8.8.8" | sudo tee 
/etc/resolv.conf ``` ### SSH Emergency Access **When locked out of systems**: ```bash # Use emergency SSH key ssh -i ~/.ssh/emergency_homelab_rsa user@host # Via console access (if available) # Use hypervisor console or physical access # Reset SSH to allow password auth temporarily sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config sudo systemctl restart sshd ``` ### Service Recovery **Critical service restoration**: ```bash # Restart all network services sudo systemctl restart networking sudo systemctl restart nginx sudo systemctl restart sshd # Emergency firewall disable sudo ufw disable # CAUTION: Only for troubleshooting # Service-specific recovery sudo systemctl restart docker sudo systemctl restart systemd-resolved ``` ## Monitoring and Prevention ### Network Health Monitoring ```bash #!/bin/bash # network-monitor.sh CRITICAL_HOSTS="10.10.0.1 10.10.0.16 nas.homelab.local" CRITICAL_SERVICES="https://homelab.local http://proxmox.homelab.local:8006" for host in $CRITICAL_HOSTS; do if ! ping -c1 -W5 $host >/dev/null 2>&1; then echo "ALERT: $host unreachable" | logger -t network-monitor fi done for service in $CRITICAL_SERVICES; do if ! curl -sSf --max-time 10 "$service" >/dev/null 2>&1; then echo "ALERT: $service unavailable" | logger -t network-monitor fi done ``` ### Automated Recovery Scripts ```bash #!/bin/bash # network-recovery.sh if ! ping -c1 8.8.8.8 >/dev/null 2>&1; then echo "Network down, attempting recovery..." 
sudo systemctl restart networking sleep 10 if ping -c1 8.8.8.8 >/dev/null 2>&1; then echo "Network recovered" else echo "Manual intervention required" fi fi ``` ## Quick Reference Commands ### Network Diagnostics ```bash # Connectivity tests ping host traceroute host mtr host nc -zv host port # Service checks systemctl status networking systemctl status nginx systemctl status sshd # Network configuration ip addr show ip route show ss -tuln ``` ### Emergency Commands ```bash # Network restart sudo systemctl restart networking # SSH emergency access ssh -i ~/.ssh/emergency_homelab_rsa user@host # Firewall quick disable (emergency only) sudo ufw disable # DNS quick fix echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf ``` This troubleshooting guide provides comprehensive solutions for common networking issues in home lab environments.