| title | description | type | domain | tags |
|---|---|---|---|---|
| Networking Troubleshooting Guide | Comprehensive troubleshooting for SSH, DNS, reverse proxy, SSL, CIFS/NFS mounts, Pi-hole HA, iOS DNS bypass, UniFi firewall rules, and emergency recovery procedures. | troubleshooting | networking | |
Networking Infrastructure Troubleshooting Guide
SSH Connection Issues
SSH Authentication Failures
Symptoms: Permission denied, connection refused, timeout
Diagnosis:
# Verbose SSH debugging
ssh -vvv user@host
# Test different authentication methods
ssh -o PasswordAuthentication=no user@host
ssh -o PubkeyAuthentication=yes user@host
# Check local key files
ls -la ~/.ssh/
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
Solutions:
# Re-deploy SSH keys
ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host
ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host
# Fix key permissions
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub
chmod 700 ~/.ssh
# Verify remote authorized_keys
ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
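The permission fixes above can be bundled into a small audit helper; a sketch (the demo runs against a scratch directory so it cannot touch a real `~/.ssh`; the `homelab_rsa` names follow this guide):

```shell
#!/usr/bin/env bash
set -euo pipefail

# fix_ssh_perms: normalize permissions under an .ssh-style directory.
fix_ssh_perms() {
  local dir="$1"
  chmod 700 "$dir"
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    case "$f" in
      *.pub) chmod 644 "$f" ;;  # public keys may be world-readable
      *)     chmod 600 "$f" ;;  # private keys, authorized_keys, config
    esac
  done
}

# Demo against a scratch directory rather than the real ~/.ssh
demo="$(mktemp -d)/ssh"
mkdir -p "$demo"
touch "$demo/homelab_rsa" "$demo/homelab_rsa.pub" "$demo/authorized_keys"
chmod 777 "$demo" "$demo"/*        # deliberately wrong permissions
fix_ssh_perms "$demo"
stat -c '%a %n' "$demo"/*
```

Point it at the real directory only after confirming the output on the demo.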
SSH Service Issues
Symptoms: Connection refused, service not running
Diagnosis:
# Check SSH service status
systemctl status sshd
ss -tlnp | grep :22
# Test port connectivity
nc -zv host 22
nmap -p 22 host
Solutions:
# Restart SSH service
sudo systemctl restart sshd
sudo systemctl enable sshd
# Check firewall
sudo ufw status
sudo ufw allow ssh
# Verify SSH configuration
sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)"
Network Connectivity Problems
Basic Network Troubleshooting
Symptoms: Cannot reach hosts, timeouts, routing issues
Diagnosis:
# Basic connectivity tests
ping host
traceroute host
mtr host
# Check local network configuration
ip addr show
ip route show
cat /etc/resolv.conf
Solutions:
# Restart networking
sudo systemctl restart networking
sudo netplan apply # Ubuntu
# Reset network interface
sudo ip link set eth0 down
sudo ip link set eth0 up
# Check default gateway
sudo ip route add default via 10.10.0.1
DNS Resolution Issues
Symptoms: Cannot resolve hostnames, slow resolution
Diagnosis:
# Test DNS resolution
nslookup google.com
dig google.com
host google.com
# Check DNS servers
systemd-resolve --status
cat /etc/resolv.conf
Solutions:
# Temporary DNS fix
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
# Restart DNS services
sudo systemctl restart systemd-resolved
# Flush DNS cache
sudo systemd-resolve --flush-caches   # newer systemd: resolvectl flush-caches
UniFi Firewall Blocking DNS to New Networks
Symptoms: New network/VLAN has "no internet access" - devices connect to WiFi but cannot browse or resolve domain names. Ping to IP addresses (8.8.8.8) works, but DNS resolution fails.
Root Cause: Firewall rules blocking traffic from DNS servers (Pi-holes in "Servers" network group) to new networks. Rules like "Servers to WiFi" or "Servers to Home" with DROP action block ALL traffic including DNS responses on port 53.
Diagnosis:
# From affected device on new network:
# Test if routing works (should succeed)
ping 8.8.8.8
traceroute 8.8.8.8
# Test if DNS resolution works (will fail)
nslookup google.com
# Test DNS servers directly (will timeout or fail)
nslookup google.com 10.10.0.16
nslookup google.com 10.10.0.226
# Test public DNS (should work)
nslookup google.com 8.8.8.8
# Check DHCP-assigned DNS servers
# Windows:
ipconfig /all | findstr DNS
# Linux/macOS:
cat /etc/resolv.conf
If routing works but DNS fails, the issue is firewall blocking DNS traffic, not network configuration.
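The routing-works/DNS-fails split above is the whole decision table; a tiny triage helper that encodes it (the function name and wording are mine, not from any tool):

```shell
#!/usr/bin/env bash
# classify_net_issue ROUTE_OK DNS_OK -> prints a one-line diagnosis.
# Pass "yes"/"no" based on the ping/nslookup results from the steps above.
classify_net_issue() {
  local route="$1" dns="$2"
  if [ "$route" = yes ] && [ "$dns" = yes ]; then
    echo "healthy: routing and DNS both work"
  elif [ "$route" = yes ] && [ "$dns" = no ]; then
    echo "firewall/DNS: routing works, DNS blocked (check rules from DNS servers)"
  elif [ "$route" = no ] && [ "$dns" = yes ]; then
    echo "unusual: DNS answers but routing fails (check gateway/NAT)"
  else
    echo "network: no routing at all (check VLAN, DHCP, gateway)"
  fi
}

# Example: ping 8.8.8.8 succeeded, nslookup google.com failed
classify_net_issue yes no
```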
Solutions:
Step 1: Identify Blocking Rules
- In UniFi: Settings → Firewall & Security → Traffic Rules → LAN In
- Look for DROP rules with:
- Source: Servers (or network group containing Pi-holes)
- Destination: Your new network (e.g., "Home WiFi", "Home Network")
- Examples: "Servers to WiFi", "Servers to Home"
Step 2: Create DNS Allow Rules (BEFORE Drop Rules)
Create new rules positioned ABOVE the drop rules:
Name: Allow DNS - Servers to [Network Name]
Action: Accept
Rule Applied: Before Predefined Rules
Type: LAN In
Protocol: TCP and UDP
Source:
- Network/Group: Servers (or specific Pi-hole IPs: 10.10.0.16, 10.10.0.226)
- Port: Any
Destination:
- Network: [Your new network - e.g., Home WiFi]
- Port: 53 (DNS)
Repeat for each network that needs DNS access from servers.
Step 3: Verify Rule Order
CRITICAL: Firewall rules process top-to-bottom, first match wins!
Correct order:
✅ Allow DNS - Servers to Home Network (Accept, Port 53)
✅ Allow DNS - Servers to Home WiFi (Accept, Port 53)
❌ Servers to Home (Drop, All ports)
❌ Servers to WiFi (Drop, All ports)
Step 4: Re-enable Drop Rules
Once DNS allow rules are in place and positioned correctly, re-enable the drop rules.
Verification:
# From device on new network:
# DNS should work
nslookup google.com
# Browsing should work
ping google.com
# Other server traffic should still be blocked (expected)
ping 10.10.0.16 # Should fail or timeout
ssh 10.10.0.16 # Should be blocked
Real-World Example: New "Home WiFi" network (10.1.0.0/24, VLAN 2)
- Problem: Devices connected but couldn't browse web
- Diagnosis: traceroute 8.8.8.8 worked (16ms), but nslookup google.com failed
- Cause: Firewall rule "Servers to WiFi" (rule 20004) blocked Pi-hole DNS responses
- Solution: Added "Allow DNS - Servers to Home WiFi" rule (Accept, port 53) above drop rule
- Result: DNS resolution works, other server traffic remains properly blocked
Reverse Proxy and Load Balancer Issues
Nginx Configuration Problems
Symptoms: 502 Bad Gateway, 503 Service Unavailable, SSL errors
Diagnosis:
# Check Nginx status and logs
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log
# Test Nginx configuration
sudo nginx -t
sudo nginx -T # Show full configuration
Solutions:
# Reload Nginx configuration
sudo nginx -s reload
# Check upstream servers
curl -I http://backend-server:port
telnet backend-server port
# Fix common configuration issues
sudo nano /etc/nginx/sites-available/default
# Check proxy_pass URLs, upstream definitions
SSL/TLS Certificate Issues
Symptoms: Certificate warnings, expired certificates, connection errors
Diagnosis:
# Check certificate validity
openssl s_client -connect host:443 -servername host
openssl x509 -in /etc/ssl/certs/cert.pem -text -noout
# Check certificate expiry
openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates
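The `-dates` output can be converted into a days-remaining number for scripting or alerting; a sketch assuming GNU `date` (the sample expiry string is illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

# days_until: days from now until the date in an openssl "notAfter=..." line.
days_until() {
  local not_after="${1#notAfter=}"   # strip the openssl prefix if present
  local end now
  end=$(date -d "$not_after" +%s)    # GNU date parses openssl's date format
  now=$(date +%s)
  echo $(( (end - now) / 86400 ))
}

# Example: feed it the notAfter line from `openssl x509 -noout -dates`
days_until "notAfter=Jan  1 00:00:00 2035 GMT"
```

On a live certificate: `days_until "$(openssl x509 -in cert.pem -noout -enddate)"`.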
Solutions:
# Renew Let's Encrypt certificates
sudo certbot renew --dry-run
sudo certbot renew --force-renewal
# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
-keyout /etc/ssl/private/selfsigned.key \
-out /etc/ssl/certs/selfsigned.crt
Intermittent SSL Errors (ERR_SSL_UNRECOGNIZED_NAME_ALERT)
Symptoms: SSL errors that work sometimes but fail other times, ERR_SSL_UNRECOGNIZED_NAME_ALERT in browser, connection works from internal network intermittently
Root Cause: IPv6/IPv4 DNS conflicts where public DNS returns Cloudflare IPv6 addresses while local DNS (Pi-hole) only overrides IPv4. Modern systems prefer IPv6, causing intermittent failures when IPv6 connection attempts fail.
Diagnosis:
# Check for multiple DNS records (IPv4 + IPv6)
nslookup domain.example.com 10.10.0.16
dig domain.example.com @10.10.0.16
# Compare with public DNS
host domain.example.com 8.8.8.8
# Test IPv6 vs IPv4 connectivity
curl -6 -I https://domain.example.com # IPv6 (may fail)
curl -4 -I https://domain.example.com # IPv4 (should work)
# Check if system has IPv6 connectivity
ip -6 addr show | grep global
Example Problem:
# Local Pi-hole returns:
domain.example.com → 10.10.0.16 (IPv4 internal NPM)
# Public DNS also returns:
domain.example.com → 2606:4700:... (Cloudflare IPv6)
# System tries IPv6 first → fails
# Sometimes falls back to IPv4 → works
# Result: Intermittent SSL errors
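A quick way to spot this mixed-record situation is to check which address families a name resolves to; a sketch that classifies simplified `TYPE ADDRESS` answer lines (an assumed input shape, e.g. reduced from `dig +noall +answer` output), so it can run offline:

```shell
#!/usr/bin/env bash
# dns_families: given "TYPE ADDRESS" answer lines on stdin, report which
# address families are present - the mixed case is the one that causes the
# intermittent IPv6-first failures described above.
dns_families() {
  awk '
    $1 == "A"    { v4 = 1 }
    $1 == "AAAA" { v6 = 1 }
    END {
      if (v4 && v6) print "mixed: A + AAAA (IPv6 may be tried first)"
      else if (v6)  print "ipv6-only"
      else if (v4)  print "ipv4-only"
      else          print "no records"
    }'
}

# Example: the problem case from this section - local IPv4 plus public IPv6
printf 'A 10.10.0.16\nAAAA 2606:4700::1\n' | dns_families
```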
Solutions:
Option 1: Add IPv6 Local DNS Override (Recommended)
# Add non-routable IPv6 address to Pi-hole custom.list
ssh pihole "docker exec pihole bash -c 'echo \"fe80::1 domain.example.com\" >> /etc/pihole/custom.list'"
# Restart Pi-hole DNS
ssh pihole "docker exec pihole pihole restartdns"
# Verify fix
nslookup domain.example.com 10.10.0.16
# Should show: 10.10.0.16 (IPv4) and fe80::1 (IPv6 link-local)
Option 2: Remove Cloudflare DNS Records (If public access not needed)
# In Cloudflare dashboard:
# - Turn off orange cloud (proxy) for the domain
# - Or delete A/AAAA records entirely
# This removes Cloudflare IPs from public DNS
Option 3: Disable IPv6 on Client (Temporary testing)
# Disable IPv6 temporarily to confirm diagnosis
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
# Test domain - should work consistently now
# Re-enable when done testing
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
Verification:
# After applying fix, verify consistent resolution
for i in {1..10}; do
echo "Test $i:"
curl -I https://domain.example.com 2>&1 | grep -E "(HTTP|SSL|certificate)"
sleep 1
done
# All attempts should succeed consistently
Real-World Example: git.manticorum.com
- Problem: Intermittent SSL errors from internal network (10.0.0.0/24)
- Diagnosis: Pi-hole had IPv4 override (10.10.0.16) but public DNS returned Cloudflare IPv6
- Solution: Added fe80::1 git.manticorum.com to Pi-hole custom.list
- Result: Consistent successful connections, always routes to internal NPM
iOS DNS Bypass Issues (Encrypted DNS)
Symptoms: iOS device gets 403 errors when accessing internal services; NPM logs show an external public IP as the source instead of a local 10.x.x.x IP, even with correct Pi-hole DNS configuration
Root Cause: iOS devices can use encrypted DNS (DNS-over-HTTPS or DNS-over-TLS) that bypasses traditional DNS servers, even when correctly configured. This causes the device to resolve to public/Cloudflare IPs instead of local overrides, routing traffic through the public internet and triggering ACL denials.
Diagnosis:
# Check NPM access logs for the service
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403"
# Look for external IPs in logs instead of local 10.x.x.x:
# BAD: [Client 73.36.102.55] - - 403 (external IP, blocked by ACL)
# GOOD: [Client 10.0.0.207] - 200 200 (local IP, allowed)
# Verify iOS device is on local network
# On iOS: Settings → Wi-Fi → (i) → IP Address
# Should show 10.0.0.x or 10.10.0.x
# Verify Pi-hole DNS is configured
# On iOS: Settings → Wi-Fi → (i) → DNS
# Should show 10.10.0.16
# Test if DNS is actually being used
nslookup domain.example.com 10.10.0.16 # Shows what Pi-hole returns
# Then check what iOS actually resolves (if possible via network sniffer)
Example Problem:
# iOS device configuration:
IP Address: 10.0.0.207 (correct, on local network)
DNS: 10.10.0.16 (correct, Pi-hole configured)
Cellular Data: OFF
# But NPM logs show:
[Client 73.36.102.55] - - 403 # Coming from ISP public IP!
# Why: iOS is using encrypted DNS, bypassing Pi-hole
# Result: Resolves to Cloudflare IP, routes through public internet,
# NPM sees external IP, ACL blocks with 403
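The bad/good log pattern above can be checked mechanically: extract client IPs from 403 lines and flag any that are not local `10.x` addresses (the `[Client x.x.x.x]` prefix matches the NPM log excerpts shown here; the helper name is mine):

```shell
#!/usr/bin/env bash
# external_403s: read NPM access-log lines on stdin, print client IPs of
# 403 responses that are NOT local 10.x.x.x addresses.
external_403s() {
  grep ' 403' \
  | sed -n 's/.*\[Client \([0-9.]*\)\].*/\1/p' \
  | grep -v '^10\.' || true
}

# Demo with the log shapes from this section
printf '%s\n' \
  '[Client 73.36.102.55] - - 403' \
  '[Client 10.0.0.207] - 200 200' \
  '[Client 10.10.0.5] - - 403' \
  | external_403s
```

Any IP it prints is a device whose traffic is arriving from outside, i.e. a DNS-bypass candidate.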
Solutions:
Option 1: Add Public IP to NPM Access Rules (Quickest, recommended for mobile devices)
# Find which config file contains your domain
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'grep -l domain.example.com /data/nginx/proxy_host/*.conf'"
# Example output: /data/nginx/proxy_host/19.conf
# Add public IP to access rules (replace YOUR_PUBLIC_IP and config number)
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sed -i '/allow 10.10.0.0\/24;/a \ \n allow YOUR_PUBLIC_IP;' /data/nginx/proxy_host/19.conf"
# Verify the change
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 cat /data/nginx/proxy_host/19.conf" | grep -A 8 "Access Rules"
# Test and reload nginx
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -t"
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -s reload"
Option 2: Reset iOS Network Settings (Nuclear option, clears DNS cache/profiles)
iOS: Settings → General → Transfer or Reset iPhone → Reset → Reset Network Settings
WARNING: This removes all saved WiFi passwords and network configurations
Option 3: Check for DNS Configuration Profiles
iOS: Settings → General → VPN & Device Management
- Look for any DNS or Configuration Profiles
- Remove any third-party DNS profiles (AdGuard, NextDNS, etc.)
Option 4: Disable Private Relay and IP Tracking (Usually already tried)
iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
iOS: Settings → Wi-Fi → (i) → Limit IP Address Tracking → OFF
Option 5: Check Browser DNS Settings (If using Brave or Firefox)
Brave: Settings → Brave Shields & Privacy → Use secure DNS → OFF
Firefox: Settings → DNS over HTTPS → OFF
Verification:
# After applying fix, check NPM logs while accessing from iOS
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log"
# With Option 1 (added public IP): Should see 200 status with external IP
# With Option 2-5 (fixed DNS): Should see 200 status with local 10.x.x.x IP
Important Notes:
- Option 1 is recommended for mobile devices as iOS encrypted DNS behavior is inconsistent
- Public IP workaround requires updating if ISP changes your IP (rare for residential)
- Manual nginx config changes (Option 1) will be overwritten if you edit the proxy host in NPM UI
- To make permanent, either use NPM UI to add the IP, or re-apply after UI changes
- This issue can affect any iOS device (iPhone, iPad) and some Android devices with encrypted DNS
Real-World Example: git.manticorum.com iOS Access
- Problem: iPhone showing 403 errors, desktop working fine on same network
- iOS Config: IP 10.0.0.207, DNS 10.10.0.16, Cellular OFF (all correct)
- NPM Logs: iPhone requests showing as [Client 73.36.102.55] (ISP public IP)
- Diagnosis: iOS using encrypted DNS, bypassing Pi-hole, routing through Cloudflare
- Solution: Added allow 73.36.102.55; to NPM proxy_host/19.conf ACL rules
- Result: Immediate access, user able to log in to Gitea successfully
Network Storage Issues
CIFS/SMB Mount Problems
Symptoms: Mount failures, connection timeouts, permission errors
Diagnosis:
# Test SMB connectivity
smbclient -L //nas-server -U username
testparm # Test Samba configuration
# Check mount status
mount | grep cifs
df -h | grep cifs
Solutions:
# Remount with verbose logging
sudo mount -t cifs //server/share /mnt/point -o username=user,password=pass,vers=3.0
# Fix mount options in /etc/fstab
//server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0
# Test credentials
sudo cat /etc/cifs/credentials
# Should contain: username=, password=, domain=
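The credentials file should never be world-readable; a sketch that writes one with 600 permissions from the start (the path, username, and password here are placeholders, demoed in a temp directory):

```shell
#!/usr/bin/env bash
set -euo pipefail

# write_cifs_creds: create a CIFS credentials file readable only by its owner.
write_cifs_creds() {
  local path="$1" user="$2" pass="$3" domain="${4:-WORKGROUP}"
  umask 177   # files created with mode 600 from the moment they exist
  printf 'username=%s\npassword=%s\ndomain=%s\n' "$user" "$pass" "$domain" > "$path"
}

# Demo in a scratch location; real use would target /etc/cifs/credentials
tmp="$(mktemp -d)"
write_cifs_creds "$tmp/credentials" smbuser 'changeme' HOMELAB
stat -c '%a' "$tmp/credentials"
```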
NFS Mount Issues
Symptoms: Stale file handles, mount hangs, permission denied
Diagnosis:
# Check NFS services
systemctl status nfs-client.target
showmount -e nfs-server
# Test NFS connectivity
rpcinfo -p nfs-server
Solutions:
# Restart NFS services
sudo systemctl restart nfs-client.target
# Remount NFS shares
sudo umount /mnt/nfs-share
sudo mount -t nfs server:/path /mnt/nfs-share
# Fix stale file handles
sudo umount -f /mnt/nfs-share
sudo mount /mnt/nfs-share
Firewall and Security Issues
Port Access Problems
Symptoms: Connection refused, filtered ports, blocked services
Diagnosis:
# Check firewall status
sudo ufw status verbose
sudo iptables -L -n -v
# Test port accessibility
nc -zv host port
nmap -p port host
Solutions:
# Open required ports
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.10.0.0/24
# Reset firewall if needed
sudo ufw --force reset
sudo ufw enable
Network Security Issues
Symptoms: Unauthorized access, suspicious traffic, security alerts
Diagnosis:
# Check active connections
ss -tuln
netstat -tuln
# Review logs for security events
sudo tail -f /var/log/auth.log
sudo tail -f /var/log/syslog | grep -i security
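Repeated `Failed password` lines in `auth.log` are easier to act on as a per-IP count; a sketch (assumes the standard sshd log wording; the sample IPs are documentation addresses):

```shell
#!/usr/bin/env bash
# top_failed_ips: count sshd "Failed password" attempts per source IP,
# most frequent first. Feed it /var/log/auth.log or journalctl output.
top_failed_ips() {
  grep 'Failed password' \
  | grep -oE 'from [0-9.]+' \
  | awk '{ n[$2]++ } END { for (ip in n) print n[ip], ip }' \
  | sort -rn
}

# Demo with synthetic log lines
printf '%s\n' \
  'sshd[1]: Failed password for root from 203.0.113.9 port 41234 ssh2' \
  'sshd[2]: Failed password for admin from 203.0.113.9 port 41235 ssh2' \
  'sshd[3]: Failed password for cal from 198.51.100.7 port 2222 ssh2' \
  | top_failed_ips
```

The top entries are the candidates for the `ufw deny from` rule shown below.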
Solutions:
# Block suspicious IPs
sudo ufw deny from suspicious-ip
# Update SSH security
sudo nano /etc/ssh/sshd_config
# Set: PasswordAuthentication no, PermitRootLogin no
sudo systemctl restart sshd
Pi-hole High Availability Troubleshooting
Pi-hole Not Responding to DNS Queries
Symptoms: DNS resolution failures, clients cannot resolve domains, Pi-hole web UI inaccessible
Diagnosis:
# Test DNS response from both Pi-holes
dig @10.10.0.16 google.com
dig @10.10.0.226 google.com
# Check Pi-hole container status
ssh npm-pihole "docker ps | grep pihole"
ssh ubuntu-manticore "docker ps | grep pihole"
# Check Pi-hole logs
ssh npm-pihole "docker logs pihole --tail 50"
ssh ubuntu-manticore "docker logs pihole --tail 50"
# Test port 53 is listening
ssh ubuntu-manticore "netstat -tulpn | grep :53"
ssh ubuntu-manticore "ss -tulpn | grep :53"
Solutions:
# Restart Pi-hole containers
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose restart"
# Check for port conflicts
ssh ubuntu-manticore "lsof -i :53"
# If systemd-resolved is conflicting, disable it
ssh ubuntu-manticore "sudo systemctl stop systemd-resolved"
ssh ubuntu-manticore "sudo systemctl disable systemd-resolved"
# Rebuild Pi-hole container
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"
DNS Failover Not Working
Symptoms: DNS stops working when primary Pi-hole fails, clients not using secondary DNS
Diagnosis:
# Check UniFi DHCP DNS configuration
# Via UniFi UI: Settings → Networks → LAN → DHCP
# DNS Server 1: 10.10.0.16
# DNS Server 2: 10.10.0.226
# Check client DNS configuration
# Windows:
ipconfig /all | findstr /i "DNS"
# Linux/macOS:
cat /etc/resolv.conf
# Check if secondary Pi-hole is reachable
ping -c 4 10.10.0.226
dig @10.10.0.226 google.com
# Test failover manually
ssh npm-pihole "docker stop pihole"
dig google.com # Should still work via secondary
ssh npm-pihole "docker start pihole"
Solutions:
# Force DHCP lease renewal to get updated DNS servers
# Windows:
ipconfig /release && ipconfig /renew
# Linux:
sudo dhclient -r && sudo dhclient
# macOS/iOS:
# Disconnect and reconnect to WiFi
# Verify UniFi DHCP settings are correct
# Both DNS servers must be configured in UniFi controller
# Check client respects both DNS servers
# Some clients may cache failed DNS responses
# Flush DNS cache:
# Windows: ipconfig /flushdns
# macOS: sudo dscacheutil -flushcache
# Linux: sudo systemd-resolve --flush-caches
Orbital Sync Not Syncing
Symptoms: Blocklists/whitelists differ between Pi-holes, custom DNS entries missing on secondary
Diagnosis:
# Check Orbital Sync container status
ssh ubuntu-manticore "docker ps | grep orbital-sync"
# Check Orbital Sync logs
ssh ubuntu-manticore "docker logs orbital-sync --tail 100"
# Look for sync errors in logs
ssh ubuntu-manticore "docker logs orbital-sync 2>&1 | grep -i error"
# Verify API tokens are correct
ssh ubuntu-manticore "cat ~/docker/orbital-sync/.env"
# Test API access manually
ssh npm-pihole "docker exec pihole pihole -a -p" # Get API token
curl -H "Authorization: Token YOUR_TOKEN" http://10.10.0.16/admin/api.php?status
# Compare blocklist counts between Pi-holes
ssh npm-pihole "docker exec pihole pihole -g -l"
ssh ubuntu-manticore "docker exec pihole pihole -g -l"
Solutions:
# Regenerate API tokens
# Primary Pi-hole: http://10.10.0.16/admin → Settings → API → Generate New Token
# Secondary Pi-hole: http://10.10.0.226:8053/admin → Settings → API → Generate New Token
# Update Orbital Sync .env file
ssh ubuntu-manticore "nano ~/docker/orbital-sync/.env"
# Update PRIMARY_HOST_PASSWORD and SECONDARY_HOST_PASSWORD
# Restart Orbital Sync
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose restart"
# Force immediate sync by restarting
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose down && docker compose up -d"
# Monitor sync in real-time
ssh ubuntu-manticore "docker logs orbital-sync -f"
# If all else fails, manually sync via Teleporter
# Primary: Settings → Teleporter → Backup
# Secondary: Settings → Teleporter → Restore (upload backup file)
NPM DNS Sync Failing
Symptoms: NPM proxy hosts missing from Pi-hole custom.list, new domains not resolving
Diagnosis:
# Check NPM sync script status
ssh npm-pihole "cat /var/log/cron.log | grep npm-pihole-sync"
# Run sync script manually to see errors
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"
# Check script can access both Pi-holes
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh npm-pihole "ssh ubuntu-manticore 'docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com'"
# Verify SSH connectivity to ubuntu-manticore
ssh npm-pihole "ssh ubuntu-manticore 'echo SSH OK'"
Solutions:
# Fix SSH key authentication (if needed)
ssh npm-pihole "ssh-copy-id ubuntu-manticore"
# Test script with dry-run
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh --dry-run"
# Run script manually to sync immediately
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"
# Verify cron job is configured
ssh npm-pihole "crontab -l | grep npm-pihole-sync"
# If cron job missing, add it
ssh npm-pihole "crontab -e"
# Add: 0 * * * * /home/cal/scripts/npm-pihole-sync.sh >> /var/log/npm-pihole-sync.log 2>&1
# Check script logs
ssh npm-pihole "tail -50 /var/log/npm-pihole-sync.log"
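The sync script itself isn't reproduced here, but its core step - deciding which NPM domains are missing from `custom.list` - can be sketched (the function name is mine; `custom.list` lines use the `IP domain` format shown elsewhere in this guide):

```shell
#!/usr/bin/env bash
set -euo pipefail

# missing_overrides: print "IP domain" lines for domains that are not yet
# present in an existing custom.list. NPM_IP is the proxy's address.
NPM_IP="10.10.0.16"
missing_overrides() {
  local custom_list="$1"; shift
  local d
  for d in "$@"; do
    grep -q " $d\$" "$custom_list" || echo "$NPM_IP $d"
  done
}

# Demo: one domain already synced, one missing
tmp="$(mktemp)"
printf '10.10.0.16 git.manticorum.com\n' > "$tmp"
missing_overrides "$tmp" git.manticorum.com new.manticorum.com
```

The real script would append the printed lines to `custom.list` on both Pi-holes and run `pihole restartdns`.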
Secondary Pi-hole Performance Issues
Symptoms: ubuntu-manticore slow, high CPU/RAM usage, Pi-hole affecting Jellyfin/Tdarr
Diagnosis:
# Check resource usage
ssh ubuntu-manticore "docker stats --no-stream"
# Pi-hole should use <1% CPU and ~150MB RAM
# If higher, investigate:
ssh ubuntu-manticore "docker logs pihole --tail 100"
# Check for excessive queries
ssh ubuntu-manticore "docker exec pihole pihole -c -e"
# Check for DNS loops or misconfiguration
ssh ubuntu-manticore "docker exec pihole pihole -t" # Tail pihole.log
Solutions:
# Restart Pi-hole if resource usage is high
ssh ubuntu-manticore "docker restart pihole"
# Check for DNS query loops
# Look for same domain being queried repeatedly
ssh ubuntu-manticore "docker exec pihole pihole -t | grep -A 5 'query\[A\]'"
# Adjust Pi-hole cache settings if needed
ssh ubuntu-manticore "docker exec pihole bash -c 'echo \"cache-size=10000\" >> /etc/dnsmasq.d/99-custom.conf'"
ssh ubuntu-manticore "docker restart pihole"
# If Jellyfin/Tdarr are affected, verify Pi-hole is using minimal resources
# Resource limits can be added to docker-compose.yml:
ssh ubuntu-manticore "nano ~/docker/pihole/docker-compose.yml"
# Add under pihole service:
# deploy:
# resources:
# limits:
# cpus: '0.5'
# memory: 256M
iOS Devices Still Getting 403 Errors (Post-HA Deployment)
Symptoms: After deploying dual Pi-hole setup, iOS devices still bypass DNS and get 403 errors on internal services
Diagnosis:
# Verify UniFi DHCP has BOTH Pi-holes configured, NO public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
# Public DNS (1.1.1.1, 8.8.8.8): REMOVED
# Check iOS DNS settings
# iOS: Settings → WiFi → (i) → DNS
# Should show: 10.10.0.16
# Force iOS DHCP renewal
# iOS: Settings → WiFi → Forget Network → Reconnect
# Check NPM logs for request source
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403"
# Verify both Pi-holes have custom DNS entries
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
Solutions:
# Solution 1: Verify public DNS is removed from UniFi DHCP
# If public DNS (1.1.1.1) is still configured, iOS will prefer it
# Remove ALL public DNS servers from UniFi DHCP configuration
# Solution 2: Force iOS to renew DHCP lease
# iOS: Settings → WiFi → Forget Network
# Then reconnect to WiFi
# This forces device to get new DNS servers from DHCP
# Solution 3: Disable iOS encrypted DNS if still active
# iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
# iOS: Check for DNS profiles: Settings → General → VPN & Device Management
# Solution 4: If encrypted DNS persists, add public IP to NPM ACL (fallback)
# See "iOS DNS Bypass Issues" section above for detailed steps
# Solution 5: Test with different iOS device to isolate issue
# If other iOS devices work, issue is device-specific configuration
# Verification after fix
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log"
# Access git.manticorum.com from iOS
# Should see: [Client 10.0.0.x] - - 200 (local IP)
Both Pi-holes Failing Simultaneously
Symptoms: Complete DNS failure across network, all devices cannot resolve domains
Diagnosis:
# Check both Pi-hole containers
ssh npm-pihole "docker ps -a | grep pihole"
ssh ubuntu-manticore "docker ps -a | grep pihole"
# Check both hosts are reachable
ping -c 4 10.10.0.16
ping -c 4 10.10.0.226
# Check Docker daemon on both hosts
ssh npm-pihole "systemctl status docker"
ssh ubuntu-manticore "systemctl status docker"
# Test emergency DNS (bypassing Pi-hole)
dig @8.8.8.8 google.com
Solutions:
# Emergency: Temporarily use public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 8.8.8.8 (Google DNS - temporary)
# DNS2: 1.1.1.1 (Cloudflare - temporary)
# Restart both Pi-holes
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "docker restart pihole"
# If Docker daemon issues:
ssh npm-pihole "sudo systemctl restart docker"
ssh ubuntu-manticore "sudo systemctl restart docker"
# Rebuild both Pi-holes if corruption suspected
ssh npm-pihole "cd ~/pihole && docker compose down && docker compose up -d"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"
# After Pi-holes are restored, revert UniFi DHCP to Pi-holes
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
Query Load Not Balanced Between Pi-holes
Symptoms: Primary Pi-hole getting most queries, secondary rarely used
Diagnosis:
# Check query counts on both Pi-holes
# Primary: http://10.10.0.16/admin → Dashboard → Total Queries
# Secondary: http://10.10.0.226:8053/admin → Dashboard → Total Queries
# This is NORMAL behavior - clients prefer DNS1 by default
# Secondary is for failover, not load balancing
# To verify failover works:
ssh npm-pihole "docker stop pihole"
# Wait 30 seconds
# Check secondary query count - should increase
ssh npm-pihole "docker start pihole"
Solutions:
# No action needed - this is expected behavior
# DNS failover is for redundancy, not load distribution
# If you want true load balancing (advanced):
# Option 1: Configure some devices to prefer DNS2
# Manually set DNS on specific devices to 10.10.0.226, 10.10.0.16
# Option 2: Implement DNS round-robin (requires custom DHCP)
# Not recommended for homelab - adds complexity
# Option 3: Accept default behavior (recommended)
# Primary handles most traffic, secondary provides failover
# This is industry standard DNS HA behavior
Pi-hole Blocklist Blocking Legitimate Apps
Facebook Blocklist Breaking Messenger Kids (2026-03-05)
Symptoms: iPad could not connect to Facebook Messenger Kids. App would not load or send/receive messages. Disconnecting iPad from WiFi (using cellular) restored functionality.
Root Cause: The anudeepND/blacklist/master/facebook.txt blocklist was subscribed in Pi-hole, which blocked all core Facebook domains needed by Messenger Kids.
Blocked Domains (from pihole.log):
| Domain | Purpose |
|---|---|
| edge-mqtt.facebook.com | MQTT real-time message transport |
| graph.facebook.com | Facebook Graph API (login, contacts, profiles) |
| graph-fallback.facebook.com | Graph API fallback (blocked via CNAME chain) |
| www.facebook.com | Core Facebook domain |
Allowed Domains (not on the blocklist, resolved fine):
- dgw.c10r.facebook.com - Data gateway
- mqtt.fallback.c10r.facebook.com - MQTT fallback
- chat-e2ee.c10r.facebook.com - E2E encrypted chat
Diagnosis:
# Find blocked domains for a specific client IP
ssh pihole "docker exec pihole grep 'CLIENT_IP' /var/log/pihole/pihole.log | grep 'gravity blocked'"
# Check which blocklist contains a domain
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
# Output: https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt (block)
Resolution: Removed the Facebook blocklist from primary Pi-hole (secondary didn't have it). The blocklist contained ~3,997 Facebook domains.
Pi-hole v6 API - Deleting a Blocklist:
# Authenticate and get session ID
SID=$(curl -s -X POST 'http://PIHOLE_IP:PORT/api/auth' \
-H 'Content-Type: application/json' \
-d '{"password":"APP_PASSWORD"}' \
| python3 -c 'import sys,json; print(json.load(sys.stdin)["session"]["sid"])')
# DELETE uses the URL-encoded list ADDRESS as path parameter (NOT numeric ID)
# The ?type=block parameter is REQUIRED
curl -s -X DELETE \
"http://PIHOLE_IP:PORT/api/lists/URL_ENCODED_LIST_ADDRESS?type=block" \
-H "X-FTL-SID: $SID"
# Success returns HTTP 204 No Content
# Update gravity after removal
ssh pihole "docker exec pihole pihole -g"
# Verify domain is no longer blocked
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
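The DELETE call needs the list address URL-encoded; if `jq` or `python3` isn't available in the container, a pure-bash encoder works (the helper name is mine):

```shell
#!/usr/bin/env bash
# urlencode: percent-encode a string for use as a URL path parameter.
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;                  # RFC 3986 unreserved set
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;    # everything else -> %XX
    esac
  done
  printf '%s\n' "$out"
}

# Example: encode the blocklist address for the DELETE path parameter
urlencode 'https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt'
```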
Important Pi-hole v6 API Notes:
- List endpoints use the URL-encoded blocklist address as path param, not numeric IDs
- The ?type=block query parameter is mandatory for DELETE operations
- Numeric ID DELETE returns 200 with {"took": ...} but DOES NOT actually delete (silent failure)
- Successful address-based DELETE returns HTTP 204 (no body)
- Must run pihole -g (gravity update) after deletion for changes to take effect
Future Improvement (TODO): Implement Pi-hole v6 group/client-based approach:
- Create a group for the iPad that bypasses the Facebook blocklist
- Re-add the Facebook blocklist assigned to the default group only
- Assign the iPad's IP to a "Kids Devices" client group that excludes the Facebook list
- This would maintain Facebook blocking for other devices while allowing Messenger Kids
- See: Pi-hole v6 Admin -> Groups/Clients for per-device blocklist management
Service Discovery and DNS Issues
Local DNS Problems
Symptoms: Services unreachable by hostname, DNS timeouts
Diagnosis:
# Test local DNS resolution
nslookup service.homelab.local
dig @10.10.0.16 service.homelab.local
# Check DNS server status
systemctl status bind9 # or named
Solutions:
# Add to /etc/hosts as temporary fix
echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts
# Restart DNS services
sudo systemctl restart bind9
sudo systemctl restart systemd-resolved
Container Networking Issues
Symptoms: Containers cannot communicate, service discovery fails
Diagnosis:
# Check Docker networks
docker network ls
docker network inspect bridge
# Test container connectivity
docker exec container1 ping container2
docker exec container1 nslookup container2
Solutions:
# Create custom network
docker network create --driver bridge app-network
docker run --network app-network container
# Fix DNS in containers
docker run --dns 8.8.8.8 container
Performance Issues
Network Latency Problems
Symptoms: Slow response times, timeouts, poor performance
Diagnosis:
# Measure network latency
ping -c 100 host
mtr --report host
# Check network interface stats
ip -s link show
cat /proc/net/dev
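The receive-error column in `/proc/net/dev` is the first thing to check for congestion or cabling problems; a sketch that parses that format (fed a saved sample here so it is testable offline; on a live box, redirect `/proc/net/dev` into it):

```shell
#!/usr/bin/env bash
# rx_errors: read /proc/net/dev-format lines on stdin and print
# "iface rx_errs" for any interface with a nonzero receive error count.
rx_errors() {
  awk 'NR > 2 {
    sub(/^ +/, "")                      # strip leading padding
    split($0, f, /[: ]+/)               # f[1]=iface, then rx: bytes packets errs ...
    if (f[4] > 0) print f[1], f[4]      # f[4] is the rx errs column
  }'
}

# Demo: two header lines, one clean interface, one with errors
printf '%s\n' \
  'Inter-|   Receive' \
  ' face |bytes packets errs drop fifo frame compressed multicast' \
  '    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0' \
  '  eth0: 5000 50 7 0 0 0 0 0 4000 40 0 0 0 0 0 0' \
  | rx_errors
```

Live use: `rx_errors < /proc/net/dev`. A rising count between runs points at the physical layer or driver.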
Solutions:
# Optimize network settings
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
# Check for network congestion
iftop
nethogs
Bandwidth Issues
Symptoms: Slow transfers, network congestion, dropped packets
Diagnosis:
# Test bandwidth
iperf3 -s # Server
iperf3 -c server-ip # Client
# Check interface utilization
vnstat -i eth0
Solutions:
# Implement QoS if needed
sudo tc qdisc add dev eth0 root fq_codel
# Optimize buffer sizes
sudo ethtool -G eth0 rx 4096 tx 4096
Emergency Recovery Procedures
Network Emergency Recovery
Complete network failure recovery:
# Reset all network configuration
sudo systemctl stop networking
sudo ip addr flush eth0
sudo ip route flush table main
sudo systemctl start networking
# Manual network configuration
sudo ip addr add 10.10.0.100/24 dev eth0
sudo ip route add default via 10.10.0.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
SSH Emergency Access
When locked out of systems:
# Use emergency SSH key
ssh -i ~/.ssh/emergency_homelab_rsa user@host
# Via console access (if available)
# Use hypervisor console or physical access
# Reset SSH to allow password auth temporarily
sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
Service Recovery
Critical service restoration:
# Restart all network services
sudo systemctl restart networking
sudo systemctl restart nginx
sudo systemctl restart sshd
# Emergency firewall disable
sudo ufw disable # CAUTION: Only for troubleshooting
# Service-specific recovery
sudo systemctl restart docker
sudo systemctl restart systemd-resolved
Monitoring and Prevention
Network Health Monitoring
#!/bin/bash
# network-monitor.sh
CRITICAL_HOSTS="10.10.0.1 10.10.0.16 nas.homelab.local"
CRITICAL_SERVICES="https://homelab.local http://proxmox.homelab.local:8006"
for host in $CRITICAL_HOSTS; do
if ! ping -c1 -W5 $host >/dev/null 2>&1; then
echo "ALERT: $host unreachable" | logger -t network-monitor
fi
done
for service in $CRITICAL_SERVICES; do
if ! curl -sSf --max-time 10 "$service" >/dev/null 2>&1; then
echo "ALERT: $service unavailable" | logger -t network-monitor
fi
done
Automated Recovery Scripts
#!/bin/bash
# network-recovery.sh
if ! ping -c1 8.8.8.8 >/dev/null 2>&1; then
echo "Network down, attempting recovery..."
sudo systemctl restart networking
sleep 10
if ping -c1 8.8.8.8 >/dev/null 2>&1; then
echo "Network recovered"
else
echo "Manual intervention required"
fi
fi
Quick Reference Commands
Network Diagnostics
# Connectivity tests
ping host
traceroute host
mtr host
nc -zv host port
# Service checks
systemctl status networking
systemctl status nginx
systemctl status sshd
# Network configuration
ip addr show
ip route show
ss -tuln
Emergency Commands
# Network restart
sudo systemctl restart networking
# SSH emergency access
ssh -i ~/.ssh/emergency_homelab_rsa user@host
# Firewall quick disable (emergency only)
sudo ufw disable
# DNS quick fix
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
This troubleshooting guide provides comprehensive solutions for common networking issues in home lab environments.