# Networking Infrastructure Troubleshooting Guide

## SSH Connection Issues

### SSH Authentication Failures
**Symptoms**: Permission denied, connection refused, timeout
**Diagnosis**:
```bash
# Verbose SSH debugging
ssh -vvv user@host

# Test different authentication methods
ssh -o PasswordAuthentication=no user@host   # skip password auth to test keys
ssh -o PubkeyAuthentication=yes user@host    # explicitly allow public-key auth

# Check local key files
ls -la ~/.ssh/
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
```

**Solutions**:
```bash
# Re-deploy SSH keys
ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host
ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host

# Fix key permissions
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub
chmod 700 ~/.ssh

# Fix permissions on the remote ~/.ssh directory and authorized_keys
ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
```

### SSH Service Issues
**Symptoms**: Connection refused, service not running
**Diagnosis**:
```bash
# Check SSH service status (the unit is named ssh on Debian/Ubuntu, sshd on most other distros)
systemctl status sshd
ss -tlnp | grep :22

# Test port connectivity
nc -zv host 22
nmap -p 22 host
```

**Solutions**:
```bash
# Restart SSH service
sudo systemctl restart sshd
sudo systemctl enable sshd   # on Debian/Ubuntu use: sudo systemctl enable ssh

# Check firewall
sudo ufw status
sudo ufw allow ssh

# Verify SSH configuration
sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)"
```

## Network Connectivity Problems

### Basic Network Troubleshooting
**Symptoms**: Cannot reach hosts, timeouts, routing issues
**Diagnosis**:
```bash
# Basic connectivity tests
ping host
traceroute host
mtr host

# Check local network configuration
ip addr show
ip route show
cat /etc/resolv.conf
```

**Solutions**:
```bash
# Restart networking
sudo systemctl restart networking   # Debian (ifupdown)
sudo netplan apply # Ubuntu

# Reset network interface
sudo ip link set eth0 down
sudo ip link set eth0 up

# Add the default gateway if `ip route show` lists no default route
sudo ip route add default via 10.10.0.1
```

### DNS Resolution Issues
**Symptoms**: Cannot resolve hostnames, slow resolution
**Diagnosis**:
```bash
# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Check DNS servers
resolvectl status   # systemd-resolve --status on older systemd releases
cat /etc/resolv.conf
```

**Solutions**:
```bash
# Temporary DNS fix (overwrites the file; systemd-resolved or DHCP may rewrite it later)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

# Restart DNS services
sudo systemctl restart systemd-resolved

# Flush DNS cache
sudo resolvectl flush-caches   # systemd-resolve --flush-caches on older systemd releases
```

### UniFi Firewall Blocking DNS to New Networks
**Symptoms**: A new network/VLAN shows "no internet access": devices connect to WiFi but cannot browse or resolve domain names. Pinging IP addresses (8.8.8.8) works, but DNS resolution fails.

**Root Cause**: Firewall rules block traffic from the DNS servers (Pi-holes in the "Servers" network group) to the new network. Rules like "Servers to WiFi" or "Servers to Home" with a DROP action block ALL traffic, including DNS responses on port 53.

**Diagnosis**:
```bash
# From affected device on new network:

# Test if routing works (should succeed)
ping 8.8.8.8
traceroute 8.8.8.8

# Test if DNS resolution works (will fail)
nslookup google.com

# Test DNS servers directly (will time out or fail)
nslookup google.com 10.10.0.16
nslookup google.com 10.10.0.226

# Test public DNS (should work)
nslookup google.com 8.8.8.8

# Check DHCP-assigned DNS servers
# Windows:
ipconfig /all | findstr DNS

# Linux/macOS:
cat /etc/resolv.conf
```

**If routing works but DNS fails**, the issue is a firewall rule blocking DNS traffic, not the network configuration.

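
The decision logic in this section can be sketched as a small shell function. This is a minimal sketch, not part of the original runbook: `classify_dns_fault` is a hypothetical helper that takes three yes/no results collected by hand from the tests above and prints the most likely cause.

```bash
# Hypothetical helper: map three manual test results to a likely cause.
# Each argument is "yes" or "no":
#   $1 = ping 8.8.8.8 succeeded
#   $2 = nslookup via a Pi-hole (10.10.0.16 or 10.10.0.226) succeeded
#   $3 = nslookup via 8.8.8.8 succeeded
classify_dns_fault() {
  case "$1/$2/$3" in
    no/*)       echo "routing problem: check VLAN, gateway and DHCP first" ;;
    yes/yes/*)  echo "DNS works: the fault is elsewhere" ;;
    yes/no/yes) echo "firewall likely blocking Pi-hole DNS: check Servers-to-network DROP rules" ;;
    yes/no/no)  echo "all DNS blocked: check port 53 rules and upstream connectivity" ;;
    *)          echo "usage: classify_dns_fault yes|no yes|no yes|no" >&2; return 2 ;;
  esac
}

# Example matching the scenario described in this section:
classify_dns_fault yes no yes
```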

**Solutions**:

**Step 1: Identify Blocking Rules**
- In UniFi: Settings → Firewall & Security → Traffic Rules → LAN In
- Look for DROP rules with:
  - Source: Servers (or network group containing Pi-holes)
  - Destination: Your new network (e.g., "Home WiFi", "Home Network")
  - Examples: "Servers to WiFi", "Servers to Home"

**Step 2: Create DNS Allow Rules (BEFORE Drop Rules)**

Create new rules positioned ABOVE the drop rules:

```
Name: Allow DNS - Servers to [Network Name]
Action: Accept
Rule Applied: Before Predefined Rules
Type: LAN In
Protocol: TCP and UDP
Source:
  - Network/Group: Servers (or specific Pi-hole IPs: 10.10.0.16, 10.10.0.226)
  - Port: Any
Destination:
  - Network: [Your new network - e.g., Home WiFi]
  - Port: 53 (DNS)
```

Repeat for each network that needs DNS access from servers.

**Step 3: Verify Rule Order**

**CRITICAL**: Firewall rules process top-to-bottom, first match wins!

Correct order:
```
✅ Allow DNS - Servers to Home Network (Accept, Port 53)
✅ Allow DNS - Servers to Home WiFi (Accept, Port 53)
❌ Servers to Home (Drop, All ports)
❌ Servers to WiFi (Drop, All ports)
```
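
To see why rule order matters, first-match-wins evaluation can be simulated with a few lines of shell. This is an illustrative sketch only: `first_match` and its one-rule-per-line `ACTION PORT NAME` format are hypothetical and do not reflect how UniFi stores or evaluates rules internally.

```bash
# Hypothetical rule table: one rule per line, "ACTION PORT NAME", top to bottom.
# PORT is a number or "any". The first rule whose port matches decides the outcome.
first_match() {
  # $1 = destination port of the packet; rules are read from stdin
  port="$1"
  while read -r action rule_port name; do
    [ -z "$action" ] && continue
    if [ "$rule_port" = "any" ] || [ "$rule_port" = "$port" ]; then
      echo "$action ($name)"
      return 0
    fi
  done
  echo "no match (default policy applies)"
}

good_order='accept 53 Allow DNS - Servers to Home WiFi
drop any Servers to WiFi'
bad_order='drop any Servers to WiFi
accept 53 Allow DNS - Servers to Home WiFi'

printf '%s\n' "$good_order" | first_match 53   # accept (Allow DNS - Servers to Home WiFi)
printf '%s\n' "$good_order" | first_match 22   # drop (Servers to WiFi)
printf '%s\n' "$bad_order"  | first_match 53   # drop (Servers to WiFi)
```

With the allow rule below the drop rule, port 53 traffic never reaches it, which is the failure mode this step guards against.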

**Step 4: Re-enable Drop Rules**

If you disabled the drop rules while troubleshooting, re-enable them once the DNS allow rules are in place and positioned correctly.

**Verification**:
```bash
# From device on new network:

# DNS should work
nslookup google.com

# Browsing should work
ping google.com

# Other server traffic should still be blocked (expected)
ping 10.10.0.16 # Should fail or time out
ssh 10.10.0.16 # Should be blocked
```

**Real-World Example**: New "Home WiFi" network (10.1.0.0/24, VLAN 2)
- **Problem**: Devices connected but couldn't browse web
- **Diagnosis**: `traceroute 8.8.8.8` worked (16ms), but `nslookup google.com` failed
- **Cause**: Firewall rule "Servers to WiFi" (rule 20004) blocked Pi-hole DNS responses
- **Solution**: Added "Allow DNS - Servers to Home WiFi" rule (Accept, port 53) above drop rule
- **Result**: DNS resolution works, other server traffic remains properly blocked

## Reverse Proxy and Load Balancer Issues

### Nginx Configuration Problems
**Symptoms**: 502 Bad Gateway, 503 Service Unavailable, SSL errors
**Diagnosis**:
```bash
# Check Nginx status and logs
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log

# Test Nginx configuration
sudo nginx -t
sudo nginx -T # Show full configuration
```

**Solutions**:
```bash
# Reload Nginx configuration
sudo nginx -s reload

# Check upstream servers
curl -I http://backend-server:port
telnet backend-server port

# Fix common configuration issues
sudo nano /etc/nginx/sites-available/default
# Check proxy_pass URLs, upstream definitions
```

### SSL/TLS Certificate Issues
**Symptoms**: Certificate warnings, expired certificates, connection errors
**Diagnosis**:
```bash
# Check certificate validity
openssl s_client -connect host:443 -servername host
openssl x509 -in /etc/ssl/certs/cert.pem -text -noout

# Check certificate expiry
openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates
```

**Solutions**:
```bash
# Renew Let's Encrypt certificates
sudo certbot renew --dry-run
sudo certbot renew --force-renewal

# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/private/selfsigned.key \
  -out /etc/ssl/certs/selfsigned.crt
```

### Intermittent SSL Errors (ERR_SSL_UNRECOGNIZED_NAME_ALERT)
**Symptoms**: SSL errors that work sometimes but fail other times, `ERR_SSL_UNRECOGNIZED_NAME_ALERT` in browser, connection works from internal network intermittently

**Root Cause**: IPv6/IPv4 DNS conflicts where public DNS returns Cloudflare IPv6 addresses while local DNS (Pi-hole) only overrides IPv4. Modern systems prefer IPv6, causing intermittent failures when IPv6 connection attempts fail.

**Diagnosis**:
```bash
# Check for multiple DNS records (IPv4 + IPv6)
nslookup domain.example.com 10.10.0.16
dig domain.example.com @10.10.0.16

# Compare with public DNS
host domain.example.com 8.8.8.8

# Test IPv6 vs IPv4 connectivity
curl -6 -I https://domain.example.com # IPv6 (may fail)
curl -4 -I https://domain.example.com # IPv4 (should work)

# Check if system has IPv6 connectivity
ip -6 addr show | grep global
```

**Example Problem**:
```bash
# Local Pi-hole returns:
domain.example.com → 10.10.0.16 (IPv4 internal NPM)

# Public DNS also returns:
domain.example.com → 2606:4700:... (Cloudflare IPv6)

# System tries IPv6 first → fails
# Sometimes falls back to IPv4 → works
# Result: Intermittent SSL errors
```

**Solutions**:

**Option 1: Add IPv6 Local DNS Override** (Recommended)
```bash
# Add non-routable IPv6 address to Pi-hole custom.list
ssh pihole "docker exec pihole bash -c 'echo \"fe80::1 domain.example.com\" >> /etc/pihole/custom.list'"

# Restart Pi-hole DNS
ssh pihole "docker exec pihole pihole restartdns"

# Verify fix
nslookup domain.example.com 10.10.0.16
# Should show: 10.10.0.16 (IPv4) and fe80::1 (IPv6 link-local)
```

**Option 2: Remove Cloudflare DNS Records** (If public access not needed)
```bash
# In Cloudflare dashboard:
# - Turn off orange cloud (proxy) for the domain
# - Or delete A/AAAA records entirely

# This removes Cloudflare IPs from public DNS
```

**Option 3: Disable IPv6 on Client** (Temporary testing)
```bash
# Disable IPv6 temporarily to confirm diagnosis
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1

# Test domain - should work consistently now

# Re-enable when done testing
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
```

**Verification**:
```bash
# After applying fix, verify consistent resolution
for i in {1..10}; do
  echo "Test $i:"
  curl -I https://domain.example.com 2>&1 | grep -E "(HTTP|SSL|certificate)"
  sleep 1
done

# All attempts should succeed consistently
```

**Real-World Example**: git.manticorum.com
- **Problem**: Intermittent SSL errors from internal network (10.0.0.0/24)
- **Diagnosis**: Pi-hole had IPv4 override (10.10.0.16) but public DNS returned Cloudflare IPv6
- **Solution**: Added `fe80::1 git.manticorum.com` to Pi-hole custom.list
- **Result**: Consistent successful connections, always routes to internal NPM

### iOS DNS Bypass Issues (Encrypted DNS)
**Symptoms**: iOS device gets 403 errors when accessing internal services, NPM logs show external public IP as source instead of local 10.x.x.x IP, even with correct Pi-hole DNS configuration

**Root Cause**: iOS devices can use encrypted DNS (DNS-over-HTTPS or DNS-over-TLS) that bypasses traditional DNS servers, even when correctly configured. This causes the device to resolve to public/Cloudflare IPs instead of local overrides, routing traffic through the public internet and triggering ACL denials.

**Diagnosis**:
```bash
# Check NPM access logs for the service
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403"

# Look for external IPs in logs instead of local 10.x.x.x:
# BAD: [Client 73.36.102.55] - - 403 (external IP, blocked by ACL)
# GOOD: [Client 10.0.0.207] - 200 200 (local IP, allowed)

# Verify iOS device is on local network
# On iOS: Settings → Wi-Fi → (i) → IP Address
# Should show 10.0.0.x or 10.10.0.x

# Verify Pi-hole DNS is configured
# On iOS: Settings → Wi-Fi → (i) → DNS
# Should show 10.10.0.16

# Test if DNS is actually being used
nslookup domain.example.com 10.10.0.16 # Shows what Pi-hole returns
# Then check what iOS actually resolves (if possible via network sniffer)
```

**Example Problem**:
```bash
# iOS device configuration:
IP Address: 10.0.0.207 (correct, on local network)
DNS: 10.10.0.16 (correct, Pi-hole configured)
Cellular Data: OFF

# But NPM logs show:
[Client 73.36.102.55] - - 403 # Coming from ISP public IP!

# Why: iOS is using encrypted DNS, bypassing Pi-hole
# Result: Resolves to Cloudflare IP, routes through public internet,
# NPM sees external IP, ACL blocks with 403
```
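
Spotting the external addresses by eye gets tedious on a busy proxy host. The following is a minimal sketch of a filter that lists non-private client IPs that received a 403. `external_403_clients` is a hypothetical helper, and it assumes each log line carries a `[Client A.B.C.D]` field as in the excerpts above; adjust the pattern if your NPM log format differs.

```bash
# Hypothetical helper: read access-log lines on stdin and print each client IP
# that got a 403 and is NOT in an RFC 1918 private range.
external_403_clients() {
  grep ' 403' |
    sed -n 's/.*\[Client \([0-9.]*\)].*/\1/p' |
    sort -u |
    while read -r ip; do
      case "$ip" in
        10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) ;;  # private: ignore
        *) echo "$ip" ;;
      esac
    done
}

# Possible usage (container name taken from the commands above):
#   ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'cat /data/logs/proxy-host-*_access.log'" \
#     | external_403_clients
```

Any address it prints is a request that arrived from outside the LAN, which points at DNS bypass rather than an NPM misconfiguration.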

**Solutions**:

**Option 1: Add Public IP to NPM Access Rules** (Quickest, recommended for mobile devices)
```bash
# Find which config file contains your domain
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'grep -l domain.example.com /data/nginx/proxy_host/*.conf'"
# Example output: /data/nginx/proxy_host/19.conf

# Add public IP to access rules (replace YOUR_PUBLIC_IP and config number)
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sed -i '/allow 10.10.0.0\/24;/a \ \n allow YOUR_PUBLIC_IP;' /data/nginx/proxy_host/19.conf"

# Verify the change
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 cat /data/nginx/proxy_host/19.conf" | grep -A 8 "Access Rules"

# Test and reload nginx
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -t"
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -s reload"
```

**Option 2: Reset iOS Network Settings** (Nuclear option, clears DNS cache/profiles)
```
iOS: Settings → General → Transfer or Reset iPhone → Reset → Reset Network Settings
WARNING: This removes all saved WiFi passwords and network configurations
```

**Option 3: Check for DNS Configuration Profiles**
```
iOS: Settings → General → VPN & Device Management
- Look for any DNS or Configuration Profiles
- Remove any third-party DNS profiles (AdGuard, NextDNS, etc.)
```

**Option 4: Disable Private Relay and IP Tracking** (Usually already tried)
```
iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
iOS: Settings → Wi-Fi → (i) → Limit IP Address Tracking → OFF
```

**Option 5: Check Browser DNS Settings** (If using Brave or Firefox)
```
Brave: Settings → Brave Shields & Privacy → Use secure DNS → OFF
Firefox: Settings → DNS over HTTPS → OFF
```

**Verification**:
```bash
# After applying fix, check NPM logs while accessing from iOS
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log"

# With Option 1 (added public IP): Should see 200 status with external IP
# With Options 2-5 (fixed DNS): Should see 200 status with local 10.x.x.x IP
```

**Important Notes**:
- **Option 1 is recommended for mobile devices** as iOS encrypted DNS behavior is inconsistent
- The public IP workaround must be updated whenever your ISP changes your IP (residential addresses are usually dynamic, even if they change infrequently)
- Manual nginx config changes (Option 1) will be **overwritten if you edit the proxy host in NPM UI**
- To make permanent, either use NPM UI to add the IP, or re-apply after UI changes
- This issue can affect any iOS device (iPhone, iPad) and some Android devices with encrypted DNS

**Real-World Example**: git.manticorum.com iOS Access
- **Problem**: iPhone showing 403 errors, desktop working fine on same network
- **iOS Config**: IP 10.0.0.207, DNS 10.10.0.16, Cellular OFF (all correct)
- **NPM Logs**: iPhone requests showing as [Client 73.36.102.55] (ISP public IP)
- **Diagnosis**: iOS using encrypted DNS, bypassing Pi-hole, routing through Cloudflare
- **Solution**: Added `allow 73.36.102.55;` to NPM proxy_host/19.conf ACL rules
- **Result**: Immediate access, user able to log in to Gitea successfully

## Network Storage Issues

### CIFS/SMB Mount Problems
**Symptoms**: Mount failures, connection timeouts, permission errors
**Diagnosis**:
```bash
# Test SMB connectivity
smbclient -L //nas-server -U username
testparm # Validate smb.conf (run on the Samba server itself)

# Check mount status
mount | grep cifs
df -h -t cifs
```

**Solutions**:
```bash
# Remount with verbose logging (omit password= to be prompted instead of leaving it in shell history)
sudo mount -v -t cifs //server/share /mnt/point -o username=user,password=pass,vers=3.0

# Fix mount options in /etc/fstab
//server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0

# Test credentials
sudo cat /etc/cifs/credentials
# Should contain: username=, password=, domain=
```

### NFS Mount Issues
**Symptoms**: Stale file handles, mount hangs, permission denied
**Diagnosis**:
```bash
# Check NFS services
systemctl status nfs-client.target
showmount -e nfs-server

# Test NFS connectivity
rpcinfo -p nfs-server
```

**Solutions**:
```bash
# Restart NFS services
sudo systemctl restart nfs-client.target

# Remount NFS shares
sudo umount /mnt/nfs-share
sudo mount -t nfs server:/path /mnt/nfs-share

# Fix stale file handles
sudo umount -f /mnt/nfs-share
sudo mount /mnt/nfs-share
```

## Firewall and Security Issues

### Port Access Problems
**Symptoms**: Connection refused, filtered ports, blocked services
**Diagnosis**:
```bash
# Check firewall status
sudo ufw status verbose
sudo iptables -L -n -v

# Test port accessibility
nc -zv host port
nmap -p port host
```

**Solutions**:
```bash
# Open required ports
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.10.0.0/24

# Reset firewall if needed (deletes ALL rules; re-allow SSH before enabling or a remote session may be locked out)
sudo ufw --force reset
sudo ufw allow ssh
sudo ufw enable
```

### Network Security Issues
**Symptoms**: Unauthorized access, suspicious traffic, security alerts
**Diagnosis**:
```bash
# Check listening sockets
ss -tuln
netstat -tuln   # legacy equivalent (net-tools)

# Check established connections
ss -tn state established

# Review logs for security events
sudo tail -f /var/log/auth.log
sudo tail -f /var/log/syslog | grep -i security
```

**Solutions**:
```bash
# Block suspicious IPs
sudo ufw deny from suspicious-ip

# Update SSH security
sudo nano /etc/ssh/sshd_config
# Set: PasswordAuthentication no, PermitRootLogin no
sudo systemctl restart sshd
```

## Pi-hole High Availability Troubleshooting

### Pi-hole Not Responding to DNS Queries
**Symptoms**: DNS resolution failures, clients cannot resolve domains, Pi-hole web UI inaccessible
**Diagnosis**:
```bash
# Test DNS response from both Pi-holes
dig @10.10.0.16 google.com
dig @10.10.0.226 google.com

# Check Pi-hole container status
ssh npm-pihole "docker ps | grep pihole"
ssh ubuntu-manticore "docker ps | grep pihole"

# Check Pi-hole logs
ssh npm-pihole "docker logs pihole --tail 50"
ssh ubuntu-manticore "docker logs pihole --tail 50"

# Test port 53 is listening
ssh ubuntu-manticore "netstat -tulpn | grep :53"
ssh ubuntu-manticore "ss -tulpn | grep :53"
```

**Solutions**:
```bash
# Restart Pi-hole containers
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose restart"

# Check for port conflicts (sudo is needed to see processes owned by other users)
ssh ubuntu-manticore "sudo lsof -i :53"

# If systemd-resolved is conflicting, disable it
# (the host's own /etc/resolv.conf must then point at a working resolver)
ssh ubuntu-manticore "sudo systemctl stop systemd-resolved"
ssh ubuntu-manticore "sudo systemctl disable systemd-resolved"

# Rebuild Pi-hole container
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"
```

### DNS Failover Not Working
**Symptoms**: DNS stops working when primary Pi-hole fails, clients not using secondary DNS
**Diagnosis**:
```bash
# Check UniFi DHCP DNS configuration
# Via UniFi UI: Settings → Networks → LAN → DHCP
# DNS Server 1: 10.10.0.16
# DNS Server 2: 10.10.0.226

# Check client DNS configuration
# Windows:
ipconfig /all | findstr /i "DNS"

# Linux/macOS:
cat /etc/resolv.conf

# Check if secondary Pi-hole is reachable
ping -c 4 10.10.0.226
dig @10.10.0.226 google.com

# Test failover manually
ssh npm-pihole "docker stop pihole"
dig google.com # Should still work via secondary
ssh npm-pihole "docker start pihole"
```
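
The client-side check above can be automated. This is a minimal sketch, assuming a Linux or macOS client with a resolv.conf-style file; `missing_nameservers` is a hypothetical helper, not an existing tool. On hosts running systemd-resolved, `/etc/resolv.conf` usually lists only the local stub (127.0.0.53), so feed it `/run/systemd/resolve/resolv.conf` instead.

```bash
# Hypothetical helper: read resolv.conf-style text on stdin and report which of
# the expected DNS servers (passed as arguments) are not configured.
missing_nameservers() {
  configured=$(awk '$1 == "nameserver" { print $2 }')
  for want in "$@"; do
    printf '%s\n' "$configured" | grep -qxF "$want" || echo "missing: $want"
  done
}

# Possible usage:
#   missing_nameservers 10.10.0.16 10.10.0.226 < /etc/resolv.conf
# No output means both Pi-holes were handed out by DHCP.
```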

**Solutions**:
```bash
# Force DHCP lease renewal to get updated DNS servers
# Windows:
ipconfig /release && ipconfig /renew

# Linux:
sudo dhclient -r && sudo dhclient

# macOS/iOS:
# Disconnect and reconnect to WiFi

# Verify UniFi DHCP settings are correct
# Both DNS servers must be configured in UniFi controller

# Check client respects both DNS servers
# Some clients may cache failed DNS responses
# Flush DNS cache:
# Windows: ipconfig /flushdns
# macOS: sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Linux: sudo resolvectl flush-caches (systemd-resolve --flush-caches on older systemd releases)
```

### Orbital Sync Not Syncing
**Symptoms**: Blocklists/whitelists differ between Pi-holes, custom DNS entries missing on secondary
**Diagnosis**:
```bash
# Check Orbital Sync container status
ssh ubuntu-manticore "docker ps | grep orbital-sync"

# Check Orbital Sync logs
ssh ubuntu-manticore "docker logs orbital-sync --tail 100"

# Look for sync errors in logs
ssh ubuntu-manticore "docker logs orbital-sync 2>&1 | grep -i error"

# Verify API tokens are correct
ssh ubuntu-manticore "cat ~/docker/orbital-sync/.env"

# Test API access manually (Pi-hole v5)
# Note: `pihole -a -p` sets the web password; it does not print the API token.
# The v5 API token is the WEBPASSWORD hash in setupVars.conf:
ssh npm-pihole "docker exec pihole grep WEBPASSWORD /etc/pihole/setupVars.conf"
curl "http://10.10.0.16/admin/api.php?summaryRaw&auth=YOUR_TOKEN"

# Compare blocklist counts between Pi-holes
ssh npm-pihole "docker exec pihole pihole -g -l"
ssh ubuntu-manticore "docker exec pihole pihole -g -l"
```

**Solutions**:
```bash
# Regenerate API tokens
# Primary Pi-hole: http://10.10.0.16/admin → Settings → API → Generate New Token
# Secondary Pi-hole: http://10.10.0.226:8053/admin → Settings → API → Generate New Token

# Update Orbital Sync .env file
ssh ubuntu-manticore "nano ~/docker/orbital-sync/.env"
# Update PRIMARY_HOST_PASSWORD and SECONDARY_HOST_PASSWORD

# Restart Orbital Sync
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose restart"

# Force immediate sync by restarting
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose down && docker compose up -d"

# Monitor sync in real-time
ssh ubuntu-manticore "docker logs orbital-sync -f"

# If all else fails, manually sync via Teleporter
# Primary: Settings → Teleporter → Backup
# Secondary: Settings → Teleporter → Restore (upload backup file)
```

### NPM DNS Sync Failing
**Symptoms**: NPM proxy hosts missing from Pi-hole custom.list, new domains not resolving
**Diagnosis**:
```bash
# Check NPM sync script status
ssh npm-pihole "cat /var/log/cron.log | grep npm-pihole-sync"

# Run sync script manually to see errors
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"

# Check script can access both Pi-holes
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh npm-pihole "ssh ubuntu-manticore 'docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com'"

# Verify SSH connectivity to ubuntu-manticore
ssh npm-pihole "ssh ubuntu-manticore 'echo SSH OK'"
```
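
Grepping for a single hostname only confirms one entry. To find every record the secondary is missing, the two `custom.list` files can be compared as sets. This is a sketch under the assumption that both files are plain `IP hostname` lines; `missing_on_secondary` and the `/tmp` paths are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: print entries present in the first file but absent from
# the second. Order and duplicate lines are ignored.
missing_on_secondary() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
  sort -u "$1" > "$tmp_a"
  sort -u "$2" > "$tmp_b"
  comm -23 "$tmp_a" "$tmp_b"   # column 1 only: lines unique to the first file
  rm -f "$tmp_a" "$tmp_b"
}

# Possible usage:
#   ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list" > /tmp/primary.list
#   ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list" > /tmp/secondary.list
#   missing_on_secondary /tmp/primary.list /tmp/secondary.list
```

Empty output means the secondary already has every entry the primary has.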

**Solutions**:
```bash
# Fix SSH key authentication (if needed)
ssh npm-pihole "ssh-copy-id ubuntu-manticore"

# Test script with dry-run
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh --dry-run"

# Run script manually to sync immediately
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"

# Verify cron job is configured
ssh npm-pihole "crontab -l | grep npm-pihole-sync"

# If cron job missing, add it
ssh npm-pihole "crontab -e"
# Add: 0 * * * * /home/cal/scripts/npm-pihole-sync.sh >> /var/log/npm-pihole-sync.log 2>&1

# Check script logs
ssh npm-pihole "tail -50 /var/log/npm-pihole-sync.log"
```

### Secondary Pi-hole Performance Issues
**Symptoms**: ubuntu-manticore slow, high CPU/RAM usage, Pi-hole affecting Jellyfin/Tdarr
**Diagnosis**:
```bash
# Check resource usage
ssh ubuntu-manticore "docker stats --no-stream"

# Pi-hole should use <1% CPU and ~150MB RAM
# If higher, investigate:
ssh ubuntu-manticore "docker logs pihole --tail 100"

# Check for excessive queries
ssh ubuntu-manticore "docker exec pihole pihole -c -e"

# Check for DNS loops or misconfiguration
ssh ubuntu-manticore "docker exec pihole pihole -t" # Tail pihole.log
```

**Solutions**:
```bash
# Restart Pi-hole if resource usage is high
ssh ubuntu-manticore "docker restart pihole"

# Check for DNS query loops
# Look for same domain being queried repeatedly
ssh ubuntu-manticore "docker exec pihole pihole -t | grep -A 5 'query\[A\]'"

# Adjust Pi-hole cache settings if needed
ssh ubuntu-manticore "docker exec pihole bash -c 'echo \"cache-size=10000\" >> /etc/dnsmasq.d/99-custom.conf'"
ssh ubuntu-manticore "docker restart pihole"

# If Jellyfin/Tdarr are affected, verify Pi-hole is using minimal resources
# Resource limits can be added to docker-compose.yml:
ssh ubuntu-manticore "nano ~/docker/pihole/docker-compose.yml"
# Add under pihole service:
#   deploy:
#     resources:
#       limits:
#         cpus: '0.5'
#         memory: 256M
```

### iOS Devices Still Getting 403 Errors (Post-HA Deployment)
**Symptoms**: After deploying dual Pi-hole setup, iOS devices still bypass DNS and get 403 errors on internal services
**Diagnosis**:
```bash
# Verify UniFi DHCP has BOTH Pi-holes configured, NO public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
# Public DNS (1.1.1.1, 8.8.8.8): REMOVED

# Check iOS DNS settings
# iOS: Settings → WiFi → (i) → DNS
# Should show: 10.10.0.16

# Force iOS DHCP renewal
# iOS: Settings → WiFi → Forget Network → Reconnect

# Check NPM logs for request source
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -50 /data/logs/proxy-host-*_access.log | grep 403"

# Verify both Pi-holes have custom DNS entries
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
```

**Solutions**:
```bash
# Solution 1: Verify public DNS is removed from UniFi DHCP
# If public DNS (1.1.1.1) is still configured, iOS may query it instead of the Pi-holes and miss the local overrides
# Remove ALL public DNS servers from UniFi DHCP configuration

# Solution 2: Force iOS to renew DHCP lease
# iOS: Settings → WiFi → Forget Network
# Then reconnect to WiFi
# This forces device to get new DNS servers from DHCP

# Solution 3: Disable iOS encrypted DNS if still active
# iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
# iOS: Check for DNS profiles: Settings → General → VPN & Device Management

# Solution 4: If encrypted DNS persists, add public IP to NPM ACL (fallback)
# See "iOS DNS Bypass Issues" section above for detailed steps

# Solution 5: Test with different iOS device to isolate issue
# If other iOS devices work, issue is device-specific configuration

# Verification after fix
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 tail -f /data/logs/proxy-host-*_access.log"
# Access git.manticorum.com from iOS
# Should see: [Client 10.0.0.x] - - 200 (local IP)
```

### Both Pi-holes Failing Simultaneously
**Symptoms**: Complete DNS failure across network, all devices cannot resolve domains
**Diagnosis**:
```bash
# Check both Pi-hole containers
ssh npm-pihole "docker ps -a | grep pihole"
ssh ubuntu-manticore "docker ps -a | grep pihole"

# Check both hosts are reachable
ping -c 4 10.10.0.16
ping -c 4 10.10.0.226

# Check Docker daemon on both hosts
ssh npm-pihole "systemctl status docker"
ssh ubuntu-manticore "systemctl status docker"

# Test emergency DNS (bypassing Pi-hole)
dig @8.8.8.8 google.com
```

**Solutions**:
```bash
# Emergency: Temporarily use public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 8.8.8.8 (Google DNS - temporary)
# DNS2: 1.1.1.1 (Cloudflare - temporary)

# Restart both Pi-holes
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "docker restart pihole"

# If Docker daemon issues:
ssh npm-pihole "sudo systemctl restart docker"
ssh ubuntu-manticore "sudo systemctl restart docker"

# Rebuild both Pi-holes if corruption suspected
ssh npm-pihole "cd ~/pihole && docker compose down && docker compose up -d"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"

# After Pi-holes are restored, revert UniFi DHCP to Pi-holes
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
```

### Query Load Not Balanced Between Pi-holes
**Symptoms**: Primary Pi-hole getting most queries, secondary rarely used
**Diagnosis**:
```bash
# Check query counts on both Pi-holes
# Primary: http://10.10.0.16/admin → Dashboard → Total Queries
# Secondary: http://10.10.0.226:8053/admin → Dashboard → Total Queries

# This is NORMAL behavior - clients prefer DNS1 by default
# Secondary is for failover, not load balancing

# To verify failover works:
ssh npm-pihole "docker stop pihole"
# Wait 30 seconds
# Check secondary query count - should increase
ssh npm-pihole "docker start pihole"
```

**Solutions**:
```bash
# No action needed - this is expected behavior
# DNS failover is for redundancy, not load distribution

# If you want true load balancing (advanced):
# Option 1: Configure some devices to prefer DNS2
# Manually set DNS on specific devices to 10.10.0.226, 10.10.0.16

# Option 2: Implement DNS round-robin (requires custom DHCP)
# Not recommended for homelab - adds complexity

# Option 3: Accept default behavior (recommended)
# Primary handles most traffic, secondary provides failover
# This is industry standard DNS HA behavior
```

## Pi-hole Blocklist Blocking Legitimate Apps
|
|
|
|
### Facebook Blocklist Breaking Messenger Kids (2026-03-05)
|
|
**Symptoms**: iPad could not connect to Facebook Messenger Kids. App would not load or send/receive messages. Disconnecting iPad from WiFi (using cellular) restored functionality.
|
|
|
|
**Root Cause**: The `anudeepND/blacklist/master/facebook.txt` blocklist was subscribed in Pi-hole, which blocked all core Facebook domains needed by Messenger Kids.
|
|
|
|
**Blocked Domains (from pihole.log)**:
|
|
| Domain | Purpose |
|
|
|--------|---------|
|
|
| `edge-mqtt.facebook.com` | MQTT real-time message transport |
|
|
| `graph.facebook.com` | Facebook Graph API (login, contacts, profiles) |
|
|
| `graph-fallback.facebook.com` | Graph API fallback (blocked via CNAME chain) |
|
|
| `www.facebook.com` | Core Facebook domain |
|
|
|
|
**Allowed Domains** (not on the blocklist, resolved fine):
|
|
- `dgw.c10r.facebook.com` - Data gateway
|
|
- `mqtt.fallback.c10r.facebook.com` - MQTT fallback
|
|
- `chat-e2ee.c10r.facebook.com` - E2E encrypted chat

**Diagnosis**:
```bash
# Find blocked domains for a specific client IP
ssh pihole "docker exec pihole grep 'CLIENT_IP' /var/log/pihole/pihole.log | grep 'gravity blocked'"

# Check which blocklist contains a domain
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
# Output: https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt (block)
```
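To see at a glance which domains are blocked most often for one device, the matching log lines can be tallied. This is a minimal sketch, assuming each `gravity blocked` line carries the requesting client as `IP/port` before the message (the `grep 'CLIENT_IP'` filter above relies on the same thing); `blocked_domains_for_client` is a hypothetical helper name.

```bash
# Hypothetical helper: reads pihole.log lines on stdin and prints
# "<count> <domain>" for every gravity-blocked lookup from client IP $1.
blocked_domains_for_client() {
  grep -F " $1/" \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "gravity" && $(i + 1) == "blocked") print $(i + 2) }' \
    | sort | uniq -c | sort -rn
}

# Usage (CLIENT_IP is a placeholder):
# ssh pihole "docker exec pihole cat /var/log/pihole/pihole.log" | blocked_domains_for_client CLIENT_IP
```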

**Resolution**: Removed the Facebook blocklist from the primary Pi-hole (the secondary didn't have it). The blocklist contained ~3,997 Facebook domains.

**Pi-hole v6 API - Deleting a Blocklist**:
```bash
# Authenticate and get session ID
SID=$(curl -s -X POST 'http://PIHOLE_IP:PORT/api/auth' \
  -H 'Content-Type: application/json' \
  -d '{"password":"APP_PASSWORD"}' \
  | python3 -c 'import sys,json; print(json.load(sys.stdin)["session"]["sid"])')

# DELETE uses the URL-encoded list ADDRESS as path parameter (NOT numeric ID)
# The ?type=block parameter is REQUIRED
# -w prints the HTTP status so a 204 can be told apart from a silent 200
curl -s -w '\nHTTP %{http_code}\n' -X DELETE \
  "http://PIHOLE_IP:PORT/api/lists/URL_ENCODED_LIST_ADDRESS?type=block" \
  -H "X-FTL-SID: $SID"
# Success returns HTTP 204 No Content

# Update gravity after removal
ssh pihole "docker exec pihole pihole -g"

# Verify domain is no longer blocked
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
```

**Important Pi-hole v6 API Notes**:
- List endpoints use the URL-encoded blocklist address as the path parameter, not numeric IDs
- The `?type=block` query parameter is mandatory for DELETE operations
- A DELETE by numeric ID returns 200 with `{"took": ...}` but DOES NOT actually delete the list (silent failure)
- A successful address-based DELETE returns HTTP 204 (no body)
- Run `pihole -g` (gravity update) after deletion for the change to take effect
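
Producing the URL-encoded address by hand is error-prone, so it can help to generate it. This is a minimal sketch, assuming `python3` is available (the authentication step already uses it); `urlencode` is a hypothetical helper name, and `PIHOLE_IP:PORT` remains a placeholder.

```bash
# Hypothetical helper: percent-encode a string, including ':' and '/'
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

LIST_URL='https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt'
ENCODED=$(urlencode "$LIST_URL")
echo "$ENCODED"
# → https%3A%2F%2Fraw.githubusercontent.com%2FanudeepND%2Fblacklist%2Fmaster%2Ffacebook.txt

# The encoded value then takes the place of URL_ENCODED_LIST_ADDRESS:
# curl -s -X DELETE "http://PIHOLE_IP:PORT/api/lists/${ENCODED}?type=block" -H "X-FTL-SID: $SID"
```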

**Future Improvement (TODO)**: Implement Pi-hole v6 group/client-based approach:
- Create a "Kids Devices" group that does not include the Facebook blocklist
- Re-add the Facebook blocklist, assigned to the default group only
- Register the iPad's IP as a client and assign it to "Kids Devices" instead of the default group
- This would maintain Facebook blocking for other devices while allowing Messenger Kids
- See: Pi-hole v6 Admin -> Groups/Clients for per-device blocklist management

## Service Discovery and DNS Issues

### Local DNS Problems
**Symptoms**: Services unreachable by hostname, DNS timeouts
**Diagnosis**:
```bash
# Test local DNS resolution
nslookup service.homelab.local
dig @10.10.0.16 service.homelab.local

# Check DNS server status
systemctl status bind9 # or named
```

**Solutions**:
```bash
# Add to /etc/hosts as temporary fix
echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts

# Restart DNS services
sudo systemctl restart bind9
sudo systemctl restart systemd-resolved
```
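Re-running the `tee -a` line above appends a duplicate entry each time. This is a minimal sketch of an idempotent variant, assuming a hypothetical helper name `add_hosts_entry` and an optional third argument for the hosts file, so it can be tried against a scratch copy first.

```bash
# Hypothetical helper: append "IP hostname" only if the hostname is not
# already listed on a non-comment line. $3 defaults to /etc/hosts.
add_hosts_entry() {
  hosts_file="${3:-/etc/hosts}"
  if awk -v h="$2" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } END { exit !found }' "$hosts_file"; then
    echo "$2 already present in $hosts_file"
  else
    printf '%s %s\n' "$1" "$2" >> "$hosts_file"   # /etc/hosts itself needs root
    echo "added $2 to $hosts_file"
  fi
}

# Usage against the real file (run as root):
# add_hosts_entry 10.10.0.100 service.homelab.local
```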

### Container Networking Issues
**Symptoms**: Containers cannot communicate, service discovery fails
**Diagnosis**:
```bash
# Check Docker networks
docker network ls
docker network inspect bridge

# Test container connectivity (-c 3 so ping exits instead of running forever)
docker exec container1 ping -c 3 container2
docker exec container1 nslookup container2
```

**Solutions**:
```bash
# Create custom network
docker network create --driver bridge app-network
docker run --network app-network container

# Fix DNS in containers
docker run --dns 8.8.8.8 container
```

## Performance Issues

### Network Latency Problems
**Symptoms**: Slow response times, timeouts, poor performance
**Diagnosis**:
```bash
# Measure network latency
ping -c 100 host
mtr --report host

# Check network interface stats
ip -s link show
cat /proc/net/dev
```
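When comparing several hosts, the summary line of `ping` is usually all that matters. This is a minimal sketch that extracts the packet-loss percentage, assuming the iputils-style summary format (`..., 2% packet loss, ...`); `ping_loss_percent` is a hypothetical helper name.

```bash
# Hypothetical helper: reads ping output on stdin, prints the loss percentage
ping_loss_percent() {
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Usage:
# ping -c 100 host | ping_loss_percent
```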

**Solutions**:
```bash
# Optimize network settings
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Check for network congestion
iftop
nethogs
```

### Bandwidth Issues
**Symptoms**: Slow transfers, network congestion, dropped packets
**Diagnosis**:
```bash
# Test bandwidth
iperf3 -s # Server
iperf3 -c server-ip # Client

# Check interface utilization
vnstat -i eth0
```

**Solutions**:
```bash
# Implement QoS if needed
sudo tc qdisc add dev eth0 root fq_codel

# Optimize buffer sizes
sudo ethtool -G eth0 rx 4096 tx 4096
```

## Emergency Recovery Procedures

### Network Emergency Recovery
**Complete network failure recovery**:
```bash
# Reset all network configuration
sudo systemctl stop networking
sudo ip addr flush eth0
sudo ip route flush table main
sudo systemctl start networking

# Manual network configuration
sudo ip addr add 10.10.0.100/24 dev eth0
sudo ip route add default via 10.10.0.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

### SSH Emergency Access
**When locked out of systems**:
```bash
# Use emergency SSH key
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Via console access (if available)
# Use hypervisor console or physical access

# Reset SSH to allow password auth temporarily
sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```
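Password authentication should be switched back off once key access is restored. This is a minimal sketch, assuming GNU `sed` and a hypothetical helper name `set_password_auth`; it also handles a commented-out `#PasswordAuthentication` line, and the config path is passed as an argument so the edit can be rehearsed on a copy.

```bash
# Hypothetical helper: $1 = yes|no, $2 = path to an sshd_config file
set_password_auth() {
  sed -i -E "s/^#?[[:space:]]*PasswordAuthentication[[:space:]]+(yes|no).*/PasswordAuthentication $1/" "$2"
}

# Usage on the real config (as root), validating the file before the restart:
# set_password_auth no /etc/ssh/sshd_config && sshd -t && systemctl restart sshd
```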

### Service Recovery
**Critical service restoration**:
```bash
# Restart all network services
sudo systemctl restart networking
sudo systemctl restart nginx
sudo systemctl restart sshd

# Emergency firewall disable
sudo ufw disable # CAUTION: Only for troubleshooting

# Service-specific recovery
sudo systemctl restart docker
sudo systemctl restart systemd-resolved
```

## Monitoring and Prevention

### Network Health Monitoring
```bash
#!/bin/bash
# network-monitor.sh
CRITICAL_HOSTS="10.10.0.1 10.10.0.16 nas.homelab.local"
CRITICAL_SERVICES="https://homelab.local http://proxmox.homelab.local:8006"

# Log an alert to syslog for every host that does not answer a single ping
for host in $CRITICAL_HOSTS; do
    if ! ping -c1 -W5 "$host" >/dev/null 2>&1; then
        echo "ALERT: $host unreachable" | logger -t network-monitor
    fi
done

# Log an alert for every service URL that fails or times out
for service in $CRITICAL_SERVICES; do
    if ! curl -sSf --max-time 10 "$service" >/dev/null 2>&1; then
        echo "ALERT: $service unavailable" | logger -t network-monitor
    fi
done
```

### Automated Recovery Scripts
```bash
#!/bin/bash
# network-recovery.sh
if ! ping -c1 8.8.8.8 >/dev/null 2>&1; then
    echo "Network down, attempting recovery..."
    sudo systemctl restart networking
    sleep 10
    if ping -c1 8.8.8.8 >/dev/null 2>&1; then
        echo "Network recovered"
    else
        echo "Manual intervention required"
    fi
fi
```

## Quick Reference Commands

### Network Diagnostics
```bash
# Connectivity tests
ping host
traceroute host
mtr host
nc -zv host port

# Service checks
systemctl status networking
systemctl status nginx
systemctl status sshd

# Network configuration
ip addr show
ip route show
ss -tuln
```

### Emergency Commands
```bash
# Network restart
sudo systemctl restart networking

# SSH emergency access
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Firewall quick disable (emergency only)
sudo ufw disable

# DNS quick fix
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

This troubleshooting guide provides comprehensive solutions for common networking issues in home lab environments.