---
title: "Networking Troubleshooting Guide"
description: "Comprehensive troubleshooting for SSH, DNS, reverse proxy, SSL, CIFS/NFS mounts, Pi-hole HA, iOS DNS bypass, UniFi firewall rules, and emergency recovery procedures."
type: troubleshooting
domain: networking
tags: [ssh, dns, pihole, ssl, cifs, nfs, firewall, unifi, ios, nginx, troubleshooting]
---
# Networking Infrastructure Troubleshooting Guide

## SSH Connection Issues

### SSH Authentication Failures

**Symptoms**: Permission denied, connection refused, timeout

**Diagnosis**:

```bash
# Verbose SSH debugging
ssh -vvv user@host

# Test each authentication method on its own
ssh -o PasswordAuthentication=no user@host   # key only
ssh -o PubkeyAuthentication=no user@host     # password only

# Check local key files
ls -la ~/.ssh/
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
```
**Solutions**:

```bash
# Re-deploy SSH keys
ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host
ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host

# Fix local key permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub

# Fix remote ~/.ssh and authorized_keys permissions (with StrictModes, the
# default, sshd refuses authorized_keys files that others can write to)
ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
```
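Before changing anything, it can help to list which files actually have the wrong mode. The sketch below is a hypothetical helper (`audit_ssh_perms` is not an existing tool) that checks a directory against the modes used above. It assumes GNU `stat` (Linux). It treats every file that is not a `.pub`, `known_hosts` or `config` file as a private key or `authorized_keys`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report SSH files whose mode differs from the values
# used in this guide (700 for the directory, 644 for *.pub, 600 otherwise).
# Requires GNU stat (Linux); prints nothing when everything matches.
audit_ssh_perms() {
  local dir="${1:-$HOME/.ssh}" f want have
  have=$(stat -c '%a' "$dir") || return 1
  [ "$have" = "700" ] || echo "$dir: $have (expected 700)"
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    case "$f" in
      */known_hosts*|*/config) continue ;;   # not covered by the fixes above
      *.pub) want=644 ;;                      # public keys
      *) want=600 ;;                          # private keys, authorized_keys
    esac
    have=$(stat -c '%a' "$f")
    [ "$have" = "$want" ] || echo "$f: $have (expected $want)"
  done
}

# Usage:
#   audit_ssh_perms                      # checks ~/.ssh
#   ssh user@host "$(declare -f audit_ssh_perms); audit_ssh_perms"   # remote, assumes bash there
```

Any line it prints corresponds to one of the `chmod` commands above.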
### SSH Service Issues

**Symptoms**: Connection refused, service not running

**Diagnosis**:

```bash
# Check SSH service status (the unit is "ssh" on Debian/Ubuntu, "sshd" on RHEL/Arch)
systemctl status sshd
ss -tlnp | grep :22

# Test port connectivity
nc -zv host 22
nmap -p 22 host
```
**Solutions**:

```bash
# Restart and enable the SSH service (use "ssh" instead of "sshd" on Debian/Ubuntu,
# where enabling the "sshd" alias fails)
sudo systemctl restart sshd
sudo systemctl enable sshd

# Check firewall
sudo ufw status
sudo ufw allow ssh

# Verify SSH configuration (-t checks the syntax, -T prints the effective settings)
sudo sshd -t
sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)"
```
## Network Connectivity Problems

### Basic Network Troubleshooting

**Symptoms**: Cannot reach hosts, timeouts, routing issues

**Diagnosis**:

```bash
# Basic connectivity tests
ping host
traceroute host
mtr host

# Check local network configuration
ip addr show
ip route show
cat /etc/resolv.conf
```
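Reading the diagnosis output layer by layer (route, gateway, internet, DNS) shows which fix applies. Below is a minimal sketch of that order. It assumes a Linux host with `ip`, iputils `ping` (where `-W` is in seconds) and `getent`. The function names are illustrative. 8.8.8.8 and google.com are simply the test targets used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Extract the gateway address from `ip route show default` output on stdin.
parse_default_gw() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Walk the layers in order and stop at the first one that fails:
# default route -> gateway -> internet by IP -> name resolution.
# Some gateways ignore ICMP, so a gateway "failure" is a hint, not proof.
check_layers() {
  local gw
  gw=$(ip route show default | parse_default_gw)
  [ -n "$gw" ] || { echo "FAIL: no default route"; return 1; }
  ping -c 2 -W 2 "$gw" >/dev/null 2>&1 \
    || { echo "FAIL: gateway $gw not answering"; return 1; }
  ping -c 2 -W 2 8.8.8.8 >/dev/null 2>&1 \
    || { echo "FAIL: no internet by IP (routing or firewall)"; return 1; }
  getent hosts google.com >/dev/null \
    || { echo "FAIL: name resolution (DNS)"; return 1; }
  echo "OK: route, gateway, internet and DNS all work"
}
```

A failure at the DNS step points to the DNS sections below rather than to the network fixes that follow.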
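**Solutions**:

```bash
# Restart networking (use whichever applies to the host)
sudo systemctl restart networking   # Debian with ifupdown (/etc/network/interfaces)
sudo netplan apply                  # Ubuntu with netplan

# Reset network interface (replace eth0 with the name shown by `ip link`;
# run this from the console, not over SSH on the same interface)
sudo ip link set eth0 down
sudo ip link set eth0 up

# Check the default gateway, and add it if it is missing
ip route show default
sudo ip route add default via 10.10.0.1
```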
### DNS Resolution Issues

**Symptoms**: Cannot resolve hostnames, slow resolution

**Diagnosis**:

```bash
# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Check DNS servers (resolvectl replaces the older systemd-resolve command)
resolvectl status
cat /etc/resolv.conf
```
**Solutions**:

```bash
# Temporary DNS fix (on systemd-resolved or DHCP-managed hosts this file is
# regenerated, so the change does not survive a restart or lease renewal)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

# Restart DNS services
sudo systemctl restart systemd-resolved

# Flush DNS cache (older releases: sudo systemd-resolve --flush-caches)
sudo resolvectl flush-caches
```
### UniFi Firewall Blocking DNS to New Networks

**Symptoms**: A new network/VLAN has "no internet access". Devices connect to WiFi but cannot browse or resolve domain names. Pinging IP addresses (8.8.8.8) works, but DNS resolution fails.

**Root Cause**: Firewall rules drop traffic from the DNS servers (the Pi-holes in the "Servers" network group) to the new network. Rules like "Servers to WiFi" or "Servers to Home" with a DROP action block ALL traffic. That includes the DNS replies the Pi-holes send back from port 53.

**Diagnosis**:

```bash
# From affected device on new network:

# Test if routing works (should succeed)
ping 8.8.8.8
traceroute 8.8.8.8

# Test if DNS resolution works (will fail)
nslookup google.com

# Test DNS servers directly (will time out or fail)
nslookup google.com 10.10.0.16
nslookup google.com 10.10.0.226

# Test public DNS (should work)
nslookup google.com 8.8.8.8

# Check DHCP-assigned DNS servers
# Windows:
ipconfig /all | findstr DNS

# Linux/macOS:
cat /etc/resolv.conf
```
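The probes above boil down to a small decision table. Below is a minimal sketch of it. It assumes `dig` is installed on the client; `dig` exits with a non-zero status when no reply arrives. The function names are hypothetical. The IPs are the Pi-hole and public resolvers used in this section.

```bash
#!/usr/bin/env bash
# Run a command silently and print its exit status.
probe() { "$@" >/dev/null 2>&1; echo $?; }

# Turn three probe results (0 = success) into a verdict:
#   $1 ping 8.8.8.8   $2 lookup via a Pi-hole   $3 lookup via 8.8.8.8
classify_dns_failure() {
  local route=$1 pihole=$2 public=$3
  if [ "$route" -ne 0 ]; then
    echo "routing"         # nothing gets out: check VLAN, gateway, WAN
  elif [ "$pihole" -eq 0 ]; then
    echo "ok"              # Pi-hole answers: check the client's DHCP DNS
  elif [ "$public" -eq 0 ]; then
    echo "firewall-dns"    # internet works but Pi-hole replies are dropped
  else
    echo "dns-blocked"     # no resolver answers: is all port 53 traffic blocked?
  fi
}

# Usage (from a client on the new network):
#   classify_dns_failure \
#     "$(probe ping -c 2 8.8.8.8)" \
#     "$(probe dig +time=2 +tries=1 @10.10.0.16 google.com)" \
#     "$(probe dig +time=2 +tries=1 @8.8.8.8 google.com)"
```

A `firewall-dns` verdict matches the situation the steps below address.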
**If routing works but DNS fails**, the firewall is blocking DNS traffic; the network configuration is not the problem.

**Solutions**:

**Step 1: Identify Blocking Rules**

- In UniFi: Settings → Firewall & Security → Traffic Rules → LAN In
- Look for DROP rules with:
  - Source: Servers (or the network group containing the Pi-holes)
  - Destination: Your new network (e.g., "Home WiFi", "Home Network")
  - Examples: "Servers to WiFi", "Servers to Home"

**Step 2: Create DNS Allow Rules (BEFORE Drop Rules)**

Create new rules positioned ABOVE the drop rules. The rule has to match the Pi-hole's replies. Those replies leave the Pi-hole from port 53 and go to a random high port on the client, so port 53 belongs on the source side:

```
Name: Allow DNS - Servers to [Network Name]
Action: Accept
Rule Applied: Before Predefined Rules
Type: LAN In
Protocol: TCP and UDP
Source:
  - Network/Group: Servers (or specific Pi-hole IPs: 10.10.0.16, 10.10.0.226)
  - Port: 53 (DNS)
Destination:
  - Network: [Your new network - e.g., Home WiFi]
  - Port: Any
```
Repeat for each network that needs DNS access from the servers.

**Step 3: Verify Rule Order**

**CRITICAL**: Firewall rules are processed top to bottom, and the first match wins!

Correct order:

```
✅ Allow DNS - Servers to Home Network (Accept, source port 53)
✅ Allow DNS - Servers to Home WiFi (Accept, source port 53)
❌ Servers to Home (Drop, All ports)
❌ Servers to WiFi (Drop, All ports)
```
**Step 4: Re-enable Drop Rules**

Once the DNS allow rules are in place and positioned correctly, re-enable the drop rules (if you disabled them while troubleshooting).

**Verification**:

```bash
# From device on new network:

# DNS should work
nslookup google.com

# Browsing should work (name lookup plus an HTTPS connection)
curl -I https://google.com

# Other server traffic should still be blocked (expected)
ping 10.10.0.16   # Should fail or time out
ssh 10.10.0.16    # Should be blocked
```
**Real-World Example**: New "Home WiFi" network (10.1.0.0/24, VLAN 2)

- **Problem**: Devices connected but couldn't browse the web
- **Diagnosis**: `traceroute 8.8.8.8` worked (16ms), but `nslookup google.com` failed
- **Cause**: Firewall rule "Servers to WiFi" (rule 20004) blocked Pi-hole DNS responses
- **Solution**: Added "Allow DNS - Servers to Home WiFi" rule (Accept, port 53) above the drop rule
- **Result**: DNS resolution works, and other server traffic remains blocked

## Reverse Proxy and Load Balancer Issues

### Nginx Configuration Problems

**Symptoms**: 502 Bad Gateway, 503 Service Unavailable, SSL errors

**Diagnosis**:

```bash
# Check Nginx status and logs
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log

# Test Nginx configuration
sudo nginx -t
sudo nginx -T   # Show full configuration
```
**Solutions**:

```bash
# Test, then reload Nginx configuration (the reload only runs if the test passes)
sudo nginx -t && sudo nginx -s reload

# Check upstream servers
curl -I http://backend-server:port
nc -zv backend-server port   # or: telnet backend-server port

# Fix common configuration issues
sudo nano /etc/nginx/sites-available/default
# Check proxy_pass URLs, upstream definitions
```
### SSL/TLS Certificate Issues

**Symptoms**: Certificate warnings, expired certificates, connection errors

**Diagnosis**:

```bash
# Check certificate validity (</dev/null ends the session after the handshake)
openssl s_client -connect host:443 -servername host </dev/null
openssl x509 -in /etc/ssl/certs/cert.pem -text -noout

# Check certificate expiry
openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates
```
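The expiry dates above can also be checked as a pass/fail test, which is handy in a cron job. The sketch below uses `openssl x509 -checkend`, a standard OpenSSL option. The function names are illustrative. The certificate path and hostname in the usage lines are the placeholders and examples from this guide.

```bash
#!/usr/bin/env bash
# Exit 0 if the certificate file is still valid DAYS from now, 1 if it expires sooner.
cert_valid_for_days() {
  local cert=$1 days=$2
  openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null
}

# Same check against the certificate a server actually presents
# (-servername sends SNI, which matters behind a reverse proxy).
remote_cert_valid_for_days() {
  local host=$1 days=$2
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null
}

# Usage:
#   cert_valid_for_days /etc/ssl/certs/cert.pem 14 || echo "renew soon"
#   remote_cert_valid_for_days git.manticorum.com 14 || echo "renew soon"
```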
**Solutions**:

```bash
# Renew Let's Encrypt certificates (test with --dry-run first)
sudo certbot renew --dry-run
# --force-renewal ignores the expiry date and counts against Let's Encrypt rate limits
sudo certbot renew --force-renewal

# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/private/selfsigned.key \
  -out /etc/ssl/certs/selfsigned.crt
```
### Intermittent SSL Errors (ERR_SSL_UNRECOGNIZED_NAME_ALERT)

**Symptoms**: SSL errors that appear on some attempts but not others, `ERR_SSL_UNRECOGNIZED_NAME_ALERT` in the browser, and connections from the internal network that only work intermittently

**Root Cause**: An IPv4/IPv6 DNS mismatch. The Pi-hole local DNS record overrides only the IPv4 (A) answer. IPv6 (AAAA) lookups are still forwarded upstream and return Cloudflare's public IPv6 addresses. Modern systems prefer IPv6, so a connection fails whenever the client tries the IPv6 address first. It succeeds only when the client falls back to IPv4.

**Diagnosis**:

```bash
# Check for both IPv4 (A) and IPv6 (AAAA) answers from Pi-hole
nslookup domain.example.com 10.10.0.16
dig A domain.example.com @10.10.0.16
dig AAAA domain.example.com @10.10.0.16

# Compare with public DNS
host domain.example.com 8.8.8.8

# Test IPv6 vs IPv4 connectivity
curl -6 -I https://domain.example.com   # IPv6 (may fail)
curl -4 -I https://domain.example.com   # IPv4 (should work)

# Check if system has IPv6 connectivity
ip -6 addr show | grep global
```
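To tell quickly whether the AAAA answer is the problem, filter the addresses down to the globally routable ones. Below is a minimal sketch. It assumes bash 4+ and the `dig +short AAAA` output format, which prints one address or CNAME target per line. `public_aaaa` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Read AAAA answers on stdin and print only globally routable addresses,
# i.e. skip link-local (fe80::/10), unique-local (fc00::/7), loopback and
# non-address lines such as CNAME targets.
public_aaaa() {
  local addr
  while read -r addr; do
    case "${addr,,}" in
      ""|::1) ;;              # empty line or loopback
      fe[89ab]?:*) ;;         # fe80::/10 link-local
      f[cd]??:*) ;;           # fc00::/7 unique local
      *:*) echo "$addr" ;;    # any other IPv6 address is treated as public
    esac
  done
}

# Usage:
#   dig +short AAAA domain.example.com @10.10.0.16 | public_aaaa
```

Any output means clients can still receive a public IPv6 address for the name. After the `fe80::1` override in Option 1 below, the same command should print nothing.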
**Example Problem**:

```
# Local Pi-hole returns:
domain.example.com → 10.10.0.16 (IPv4 internal NPM)

# Public DNS also returns:
domain.example.com → 2606:4700:... (Cloudflare IPv6)

# System tries IPv6 first → fails
# Sometimes falls back to IPv4 → works
# Result: Intermittent SSL errors
```
**Solutions**:

**Option 1: Add IPv6 Local DNS Override** (Recommended)

```bash
# Add non-routable IPv6 address to Pi-hole custom.list
ssh pihole "docker exec pihole bash -c 'echo \"fe80::1 domain.example.com\" >> /etc/pihole/custom.list'"

# Restart Pi-hole DNS
ssh pihole "docker exec pihole pihole restartdns"

# Verify fix
nslookup domain.example.com 10.10.0.16
# Should show: 10.10.0.16 (IPv4) and fe80::1 (IPv6 link-local)
```
**Option 2: Remove Cloudflare DNS Records** (If public access not needed)

```bash
# In Cloudflare dashboard:
# - Turn off orange cloud (proxy) for the domain
# - Or delete A/AAAA records entirely

# This removes Cloudflare IPs from public DNS
```

**Option 3: Disable IPv6 on Client** (Temporary testing)

```bash
# Disable IPv6 temporarily to confirm diagnosis
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1

# Test domain - should work consistently now

# Re-enable when done testing
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
```
**Verification**:

```bash
# After applying fix, verify consistent resolution
for i in {1..10}; do
  echo "Test $i:"
  curl -I https://domain.example.com 2>&1 | grep -E "(HTTP|SSL|certificate)"
  sleep 1
done

# All attempts should succeed consistently
```
**Real-World Example**: git.manticorum.com

- **Problem**: Intermittent SSL errors from internal network (10.0.0.0/24)
- **Diagnosis**: Pi-hole had IPv4 override (10.10.0.16) but public DNS returned Cloudflare IPv6
- **Solution**: Added `fe80::1 git.manticorum.com` to Pi-hole custom.list
- **Result**: Consistent successful connections, always routes to internal NPM

### iOS DNS Bypass Issues (Encrypted DNS)

**Symptoms**: An iOS device gets 403 errors when accessing internal services. NPM logs show an external public IP as the source instead of a local 10.x.x.x IP, even with correct Pi-hole DNS configuration.

**Root Cause**: iOS devices can use encrypted DNS (DNS-over-HTTPS or DNS-over-TLS) that bypasses the configured DNS servers, even when those are set correctly. The device then resolves to public/Cloudflare IPs instead of the local overrides. Its traffic goes out over the public internet and is denied by the NPM access list.

**Diagnosis**:

```bash
# Check NPM access logs for the service (sh -c makes the * glob expand inside the container)
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'tail -n 50 /data/logs/proxy-host-*_access.log'" | grep 403

# Look for external IPs in logs instead of local 10.x.x.x:
# BAD: [Client 73.36.102.55] - - 403 (external IP, blocked by ACL)
# GOOD: [Client 10.0.0.207] - 200 200 (local IP, allowed)

# Verify iOS device is on local network
# On iOS: Settings → Wi-Fi → (i) → IP Address
# Should show 10.0.0.x or 10.10.0.x

# Verify Pi-hole DNS is configured
# On iOS: Settings → Wi-Fi → (i) → DNS
# Should show 10.10.0.16

# Test if DNS is actually being used
nslookup domain.example.com 10.10.0.16   # Shows what Pi-hole returns
# Then check what iOS actually resolves (if possible via network sniffer)
```
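With more than a handful of requests, a per-IP tally shows whether the device arrives as a local or an external client. Below is a minimal sketch. It relies only on the `[Client <ip>]` field shown in the log examples above and assumes nothing else about the NPM log format. `summarise_npm_clients` is hypothetical, and it treats any address outside 10.0.0.0/8 as external.

```bash
#!/usr/bin/env bash
# Count requests per client IP in NPM access-log lines on stdin and label each
# IP local (10.x) or external. Output: "<local|external> <ip> <count>".
summarise_npm_clients() {
  grep -o '\[Client [0-9.]*]' \
    | awk '{ sub(/]$/, "", $2); print $2 }' \
    | sort | uniq -c \
    | awk '{ kind = ($2 ~ /^10\./) ? "local" : "external"; print kind, $2, $1 }'
}

# Usage (sh -c so the glob expands inside the container):
#   ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'tail -n 200 /data/logs/proxy-host-*_access.log'" \
#     | summarise_npm_clients
```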
**Example Problem**:

```
# iOS device configuration:
IP Address: 10.0.0.207 (correct, on local network)
DNS: 10.10.0.16 (correct, Pi-hole configured)
Cellular Data: OFF

# But NPM logs show:
[Client 73.36.102.55] - - 403   # Coming from ISP public IP!

# Why: iOS is using encrypted DNS, bypassing Pi-hole
# Result: Resolves to Cloudflare IP, routes through public internet,
#         NPM sees external IP, ACL blocks with 403
```
**Solutions**:

**Option 1: Add Public IP to NPM Access Rules** (Quickest, recommended for mobile devices)

```bash
# Find which config file contains your domain
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'grep -l domain.example.com /data/nginx/proxy_host/*.conf'"
# Example output: /data/nginx/proxy_host/19.conf

# Add public IP to access rules (replace YOUR_PUBLIC_IP and config number).
# GNU sed: appends "allow YOUR_PUBLIC_IP;" on its own line after the LAN allow rule
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sed -i 's|allow 10.10.0.0/24;|&\n    allow YOUR_PUBLIC_IP;|' /data/nginx/proxy_host/19.conf"

# Verify the change
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 cat /data/nginx/proxy_host/19.conf" | grep -A 8 "Access Rules"

# Test and reload nginx
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -t"
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 nginx -s reload"
```
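NPM rewrites this file when the proxy host is saved in the UI, so the manual edit may need re-applying. Below is a hypothetical idempotent version of the `sed` step. It assumes GNU sed, and it assumes the LAN rule line reads `allow 10.10.0.0/24;` as in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add "allow IP;" after the LAN allow line of an NPM
# proxy_host config, but only if that IP is not already allowed.
# Needs GNU sed (for \n in the replacement); the anchor text is an assumption.
add_npm_allow() {
  local conf=$1 ip=$2 anchor=${3:-'allow 10.10.0.0/24;'}
  if grep -qF "allow $ip;" "$conf"; then
    echo "already present"
    return 0
  fi
  grep -qF "$anchor" "$conf" || { echo "anchor '$anchor' not found" >&2; return 1; }
  sed -i "s|$anchor|&\n    allow $ip;|" "$conf"
}

# Usage (runs the function inside the container; assumes bash exists in the image):
#   ssh 10.10.0.16 "docker exec -i nginx-proxy-manager_app_1 bash -s" \
#     <<< "$(declare -f add_npm_allow); add_npm_allow /data/nginx/proxy_host/19.conf YOUR_PUBLIC_IP"
```

Running it twice leaves a single `allow` line. Follow it with the same `nginx -t` and `nginx -s reload` as above.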
**Option 2: Reset iOS Network Settings** (Nuclear option, clears cached DNS and network configuration)

```
iOS: Settings → General → Transfer or Reset iPhone → Reset → Reset Network Settings
WARNING: This removes all saved WiFi passwords and network configurations
```

**Option 3: Check for DNS Configuration Profiles**

```
iOS: Settings → General → VPN & Device Management
- Look for any DNS or Configuration Profiles
- Remove any third-party DNS profiles (AdGuard, NextDNS, etc.)
```

**Option 4: Disable Private Relay and IP Tracking** (Usually already tried)

```
iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
iOS: Settings → Wi-Fi → (i) → Limit IP Address Tracking → OFF
```

**Option 5: Check Browser DNS Settings** (If using Brave or Firefox)

```
Brave: Settings → Brave Shields & Privacy → Use secure DNS → OFF
Firefox (desktop): Settings → Privacy & Security → DNS over HTTPS → Off
```
**Verification**:

```bash
# After applying fix, check NPM logs while accessing from iOS
ssh 10.10.0.16 "docker exec nginx-proxy-manager_app_1 sh -c 'tail -f /data/logs/proxy-host-*_access.log'"

# With Option 1 (added public IP): Should see 200 status with external IP
# With Options 2-5 (fixed DNS): Should see 200 status with local 10.x.x.x IP
```
**Important Notes**:

- **Option 1 is recommended for mobile devices** because iOS encrypted DNS behavior is inconsistent
- The public IP workaround must be updated if your ISP changes your IP (residential IPs are typically dynamic, even if they change rarely)
- Manual nginx config changes (Option 1) will be **overwritten if you edit the proxy host in the NPM UI**
- To make the change permanent, either add the IP through the NPM UI (Access Lists) or re-apply it after UI changes
- This issue can affect any iOS device (iPhone, iPad) and some Android devices with encrypted DNS

**Real-World Example**: git.manticorum.com iOS Access

- **Problem**: iPhone showing 403 errors, desktop working fine on same network
- **iOS Config**: IP 10.0.0.207, DNS 10.10.0.16, Cellular OFF (all correct)
- **NPM Logs**: iPhone requests showing as [Client 73.36.102.55] (ISP public IP)
- **Diagnosis**: iOS using encrypted DNS, bypassing Pi-hole, routing through Cloudflare
- **Solution**: Added `allow 73.36.102.55;` to NPM proxy_host/19.conf ACL rules
- **Result**: Immediate access, user able to log in to Gitea successfully

## Network Storage Issues

### CIFS/SMB Mount Problems

**Symptoms**: Mount failures, connection timeouts, permission errors

**Diagnosis**:

```bash
# Test SMB connectivity
smbclient -L //nas-server -U username
testparm   # Test Samba configuration (run on the Samba server, not the client)

# Check mount status
mount | grep cifs
df -h -t cifs
```
**Solutions**:

```bash
# Remount with verbose output; without password=, mount.cifs prompts for it
# instead of leaving it in shell history
sudo mount -v -t cifs //server/share /mnt/point -o username=user,vers=3.0
# CIFS mount errors are logged by the kernel
sudo dmesg | tail -20

# Fix mount options in /etc/fstab (example entry)
//server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0
# Then test the fstab entries without rebooting
sudo mount -a

# Test credentials
sudo cat /etc/cifs/credentials
# Should contain: username=, password= (and domain= if required); keep it chmod 600
```
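The credentials file holds a plaintext password, so it is safer to create it root-only from the start than to fix its permissions afterwards. Below is a minimal sketch using the `username=`/`password=`/`domain=` format shown above. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a mount.cifs credentials file readable only by
# its owner. domain= is written only when a fourth argument is given.
write_cifs_credentials() {
  local path=$1 user=$2 pass=$3 domain=${4:-}
  (
    umask 077                       # new files are created as 0600
    mkdir -p "$(dirname "$path")"
    {
      printf 'username=%s\n' "$user"
      printf 'password=%s\n' "$pass"
      if [ -n "$domain" ]; then printf 'domain=%s\n' "$domain"; fi
    } > "$path"
    chmod 600 "$path"               # also tightens a file that already existed
  )
}

# Usage (as root, e.g. from `sudo -i`):
#   read -rsp 'CIFS password: ' pw; echo
#   write_cifs_credentials /etc/cifs/credentials user "$pw"
```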
### NFS Mount Issues

**Symptoms**: Stale file handles, mount hangs, permission denied

**Diagnosis**:

```bash
# Check NFS services
systemctl status nfs-client.target
showmount -e nfs-server

# Test NFS connectivity
rpcinfo -p nfs-server
```
**Solutions**:

```bash
# Restart NFS services
sudo systemctl restart nfs-client.target

# Remount NFS shares
sudo umount /mnt/nfs-share
sudo mount -t nfs server:/path /mnt/nfs-share

# Fix stale file handles (fall back to a lazy unmount if the forced one fails)
sudo umount -f /mnt/nfs-share || sudo umount -l /mnt/nfs-share
sudo mount /mnt/nfs-share   # uses the /etc/fstab entry
```
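A hung NFS mount can make even `ls` block, so probe the mount point with a time limit before deciding to force an unmount. Below is a minimal sketch assuming GNU coreutils `timeout` and `stat`. `mount_responds` is a hypothetical helper. A process stuck on a hard NFS mount may ignore SIGTERM, hence the follow-up SIGKILL via `-k`.

```bash
#!/usr/bin/env bash
# Return 0 if DIR answers stat() within SECS seconds. A stale mount fails with
# an error; a hung mount is cut off by timeout (SIGTERM, then SIGKILL 2s later).
mount_responds() {
  local dir=$1 secs=${2:-5}
  timeout -k 2 "$secs" stat -t "$dir" >/dev/null 2>&1
}

# Usage:
#   mount_responds /mnt/nfs-share || {
#     sudo umount -f /mnt/nfs-share || sudo umount -l /mnt/nfs-share
#     sudo mount /mnt/nfs-share
#   }
```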
## Firewall and Security Issues

### Port Access Problems

**Symptoms**: Connection refused, filtered ports, blocked services

**Diagnosis**:

```bash
# Check firewall status
sudo ufw status verbose
sudo iptables -L -n -v

# Test port accessibility
nc -zv host port
nmap -p port host
```
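On hosts without `nc` or `nmap`, bash itself can attempt a TCP connection through its `/dev/tcp` pseudo-path. Below is a minimal sketch assuming bash built with network redirections (the default on Debian/Ubuntu) and GNU `timeout`. A refused connection usually means the port is closed. A timeout usually means a firewall is silently dropping the packets.

```bash
#!/usr/bin/env bash
# Exit 0 if a TCP connection to HOST:PORT succeeds within SECS seconds.
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Translate the exit status: 0 = connected, 124 = timed out, other = refused/error.
port_state() {
  port_open "$@"
  case $? in
    0)   echo open ;;
    124) echo filtered ;;
    *)   echo closed ;;
  esac
}

# Usage:
#   port_state 10.10.0.16 22
#   port_state 10.10.0.16 443 5
```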
**Solutions**:

```bash
# Open required ports
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.10.0.0/24

# Reset firewall if needed. Reset deletes every rule, including "allow ssh",
# so re-add SSH before enabling or new SSH connections will be refused
sudo ufw --force reset
sudo ufw allow ssh
sudo ufw enable
```
### Network Security Issues

**Symptoms**: Unauthorized access, suspicious traffic, security alerts

**Diagnosis**:

```bash
# Check active (established) connections
ss -tun
# Check listening ports
ss -tuln
netstat -tuln   # older net-tools equivalent

# Review logs for security events
sudo tail -f /var/log/auth.log
sudo tail -f /var/log/syslog | grep -i security
```
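**Solutions**:

```bash
# Block suspicious IPs; "insert 1" puts the rule above existing allow rules,
# because ufw applies the first rule that matches
sudo ufw insert 1 deny from suspicious-ip

# Update SSH security
sudo nano /etc/ssh/sshd_config
# Set: PasswordAuthentication no, PermitRootLogin no
# Validate the config before restarting (the unit is "ssh" on Debian/Ubuntu)
sudo sshd -t && sudo systemctl restart sshd
```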
## Pi-hole High Availability Troubleshooting

### Pi-hole Not Responding to DNS Queries

**Symptoms**: DNS resolution failures, clients cannot resolve domains, Pi-hole web UI inaccessible

**Diagnosis**:

```bash
# Test DNS response from both Pi-holes
dig @10.10.0.16 google.com
dig @10.10.0.226 google.com

# Check Pi-hole container status
ssh npm-pihole "docker ps | grep pihole"
ssh ubuntu-manticore "docker ps | grep pihole"

# Check Pi-hole logs
ssh npm-pihole "docker logs pihole --tail 50"
ssh ubuntu-manticore "docker logs pihole --tail 50"

# Test port 53 is listening
ssh ubuntu-manticore "netstat -tulpn | grep :53"
ssh ubuntu-manticore "ss -tulpn | grep :53"
```
**Solutions**:

```bash
# Restart Pi-hole containers
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose restart"

# Check for port conflicts (root is needed to see other users' processes;
# -t gives sudo a terminal for its password prompt)
ssh -t ubuntu-manticore "sudo lsof -i :53"

# If systemd-resolved is conflicting, disable it. This also removes the host's
# 127.0.0.53 stub resolver, so point /etc/resolv.conf at a real server afterwards
# (alternative: keep resolved and set DNSStubListener=no in /etc/systemd/resolved.conf)
ssh -t ubuntu-manticore "sudo systemctl stop systemd-resolved"
ssh -t ubuntu-manticore "sudo systemctl disable systemd-resolved"

# Rebuild Pi-hole container
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"
```
### DNS Failover Not Working

**Symptoms**: DNS stops working when primary Pi-hole fails, clients not using secondary DNS

**Diagnosis**:

```bash
# Check UniFi DHCP DNS configuration
# Via UniFi UI: Settings → Networks → LAN → DHCP
# DNS Server 1: 10.10.0.16
# DNS Server 2: 10.10.0.226

# Check client DNS configuration
# Windows:
ipconfig /all | findstr /i "DNS"

# Linux/macOS:
cat /etc/resolv.conf

# Check if secondary Pi-hole is reachable
ping -c 4 10.10.0.226
dig @10.10.0.226 google.com

# Test failover manually
ssh npm-pihole "docker stop pihole"
dig google.com   # Should still work via secondary
ssh npm-pihole "docker start pihole"
```
**Solutions**:

```bash
# Force DHCP lease renewal to get updated DNS servers
# Windows:
ipconfig /release && ipconfig /renew

# Linux:
sudo dhclient -r && sudo dhclient

# macOS/iOS:
# Disconnect and reconnect to WiFi

# Verify UniFi DHCP settings are correct
# Both DNS servers must be configured in UniFi controller

# Check client respects both DNS servers
# Some clients may cache failed DNS responses
# Flush DNS cache:
# Windows: ipconfig /flushdns
# macOS: sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Linux: sudo resolvectl flush-caches
```
### Orbital Sync Not Syncing

**Symptoms**: Blocklists/whitelists differ between Pi-holes, custom DNS entries missing on secondary

**Diagnosis**:

```bash
# Check Orbital Sync container status
ssh ubuntu-manticore "docker ps | grep orbital-sync"

# Check Orbital Sync logs
ssh ubuntu-manticore "docker logs orbital-sync --tail 100"

# Look for sync errors in logs
ssh ubuntu-manticore "docker logs orbital-sync 2>&1 | grep -i error"

# Verify the Pi-hole passwords in the Orbital Sync config are correct
ssh ubuntu-manticore "cat ~/docker/orbital-sync/.env"

# Test API access manually (Pi-hole v6 API; a valid password returns "valid": true).
# Do NOT use `pihole -a -p` for this: it sets a new admin password, it does not print a token.
# (Pi-hole v5: curl "http://10.10.0.16/admin/api.php?status&auth=TOKEN",
#  where TOKEN is WEBPASSWORD in /etc/pihole/setupVars.conf)
curl -s -X POST http://10.10.0.16/api/auth \
  -H 'Content-Type: application/json' \
  -d '{"password": "YOUR_PASSWORD"}'

# Compare the number of subscribed blocklists between Pi-holes
ssh npm-pihole "docker exec pihole pihole-FTL sqlite3 /etc/pihole/gravity.db 'SELECT COUNT(*) FROM adlist'"
ssh ubuntu-manticore "docker exec pihole pihole-FTL sqlite3 /etc/pihole/gravity.db 'SELECT COUNT(*) FROM adlist'"
```
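When Orbital Sync appears to run but entries are still missing, comparing the two local DNS lists directly shows exactly what differs. Below is a minimal sketch. It assumes both Pi-holes still keep local records in `/etc/pihole/custom.list`, as elsewhere in this guide (Pi-hole v6 moves them into `pihole.toml`). `diff_local_dns` is hypothetical.

```bash
#!/usr/bin/env bash
# Print records present on only one side of two custom.list dumps.
# "<" = only in the first file (primary), ">" = only in the second (secondary).
diff_local_dns() {
  local a=$1 b=$2
  comm -3 <(sort -u "$a") <(sort -u "$b") \
    | sed -e 's/^\t/> /' -e '/^> /!s/^/< /'
}

# Usage:
#   diff_local_dns \
#     <(ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list") \
#     <(ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list")
```

Empty output means both Pi-holes hold the same records.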
**Solutions**:

```bash
# Regenerate API tokens
# Primary Pi-hole: http://10.10.0.16/admin → Settings → API → Generate New Token
# Secondary Pi-hole: http://10.10.0.226:8053/admin → Settings → API → Generate New Token

# Update Orbital Sync .env file (-t allocates a terminal so nano can run)
ssh -t ubuntu-manticore "nano ~/docker/orbital-sync/.env"
# Update PRIMARY_HOST_PASSWORD and SECONDARY_HOST_PASSWORD

# Restart Orbital Sync
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose restart"

# Force immediate sync by recreating the container (a restart does not
# pick up .env changes; down/up does)
ssh ubuntu-manticore "cd ~/docker/orbital-sync && docker compose down && docker compose up -d"

# Monitor sync in real-time
ssh ubuntu-manticore "docker logs orbital-sync -f"

# If all else fails, manually sync via Teleporter
# Primary: Settings → Teleporter → Backup
# Secondary: Settings → Teleporter → Restore (upload backup file)
```
### NPM DNS Sync Failing

**Symptoms**: NPM proxy hosts missing from Pi-hole custom.list, new domains not resolving

**Diagnosis**:

```bash
# Check NPM sync script status
ssh npm-pihole "cat /var/log/cron.log | grep npm-pihole-sync"

# Run sync script manually to see errors
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"

# Check script can access both Pi-holes
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh npm-pihole "ssh ubuntu-manticore 'docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com'"

# Verify SSH connectivity to ubuntu-manticore
ssh npm-pihole "ssh ubuntu-manticore 'echo SSH OK'"
```
**Solutions**:

```bash
# Fix SSH key authentication (if needed); -t lets ssh-copy-id prompt for the password
ssh -t npm-pihole "ssh-copy-id ubuntu-manticore"

# Test script with dry-run
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh --dry-run"

# Run script manually to sync immediately
ssh npm-pihole "/home/cal/scripts/npm-pihole-sync.sh"

# Verify cron job is configured
ssh npm-pihole "crontab -l | grep npm-pihole-sync"

# If cron job missing, add it (-t is needed for the interactive editor)
ssh -t npm-pihole "crontab -e"
# Add: 0 * * * * /home/cal/scripts/npm-pihole-sync.sh >> /var/log/npm-pihole-sync.log 2>&1

# Check script logs
ssh npm-pihole "tail -50 /var/log/npm-pihole-sync.log"
```
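The contents of `npm-pihole-sync.sh` are not shown here. The sketch below illustrates the general idea only and is not that script. It reads the domains NPM serves from the `server_name` lines in its generated proxy_host configs. It then turns them into custom.list entries pointing at NPM (10.10.0.16 in this setup). The result is a reference list to compare against what the Pi-holes actually hold. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not npm-pihole-sync.sh): print "IP name" lines for every
# name in server_name directives, read from the given files or from stdin.
npm_hosts_to_custom_list() {
  local npm_ip=$1; shift
  awk -v ip="$npm_ip" '
    $1 == "server_name" {
      for (i = 2; i <= NF; i++) {
        name = $i
        sub(/;$/, "", name)           # drop the trailing semicolon
        if (name != "") print ip, name
      }
    }' "$@" | sort -u
}

# Usage:
#   ssh npm-pihole "docker exec nginx-proxy-manager_app_1 sh -c 'cat /data/nginx/proxy_host/*.conf'" \
#     | npm_hosts_to_custom_list 10.10.0.16
```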
### Secondary Pi-hole Performance Issues

**Symptoms**: ubuntu-manticore slow, high CPU/RAM usage, Pi-hole affecting Jellyfin/Tdarr

**Diagnosis**:

```bash
# Check resource usage
ssh ubuntu-manticore "docker stats --no-stream"

# Pi-hole should use <1% CPU and ~150MB RAM
# If higher, investigate:
ssh ubuntu-manticore "docker logs pihole --tail 100"

# Check for excessive queries
ssh ubuntu-manticore "docker exec pihole pihole -c -e"

# Check for DNS loops or misconfiguration
ssh ubuntu-manticore "docker exec pihole pihole -t"   # Tail pihole.log
```
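A DNS loop usually shows up as one name being queried far more often than anything else. Below is a small sketch that tallies names from pihole.log lines. It assumes the `query[TYPE] name from client` format that the `grep 'query\[A\]'` example below also relies on. `top_queries` is hypothetical.

```bash
#!/usr/bin/env bash
# Count the most-queried names in pihole.log-format lines on stdin.
# Output: "<count> <name>", busiest first, limited to N lines (default 10).
top_queries() {
  local n=${1:-10}
  grep -o 'query\[[A-Z]*] [^ ]*' \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn | head -n "$n"
}

# Usage:
#   ssh ubuntu-manticore "docker exec pihole tail -n 5000 /var/log/pihole/pihole.log" | top_queries 15
```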
**Solutions**:

```bash
# Restart Pi-hole if resource usage is high
ssh ubuntu-manticore "docker restart pihole"

# Check for DNS query loops
# Look for same domain being queried repeatedly (Ctrl+C to stop)
ssh ubuntu-manticore "docker exec pihole pihole -t | grep -A 5 'query\[A\]'"

# Adjust Pi-hole cache settings if needed.
# Note: Pi-hole already defaults to a 10000-entry cache, and Pi-hole v6 ignores
# /etc/dnsmasq.d unless misc.etc_dnsmasq_d is enabled; on v6 set dns.cache.size
# instead (e.g. FTLCONF_dns_cache_size in docker-compose.yml)
ssh ubuntu-manticore "docker exec pihole bash -c 'echo \"cache-size=10000\" >> /etc/dnsmasq.d/99-custom.conf'"
ssh ubuntu-manticore "docker restart pihole"

# If Jellyfin/Tdarr are affected, verify Pi-hole is using minimal resources
# Resource limits can be added to docker-compose.yml (-t so nano gets a terminal):
ssh -t ubuntu-manticore "nano ~/docker/pihole/docker-compose.yml"
# Add under pihole service:
# deploy:
#   resources:
#     limits:
#       cpus: '0.5'
#       memory: 256M
```
### iOS Devices Still Getting 403 Errors (Post-HA Deployment)

**Symptoms**: After deploying the dual Pi-hole setup, iOS devices still bypass DNS and get 403 errors on internal services

**Diagnosis**:

```bash
# Verify UniFi DHCP has BOTH Pi-holes configured, NO public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
# Public DNS (1.1.1.1, 8.8.8.8): REMOVED

# Check iOS DNS settings
# iOS: Settings → WiFi → (i) → DNS
# Should show: 10.10.0.16 (and 10.10.0.226)

# Force iOS DHCP renewal
# iOS: Settings → WiFi → Forget Network → Reconnect

# Check NPM logs for request source (sh -c makes the * glob expand inside the container)
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 sh -c 'tail -n 50 /data/logs/proxy-host-*_access.log'" | grep 403

# Verify both Pi-holes have custom DNS entries
ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
ssh ubuntu-manticore "docker exec pihole cat /etc/pihole/custom.list | grep git.manticorum.com"
```
**Solutions**:

```bash
# Solution 1: Verify public DNS is removed from UniFi DHCP
# If public DNS (1.1.1.1) is still configured, iOS may send queries to it
# and get public answers that bypass the Pi-hole local records
# Remove ALL public DNS servers from UniFi DHCP configuration

# Solution 2: Force iOS to renew DHCP lease
# iOS: Settings → WiFi → Forget Network
# Then reconnect to WiFi
# This forces device to get new DNS servers from DHCP

# Solution 3: Disable iOS encrypted DNS if still active
# iOS: Settings → [Your Name] → iCloud → Private Relay → OFF
# iOS: Check for DNS profiles: Settings → General → VPN & Device Management

# Solution 4: If encrypted DNS persists, add public IP to NPM ACL (fallback)
# See "iOS DNS Bypass Issues" section above for detailed steps

# Solution 5: Test with different iOS device to isolate issue
# If other iOS devices work, issue is device-specific configuration

# Verification after fix
ssh npm-pihole "docker exec nginx-proxy-manager_app_1 sh -c 'tail -f /data/logs/proxy-host-*_access.log'"
# Access git.manticorum.com from iOS
# Should see: [Client 10.0.0.x] - - 200 (local IP)
```
### Both Pi-holes Failing Simultaneously

**Symptoms**: Complete DNS failure across network, all devices cannot resolve domains

**Diagnosis**:

```bash
# Check both Pi-hole containers
ssh npm-pihole "docker ps -a | grep pihole"
ssh ubuntu-manticore "docker ps -a | grep pihole"

# Check both hosts are reachable
ping -c 4 10.10.0.16
ping -c 4 10.10.0.226

# Check Docker daemon on both hosts
ssh npm-pihole "systemctl status docker"
ssh ubuntu-manticore "systemctl status docker"

# Test emergency DNS (bypassing Pi-hole)
dig @8.8.8.8 google.com
```
**Solutions**:

```bash
# Emergency: Temporarily use public DNS
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 8.8.8.8 (Google DNS - temporary)
# DNS2: 1.1.1.1 (Cloudflare - temporary)

# Restart both Pi-holes
ssh npm-pihole "docker restart pihole"
ssh ubuntu-manticore "docker restart pihole"

# If Docker daemon issues (-t gives sudo a terminal for its password prompt):
ssh -t npm-pihole "sudo systemctl restart docker"
ssh -t ubuntu-manticore "sudo systemctl restart docker"

# Rebuild both Pi-holes if corruption suspected
ssh npm-pihole "cd ~/pihole && docker compose down && docker compose up -d"
ssh ubuntu-manticore "cd ~/docker/pihole && docker compose down && docker compose up -d"

# After Pi-holes are restored, revert UniFi DHCP to Pi-holes
# UniFi UI: Settings → Networks → LAN → DHCP → Name Server
# DNS1: 10.10.0.16
# DNS2: 10.10.0.226
```
### Query Load Not Balanced Between Pi-holes

**Symptoms**: Primary Pi-hole getting most queries, secondary rarely used

**Diagnosis**:

```bash
# Check query counts on both Pi-holes
# Primary: http://10.10.0.16/admin → Dashboard → Total Queries
# Secondary: http://10.10.0.226:8053/admin → Dashboard → Total Queries

# This is NORMAL behavior - clients prefer DNS1 by default
# Secondary is for failover, not load balancing

# To verify failover works:
ssh npm-pihole "docker stop pihole"
# Wait 30 seconds
# Check secondary query count - should increase
ssh npm-pihole "docker start pihole"
```
**Solutions**:
```bash
# No action needed - this is expected behavior
# DNS failover is for redundancy, not load distribution

# If you want to spread the load anyway (advanced):
# Option 1: Configure some devices to prefer DNS2
# Manually set DNS on specific devices to 10.10.0.226, 10.10.0.16 (secondary listed first)

# Option 2: Implement DNS round-robin (requires custom DHCP)
# Not recommended for homelab - adds complexity

# Option 3: Accept default behavior (recommended)
# Primary handles most traffic, secondary provides failover
# This is the usual primary/secondary DNS redundancy pattern
```
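To see which Pi-hole a Linux client tries first (for example, after applying Option 1), check the order of the `nameserver` lines in its resolver config. Below is a small sketch, and `list_nameservers` is a hypothetical name. On hosts running systemd-resolved, `/etc/resolv.conf` usually lists only the local stub `127.0.0.53`. On those hosts, use `resolvectl status` to see the real upstream servers.

```bash
#!/usr/bin/env bash
# Print the nameservers listed in a resolv.conf-style file, in order.
# The first line printed is the server a glibc-based client tries first.
# The path argument defaults to /etc/resolv.conf.
list_nameservers() {
  local conf="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" { print $2 }' "$conf"
}

# Usage:
#   list_nameservers
#   resolvectl status    # on systemd-resolved hosts, shows the real upstream servers
```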
## Pi-hole Blocklist Blocking Legitimate Apps

### Facebook Blocklist Breaking Messenger Kids (2026-03-05)
**Symptoms**: An iPad could not connect to Facebook Messenger Kids: the app would not load or send/receive messages. Disconnecting the iPad from WiFi (using cellular) restored functionality.

**Root Cause**: Pi-hole was subscribed to the `anudeepND/blacklist/master/facebook.txt` blocklist, which blocked several core Facebook domains that Messenger Kids needs.

**Blocked Domains (from pihole.log)**:

| Domain | Purpose |
|--------|---------|
| `edge-mqtt.facebook.com` | MQTT real-time message transport |
| `graph.facebook.com` | Facebook Graph API (login, contacts, profiles) |
| `graph-fallback.facebook.com` | Graph API fallback (blocked via CNAME chain) |
| `www.facebook.com` | Core Facebook domain |

**Allowed Domains** (not on the blocklist; these resolved normally):
- `dgw.c10r.facebook.com` - Data gateway
- `mqtt.fallback.c10r.facebook.com` - MQTT fallback
- `chat-e2ee.c10r.facebook.com` - E2E encrypted chat

**Diagnosis**:
```bash
# Find blocked domains for a specific client IP (CLIENT_IP is a placeholder, e.g. the iPad's address)
ssh pihole "docker exec pihole grep 'CLIENT_IP' /var/log/pihole/pihole.log | grep 'gravity blocked'"

# Check which blocklist contains a domain
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
# Output: https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt (block)
```
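When a client hits many blocks, a per-domain count is easier to read than the raw log lines. Below is a small sketch that reads Pi-hole log lines on stdin. It assumes each blocked entry contains `gravity blocked <domain>`, which is what the `grep 'gravity blocked'` command above matches. `blocked_domains` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Count blocked domains from Pi-hole log lines read on stdin, most frequent first.
# Assumes blocked entries look like "... gravity blocked <domain> is 0.0.0.0".
blocked_domains() {
  awk '{
    for (i = 1; i <= NF - 2; i++)
      if ($i == "gravity" && $(i + 1) == "blocked") print $(i + 2)
  }' | sort | uniq -c | sort -rn
}

# Usage (CLIENT_IP is a placeholder, as above):
#   ssh pihole "docker exec pihole grep 'CLIENT_IP' /var/log/pihole/pihole.log" | blocked_domains
```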
**Resolution**: Removed the Facebook blocklist from the primary Pi-hole (the secondary did not have it). The blocklist contained ~3,997 Facebook domains.

**Pi-hole v6 API - Deleting a Blocklist**:
```bash
# Authenticate and get session ID
SID=$(curl -s -X POST 'http://PIHOLE_IP:PORT/api/auth' \
  -H 'Content-Type: application/json' \
  -d '{"password":"APP_PASSWORD"}' \
  | python3 -c 'import sys,json; print(json.load(sys.stdin)["session"]["sid"])')

# DELETE uses the URL-encoded list ADDRESS as path parameter (NOT numeric ID)
# The ?type=block parameter is REQUIRED
curl -s -X DELETE \
  "http://PIHOLE_IP:PORT/api/lists/URL_ENCODED_LIST_ADDRESS?type=block" \
  -H "X-FTL-SID: $SID"
# Success returns HTTP 204 No Content

# Update gravity after removal
ssh pihole "docker exec pihole pihole -g"

# Verify the domain is no longer blocked
ssh pihole "docker exec pihole pihole -q edge-mqtt.facebook.com"
```
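You can produce the `URL_ENCODED_LIST_ADDRESS` placeholder with Python's `urllib.parse.quote`, since Python is already used above to parse the auth response. Checking the HTTP status guards against the silent numeric-ID failure described in the API notes. Below is a minimal sketch. `urlencode` and `delete_blocklist` are hypothetical helper names, and the base URL and `$SID` are the same placeholders and session ID as in the commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Pi-hole v6 list DELETE shown above.

# Percent-encode a string as a single URL path segment (safe="" also encodes "/").
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}

# DELETE a blocklist by its address; succeed only on HTTP 204.
# base_url is e.g. http://PIHOLE_IP:PORT (placeholder); sid comes from /api/auth.
delete_blocklist() {
  local base_url="$1" sid="$2" address="$3" encoded status
  encoded=$(urlencode "$address") || return 1
  # -o /dev/null discards any body; -w prints only the HTTP status code
  status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    "${base_url}/api/lists/${encoded}?type=block" \
    -H "X-FTL-SID: ${sid}")
  if [[ "$status" == "204" ]]; then
    echo "Deleted: $address"
  else
    # Anything else (including a 200) means the deletion was not confirmed
    echo "Not deleted (HTTP $status): $address" >&2
    return 1
  fi
}

# Usage:
#   delete_blocklist "http://PIHOLE_IP:PORT" "$SID" \
#     "https://raw.githubusercontent.com/anudeepND/blacklist/master/facebook.txt"
```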
**Important Pi-hole v6 API Notes**:
- List endpoints use the URL-encoded blocklist address as the path parameter, not numeric IDs
- The `?type=block` query parameter is mandatory for DELETE operations
- A DELETE by numeric ID returns 200 with `{"took": ...}` but does NOT actually delete anything (silent failure)
- A successful address-based DELETE returns HTTP 204 (no body)
- You must run `pihole -g` (gravity update) after deletion for the change to take effect

**Future Improvement (TODO)**: Implement a Pi-hole v6 group/client-based approach:
- Create a "Kids Devices" group for the iPad
- Re-add the Facebook blocklist, assigned to the Default group only
- Add the iPad's IP as a client assigned to "Kids Devices" and removed from Default, so the Facebook list no longer applies to it
- This keeps Facebook blocked for other devices while allowing Messenger Kids
- See: Pi-hole v6 Admin -> Groups/Clients for per-device blocklist management
## Service Discovery and DNS Issues

### Local DNS Problems
**Symptoms**: Services unreachable by hostname, DNS timeouts
**Diagnosis**:
```bash
# Test local DNS resolution
nslookup service.homelab.local
dig @10.10.0.16 service.homelab.local

# Check DNS server status
ssh npm-pihole "docker exec pihole pihole status"   # Pi-hole (primary DNS in this setup)
systemctl status bind9   # or named; only if a BIND server is in use
```
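If local DNS records are not kept in sync between the two Pi-holes, clients fail intermittently depending on which server they ask. Below is a small sketch that queries each server for the same name so mismatches stand out. It assumes `dig` is installed, and `compare_dns` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Query the same name against several DNS servers and print each answer on one line.
compare_dns() {
  local name="$1" server answer
  shift
  for server in "$@"; do
    # Drop ";;" error lines and join multiple records with commas
    answer=$(dig +short +time=2 +tries=1 "@${server}" "$name" 2>/dev/null | grep -v '^;' | paste -sd, -)
    printf '%-15s %s\n' "$server" "${answer:-<no answer>}"
  done
}

# Usage:
#   compare_dns service.homelab.local 10.10.0.16 10.10.0.226
```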
**Solutions**:
```bash
# Add to /etc/hosts as a temporary fix (remove it once DNS is fixed)
echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts

# Restart DNS services
sudo systemctl restart bind9              # only if BIND is in use
sudo systemctl restart systemd-resolved   # also clears the local resolver cache
```
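Running the `tee -a` line more than once adds duplicate entries. Below is a sketch of an idempotent variant. It assumes the standard `/etc/hosts` layout (an IP followed by one or more names). `add_hosts_entry` is a hypothetical helper, and the optional file argument lets you try it on a copy first.

```bash
#!/usr/bin/env bash
# Add "IP NAME" to a hosts file only if NAME is not already listed (comments are ignored).
add_hosts_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  if awk -v n="$name" '
      /^[[:space:]]*#/ { next }                  # skip full-line comments
      { for (i = 2; i <= NF; i++) {
          if (substr($i, 1, 1) == "#") break     # stop at an inline comment
          if ($i == n) found = 1
      } }
      END { exit !found }' "$file"; then
    echo "$name already present in $file"
  elif [[ -w "$file" ]]; then
    echo "$ip $name" >> "$file"
  else
    # Fall back to sudo when the file is not writable by the current user
    echo "$ip $name" | sudo tee -a "$file" >/dev/null
  fi
}

# Usage:
#   add_hosts_entry 10.10.0.100 service.homelab.local
```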
### Container Networking Issues
**Symptoms**: Containers cannot communicate, service discovery fails
**Diagnosis**:
```bash
# Check Docker networks
docker network ls
docker network inspect bridge

# Test container connectivity (container1/container2 are placeholders;
# minimal images may not include ping or nslookup)
docker exec container1 ping -c 3 container2
docker exec container1 nslookup container2
```
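Docker's embedded DNS resolves container names only on user-defined networks, so first check whether the two containers share one. Below is a sketch that uses `docker inspect` with a Go template over `.NetworkSettings.Networks`. `container_networks` and `shared_network` are hypothetical names.

```bash
#!/usr/bin/env bash
# List the networks a container is attached to, space-separated.
container_networks() {
  docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' "$1"
}

# Print the first user-defined network shared by two containers.
# Returns 1 if they share none, 2 if either container cannot be inspected.
shared_network() {
  local a b net other
  a=$(container_networks "$1") || return 2
  b=$(container_networks "$2") || return 2
  for net in $a; do
    # Skip built-in networks: embedded DNS only serves user-defined networks
    case "$net" in bridge|host|none) continue ;; esac
    for other in $b; do
      if [[ "$net" == "$other" ]]; then
        echo "$net"
        return 0
      fi
    done
  done
  return 1
}

# Usage:
#   shared_network container1 container2 || echo "No shared user-defined network"
```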
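**Solutions**:
```bash
# Create a user-defined network: Docker's embedded DNS resolves container
# names only on user-defined networks, not on the default "bridge"
docker network create --driver bridge app-network
docker run --network app-network image-name
# Attach an already-running container to it
docker network connect app-network container1

# Fix DNS in containers by overriding the resolver
# (a public resolver such as 8.8.8.8 bypasses Pi-hole filtering)
docker run --dns 8.8.8.8 image-name
```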
## Performance Issues

### Network Latency Problems
**Symptoms**: Slow response times, timeouts, poor performance
**Diagnosis**:
```bash
# Measure network latency
ping -c 100 host
mtr --report host

# Check network interface stats (look for errors and drops)
ip -s link show
cat /proc/net/dev
```
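To compare runs (for example, before and after a change), extract just the packet loss and average RTT from ping's summary. Below is a small sketch. It assumes the iputils (Linux) or BSD/macOS summary format, where the RTT line starts with `rtt` or `round-trip`. `ping_stats` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Extract packet loss and average RTT from ping's summary, read on stdin.
ping_stats() {
  awk '
    /packet loss/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
    /^(rtt|round-trip)/ { split($4, t, "/"); avg = t[2] }   # min/avg/max/mdev = a/b/c/d ms
    END { printf "loss=%s avg_ms=%s\n", loss, avg }'
}

# Usage:
#   ping -c 100 host | ping_stats
```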
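**Solutions**:
```bash
# Raise socket buffer limits (helps throughput on high-bandwidth links; does not lower latency by itself)
# A drop-in file avoids appending duplicate lines to /etc/sysctl.conf on every run (file name is arbitrary)
printf 'net.core.rmem_max = 134217728\nnet.core.wmem_max = 134217728\n' | sudo tee /etc/sysctl.d/99-network-tuning.conf
sudo sysctl --system

# Check for network congestion (both need root; pass the interface to watch)
sudo iftop -i eth0
sudo nethogs eth0
```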
### Bandwidth Issues
**Symptoms**: Slow transfers, network congestion, dropped packets
**Diagnosis**:
```bash
# Test bandwidth
iperf3 -s                  # Server
iperf3 -c server-ip        # Client (client sends to server)
iperf3 -c server-ip -R     # Reverse (server sends to client)

# Check interface utilization
vnstat -i eth0
```
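To log results over time, reduce iperf3's JSON output (`-J`) to a single throughput figure. Below is a sketch. It assumes a TCP test, where the receiver-side total is reported under `end.sum_received`, and uses `python3` for JSON parsing as elsewhere in this guide. `iperf_mbps` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Run a TCP iperf3 test and print received throughput in Mbit/s.
# Extra arguments (e.g. -R for reverse mode) are passed through to iperf3.
iperf_mbps() {
  local server="$1"
  shift
  iperf3 -c "$server" -J "$@" | python3 -c '
import json, sys
data = json.load(sys.stdin)
bps = data["end"]["sum_received"]["bits_per_second"]
print(round(bps / 1e6, 1))'
}

# Usage:
#   iperf_mbps server-ip
#   iperf_mbps server-ip -R
```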
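**Solutions**:
```bash
# Use fq_codel queue management to reduce bufferbloat under load
# ("replace" works whether or not a root qdisc is already configured)
sudo tc qdisc replace dev eth0 root fq_codel

# Enlarge NIC ring buffers to reduce drops; check the hardware maximum first
ethtool -g eth0
sudo ethtool -G eth0 rx 4096 tx 4096
```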
## Emergency Recovery Procedures

### Network Emergency Recovery
**Complete network failure recovery**:
```bash
# Run from a local or hypervisor console: flushing addresses and routes drops SSH sessions.
# "networking" is the Debian ifupdown service; netplan/systemd-networkd hosts differ.

# Reset all network configuration
sudo systemctl stop networking
sudo ip addr flush eth0
sudo ip route flush table main
sudo systemctl start networking

# Manual network configuration (if the service does not restore connectivity)
sudo ip link set eth0 up
sudo ip addr add 10.10.0.100/24 dev eth0
sudo ip route add default via 10.10.0.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf   # temporary; systemd-resolved or DHCP may overwrite it
```
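### SSH Emergency Access
**When locked out of systems**:
```bash
# Use emergency SSH key
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Via console access (if available)
# Use hypervisor console or physical access

# Temporarily allow password auth (also matches a commented-out "no" line)
sudo sed -i -E 's/^#?[[:space:]]*PasswordAuthentication[[:space:]]+no/PasswordAuthentication yes/' /etc/ssh/sshd_config
# Drop-in files in /etc/ssh/sshd_config.d/ can override sshd_config; check them too
sudo grep -r PasswordAuthentication /etc/ssh/sshd_config.d/
sudo sshd -t && sudo systemctl restart sshd   # validate the config before restarting

# Revert to key-only auth once access is restored
sudo sed -i -E 's/^PasswordAuthentication yes/PasswordAuthentication no/' /etc/ssh/sshd_config
```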
### Service Recovery
**Critical service restoration**:
```bash
# Restart all network services (restarting networking can drop remote SSH sessions)
sudo systemctl restart networking
sudo systemctl restart nginx
sudo systemctl restart sshd

# Emergency firewall disable
sudo ufw disable   # CAUTION: Only for troubleshooting; re-enable with "sudo ufw enable"

# Service-specific recovery
sudo systemctl restart docker
sudo systemctl restart systemd-resolved
```
## Monitoring and Prevention

### Network Health Monitoring
```bash
#!/bin/bash
# network-monitor.sh - log an alert for each unreachable host or service

CRITICAL_HOSTS=(10.10.0.1 10.10.0.16 nas.homelab.local)
# Proxmox serves its web UI over HTTPS on port 8006
CRITICAL_SERVICES=(https://homelab.local https://proxmox.homelab.local:8006)

for host in "${CRITICAL_HOSTS[@]}"; do
    if ! ping -c1 -W5 "$host" >/dev/null 2>&1; then
        echo "ALERT: $host unreachable" | logger -t network-monitor
    fi
done

for service in "${CRITICAL_SERVICES[@]}"; do
    # -k: skip certificate verification (Proxmox uses a self-signed certificate by default)
    if ! curl -sSfk --max-time 10 "$service" >/dev/null 2>&1; then
        echo "ALERT: $service unavailable" | logger -t network-monitor
    fi
done
```
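When run from cron, the monitor above logs an alert on every pass while a target stays down. One way to cut that noise is to alert once, after several consecutive failures. Below is a sketch. It assumes a writable state directory; the `STATE_DIR` default and the threshold of 3 are arbitrary choices. `record_result` is a hypothetical helper that the loops above would call with `yes` or `no`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: alert once a target has failed N consecutive checks.
# STATE_DIR and the default threshold of 3 are assumptions, not part of the original script.
STATE_DIR="${STATE_DIR:-/var/tmp/network-monitor}"

record_result() {
  local target="$1" ok="$2" threshold="${3:-3}" file count
  mkdir -p "$STATE_DIR"
  # One counter file per target; characters unsafe in file names become "_"
  file="$STATE_DIR/$(printf '%s' "$target" | tr -c 'A-Za-z0-9._-' '_')"
  if [[ "$ok" == yes ]]; then
    rm -f "$file"   # success resets the counter
    return 0
  fi
  count=$(( $(cat "$file" 2>/dev/null || echo 0) + 1 ))
  echo "$count" > "$file"
  # Alert exactly once, when the threshold is reached
  if (( count == threshold )); then
    echo "ALERT: $target failed $count consecutive checks" | logger -t network-monitor
  fi
}

# Usage inside the host loop above:
#   if ping -c1 -W5 "$host" >/dev/null 2>&1; then record_result "$host" yes; else record_result "$host" no; fi
```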
### Automated Recovery Scripts
```bash
#!/bin/bash
# network-recovery.sh - restart networking if the Internet is unreachable
# Several pings per check, so one lost packet does not trigger a restart

if ! ping -c3 -W2 8.8.8.8 >/dev/null 2>&1; then
    echo "Network down, attempting recovery..."
    sudo systemctl restart networking
    sleep 10
    if ping -c3 -W2 8.8.8.8 >/dev/null 2>&1; then
        echo "Network recovered"
    else
        echo "Manual intervention required"
    fi
fi
```
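Restarting networking only helps when the fault is on the local host. Below is a sketch that first narrows down where the failure is (gateway, upstream IP, or DNS), so you can apply the right fix. The gateway `10.10.0.1` and `8.8.8.8` come from this guide. The `google.com` test name and the `diagnose_connectivity` name are assumptions.

```bash
#!/usr/bin/env bash
# Classify an outage: LAN (gateway), WAN (Internet IP), or DNS (name resolution).
# Return codes: 0 OK, 1 LAN, 2 WAN, 3 DNS.
diagnose_connectivity() {
  local gateway="${1:-10.10.0.1}" wan_ip="${2:-8.8.8.8}" name="${3:-google.com}"
  if ! ping -c2 -W2 "$gateway" >/dev/null 2>&1; then
    echo "LAN: gateway $gateway unreachable"
    return 1
  fi
  if ! ping -c2 -W2 "$wan_ip" >/dev/null 2>&1; then
    echo "WAN: $wan_ip unreachable (gateway OK)"
    return 2
  fi
  # getent uses the system resolver (nsswitch), the same path applications use
  if ! getent hosts "$name" >/dev/null 2>&1; then
    echo "DNS: cannot resolve $name (IP connectivity OK)"
    return 3
  fi
  echo "OK: gateway, Internet and DNS all working"
}

# Usage:
#   diagnose_connectivity; echo "exit code: $?"
```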
## Quick Reference Commands

### Network Diagnostics
```bash
# Connectivity tests
ping -c 4 host
traceroute host
mtr host
nc -zv host port

# Service checks
systemctl status networking
systemctl status nginx
systemctl status sshd

# Network configuration
ip addr show
ip route show
ss -tuln
```
### Emergency Commands
```bash
# Network restart
sudo systemctl restart networking

# SSH emergency access
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Firewall quick disable (emergency only)
sudo ufw disable

# DNS quick fix (temporary; systemd-resolved or DHCP may overwrite it)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

This troubleshooting guide provides solutions for common networking issues in home lab environments.