# Networking Infrastructure Troubleshooting Guide

## SSH Connection Issues

### SSH Authentication Failures

**Symptoms**: Permission denied, connection refused, timeout

**Diagnosis**:
```bash
# Verbose SSH debugging
ssh -vvv user@host

# Test different authentication methods
ssh -o PasswordAuthentication=no user@host
ssh -o PubkeyAuthentication=yes user@host

# Check local key files
ls -la ~/.ssh/
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
```

**Solutions**:
```bash
# Re-deploy SSH keys
ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host
ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host

# Fix key permissions
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub
chmod 700 ~/.ssh

# Fix permissions on the remote ~/.ssh directory and authorized_keys
ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
```

### SSH Service Issues

**Symptoms**: Connection refused, service not running

**Diagnosis**:
```bash
# Check SSH service status
systemctl status sshd
ss -tlnp | grep :22

# Test port connectivity
nc -zv host 22
nmap -p 22 host
```

**Solutions**:
```bash
# Restart SSH service
sudo systemctl restart sshd
sudo systemctl enable sshd

# Check firewall
sudo ufw status
sudo ufw allow ssh

# Verify SSH configuration
sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)"
```

## Network Connectivity Problems

### Basic Network Troubleshooting

**Symptoms**: Cannot reach hosts, timeouts, routing issues

**Diagnosis**:
```bash
# Basic connectivity tests
ping host
traceroute host
mtr host

# Check local network configuration
ip addr show
ip route show
cat /etc/resolv.conf
```

**Solutions**:
```bash
# Restart networking
sudo systemctl restart networking  # Debian (ifupdown)
sudo netplan apply  # Ubuntu

# Reset network interface
sudo ip link set eth0 down
sudo ip link set eth0 up

# Add the default gateway if `ip route show` lists none
sudo ip route add default via 10.10.0.1
```

### DNS Resolution Issues

**Symptoms**: Cannot resolve hostnames, slow resolution

**Diagnosis**:
```bash
# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Check DNS servers
systemd-resolve --status  # resolvectl status on newer systemd releases
cat /etc/resolv.conf
```

**Solutions**:
```bash
# Temporary DNS fix (overwrites the existing file)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

# Restart DNS services
sudo systemctl restart systemd-resolved

# Flush DNS cache
sudo systemd-resolve --flush-caches  # resolvectl flush-caches on newer systemd releases
```

## Reverse Proxy and Load Balancer Issues

### Nginx Configuration Problems

**Symptoms**: 502 Bad Gateway, 503 Service Unavailable, SSL errors

**Diagnosis**:
```bash
# Check Nginx status and logs
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log

# Test Nginx configuration
sudo nginx -t
sudo nginx -T  # Show full configuration
```

**Solutions**:
```bash
# Reload Nginx configuration
sudo nginx -s reload

# Check upstream servers
curl -I http://backend-server:port
telnet backend-server port

# Fix common configuration issues
sudo nano /etc/nginx/sites-available/default
# Check proxy_pass URLs, upstream definitions
```

### SSL/TLS Certificate Issues

**Symptoms**: Certificate warnings, expired certificates, connection errors

**Diagnosis**:
```bash
# Check certificate validity
openssl s_client -connect host:443 -servername host
openssl x509 -in /etc/ssl/certs/cert.pem -text -noout

# Check certificate expiry
openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates
```

**Solutions**:
```bash
# Renew Let's Encrypt certificates
sudo certbot renew --dry-run        # Test the renewal without changing certificates
sudo certbot renew --force-renewal  # Renew even if not yet due

# Generate self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/private/selfsigned.key \
  -out /etc/ssl/certs/selfsigned.crt
```

## Network Storage Issues

### CIFS/SMB Mount Problems

**Symptoms**: Mount failures, connection timeouts, permission errors

**Diagnosis**:
```bash
# Test SMB connectivity
smbclient -L //nas-server -U username
testparm  # Test Samba configuration (run on the Samba server)

# Check mount status
mount | grep cifs
df -h | grep cifs
```

**Solutions**:
```bash
# Remount manually with explicit options
sudo mount -t cifs //server/share /mnt/point -o username=user,password=pass,vers=3.0

# Fix mount options in /etc/fstab (the next line is an fstab entry, not a shell command)
//server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0

# Test credentials
sudo cat /etc/cifs/credentials
# Should contain: username=, password=, domain=
```

### NFS Mount Issues

**Symptoms**: Stale file handles, mount hangs, permission denied

**Diagnosis**:
```bash
# Check NFS services
systemctl status nfs-client.target
showmount -e nfs-server

# Test NFS connectivity
rpcinfo -p nfs-server
```

**Solutions**:
```bash
# Restart NFS services
sudo systemctl restart nfs-client.target

# Remount NFS shares
sudo umount /mnt/nfs-share
sudo mount -t nfs server:/path /mnt/nfs-share

# Fix stale file handles
sudo umount -f /mnt/nfs-share
sudo mount /mnt/nfs-share  # Requires an /etc/fstab entry for this mount point
```

## Firewall and Security Issues

### Port Access Problems

**Symptoms**: Connection refused, filtered ports, blocked services

**Diagnosis**:
```bash
# Check firewall status
sudo ufw status verbose
sudo iptables -L -n -v

# Test port accessibility
nc -zv host port
nmap -p port host
```

**Solutions**:
```bash
# Open required ports
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.10.0.0/24

# Reset firewall if needed (reset deletes every rule, including the SSH rule)
sudo ufw --force reset
sudo ufw allow ssh  # Re-add before enabling, or remote sessions may be locked out
sudo ufw enable
```

### Network Security Issues

**Symptoms**: Unauthorized access, suspicious traffic, security alerts

**Diagnosis**:
```bash
# Check active connections
ss -tuln
netstat -tuln

# Review logs for security events
sudo tail -f /var/log/auth.log
sudo tail -f /var/log/syslog | grep -i security
```

**Solutions**:
```bash
# Block suspicious IPs
sudo ufw deny from suspicious-ip

# Update SSH security
sudo nano /etc/ssh/sshd_config
# Set: PasswordAuthentication no, PermitRootLogin no
sudo systemctl restart sshd
```

## Service Discovery and DNS Issues

### Local DNS Problems

**Symptoms**: Services unreachable by hostname, DNS timeouts

**Diagnosis**:
```bash
# Test local DNS resolution
nslookup service.homelab.local
dig @10.10.0.16 service.homelab.local

# Check DNS server status
systemctl status bind9  # or named
```

**Solutions**:
```bash
# Add to /etc/hosts as temporary fix
echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts

# Restart DNS services
sudo systemctl restart bind9
sudo systemctl restart systemd-resolved
```

### Container Networking Issues

**Symptoms**: Containers cannot communicate, service discovery fails

**Diagnosis**:
```bash
# Check Docker networks
docker network ls
docker network inspect bridge

# Test container connectivity
docker exec container1 ping container2
docker exec container1 nslookup container2
```

**Solutions**:
```bash
# Create custom network
docker network create --driver bridge app-network
docker run --network app-network container

# Fix DNS in containers
docker run --dns 8.8.8.8 container
```

## Performance Issues

### Network Latency Problems

**Symptoms**: Slow response times, timeouts, poor performance

**Diagnosis**:
```bash
# Measure network latency
ping -c 100 host
mtr --report host

# Check network interface stats
ip -s link show
cat /proc/net/dev
```

**Solutions**:
```bash
# Optimize network settings
echo 'net.core.rmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Check for network congestion
iftop
nethogs
```

### Bandwidth Issues

**Symptoms**: Slow transfers, network congestion, dropped packets

**Diagnosis**:
```bash
# Test bandwidth
iperf3 -s            # Server
iperf3 -c server-ip  # Client

# Check interface utilization
vnstat -i eth0
```

**Solutions**:
```bash
# Implement QoS if needed
sudo tc qdisc add dev eth0 root fq_codel

# Optimize buffer sizes
sudo ethtool -G eth0 rx 4096 tx 4096
```

## Emergency Recovery Procedures

### Network Emergency Recovery

**Complete network failure recovery**:
```bash
# Reset all network configuration
sudo systemctl stop networking
sudo ip addr flush dev eth0
sudo ip route flush table main
sudo systemctl start networking

# Manual network configuration
sudo ip addr add 10.10.0.100/24 dev eth0
sudo ip route add default via 10.10.0.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

### SSH Emergency Access

**When locked out of systems**:
```bash
# Use emergency SSH key
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Via console access (if available)
# Use hypervisor console or physical access

# Reset SSH to allow password auth temporarily
sudo sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

### Service Recovery

**Critical service restoration**:
```bash
# Restart all network services
sudo systemctl restart networking
sudo systemctl restart nginx
sudo systemctl restart sshd

# Emergency firewall disable
sudo ufw disable  # CAUTION: Only for troubleshooting

# Service-specific recovery
sudo systemctl restart docker
sudo systemctl restart systemd-resolved
```

## Monitoring and Prevention

### Network Health Monitoring

```bash
#!/bin/bash
# network-monitor.sh

CRITICAL_HOSTS="10.10.0.1 10.10.0.16 nas.homelab.local"
CRITICAL_SERVICES="https://homelab.local http://proxmox.homelab.local:8006"

# Log an alert for every host that does not answer a single ping within 5 seconds
for host in $CRITICAL_HOSTS; do
    if ! ping -c1 -W5 "$host" >/dev/null 2>&1; then
        echo "ALERT: $host unreachable" | logger -t network-monitor
    fi
done

# Log an alert for every service URL that fails or times out after 10 seconds
for service in $CRITICAL_SERVICES; do
    if ! curl -sSf --max-time 10 "$service" >/dev/null 2>&1; then
        echo "ALERT: $service unavailable" | logger -t network-monitor
    fi
done
```

### Automated Recovery Scripts

```bash
#!/bin/bash
# network-recovery.sh

if ! ping -c1 8.8.8.8 >/dev/null 2>&1; then
    echo "Network down, attempting recovery..."
    sudo systemctl restart networking
    sleep 10
    if ping -c1 8.8.8.8 >/dev/null 2>&1; then
        echo "Network recovered"
    else
        echo "Manual intervention required"
    fi
fi
```

## Quick Reference Commands

### Network Diagnostics

```bash
# Connectivity tests
ping host
traceroute host
mtr host
nc -zv host port

# Service checks
systemctl status networking
systemctl status nginx
systemctl status sshd

# Network configuration
ip addr show
ip route show
ss -tuln
```

### Emergency Commands

```bash
# Network restart
sudo systemctl restart networking

# SSH emergency access
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Firewall quick disable (emergency only)
sudo ufw disable

# DNS quick fix
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

This troubleshooting guide provides comprehensive solutions for common networking issues in home lab environments.
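
One possible refinement to `network-recovery.sh` above: it restarts networking after a single failed ping, so one dropped packet is enough to trigger a restart. The sketch below retries a check several times before reporting failure. The function name `retry_check` and its argument order are hypothetical and not part of the original scripts; treat it as an illustration under those assumptions rather than a tested production helper.

```bash
#!/bin/bash
# retry_check ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper (an assumption, not from the original scripts).
# Runs COMMAND up to ATTEMPTS times, pausing DELAY seconds between tries.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt fails.
retry_check() {
    local attempts
    local delay
    local i
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Run the check quietly; success on any attempt ends the loop
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # No pause is needed after the final attempt
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example usage (commented out so sourcing this file has no side effects):
# if ! retry_check 3 5 ping -c1 -W5 8.8.8.8; then
#     echo "Network down, attempting recovery..."
# fi
```

With three attempts and a five-second pause, a restart would only be triggered after roughly ten to fifteen seconds of continuous failure. Both values are illustrative and should be tuned to the environment.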