# Networking Infrastructure Troubleshooting Guide

## SSH Connection Issues

### SSH Authentication Failures

**Symptoms**: Permission denied, connection refused, timeout

**Diagnosis**:

```bash
# Verbose SSH debugging
ssh -vvv user@host

# Test each authentication method in isolation
ssh -o PreferredAuthentications=publickey user@host   # public key only
ssh -o PubkeyAuthentication=no user@host              # password only

# Check local key files and the key fingerprint
ls -la ~/.ssh/
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
```
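
If you still have another way in (the emergency key or a console), you can check whether the public key you are offering is actually listed in the server's `authorized_keys`. Below is a minimal sketch; the helper name `key_authorized` is hypothetical. It compares the base64 key blob (the second field of the `.pub` file) against the file, so any key options or comments in `authorized_keys` do not affect the result.

```bash
# key_authorized PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: exit 0 if the key in PUBKEY_FILE appears in AUTHORIZED_KEYS_FILE,
# 1 if it does not, 2 if PUBKEY_FILE contains no key.
key_authorized() {
  local pubkey_file="$1" authkeys_file="$2" blob
  # A .pub line looks like "<type> <base64-blob> <comment>"; the blob identifies the key
  blob=$(awk 'NF >= 2 { print $2; exit }' "$pubkey_file")
  [[ -n "$blob" ]] || { echo "no key found in $pubkey_file" >&2; return 2; }
  # Fixed-string match so base64 characters such as "+" and "/" are not treated as regex
  grep -qF -- "$blob" "$authkeys_file"
}

# Usage, reading the remote file over the emergency key:
#   key_authorized ~/.ssh/homelab_rsa.pub \
#     <(ssh -i ~/.ssh/emergency_homelab_rsa user@host cat .ssh/authorized_keys) \
#     && echo "key present" || echo "key missing"
```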

**Solutions**:

```bash
# Re-deploy SSH keys (needs one working login method, e.g. a password or the emergency key)
ssh-copy-id -i ~/.ssh/homelab_rsa.pub user@host
ssh-copy-id -i ~/.ssh/emergency_homelab_rsa.pub user@host

# Fix local key permissions
chmod 700 ~/.ssh
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub

# Fix remote permissions (sshd ignores keys if ~/.ssh or authorized_keys is writable by others)
ssh user@host 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
```
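
sshd's `StrictModes` check rejects keys when the directory or files are too permissive. A quick audit is useful before and after running the `chmod` commands above. Below is a minimal sketch; it assumes GNU `stat` (Linux), and the function name `check_ssh_perms` is hypothetical. It only reports problems and never changes modes.

```bash
# check_ssh_perms [SSH_DIR]
# Hypothetical helper: report entries in SSH_DIR (default ~/.ssh) whose mode is looser
# than the modes recommended above. Requires GNU stat. Returns 1 if anything is flagged.
check_ssh_perms() {
  local dir="${1:-$HOME/.ssh}" f mode want problems=0
  mode=$(stat -c '%a' "$dir") || return 2
  # Flag any permission bit that the wanted mode does not allow
  if (( (8#$mode & ~8#700) != 0 )); then
    echo "$dir: mode $mode, expected 700 or stricter"; problems=1
  fi
  for f in "$dir"/*; do
    [[ -f "$f" ]] || continue
    case "$f" in
      *.pub|*/known_hosts*) want=644 ;;   # public material may be world-readable
      *) want=600 ;;                      # private keys, authorized_keys, config
    esac
    mode=$(stat -c '%a' "$f")
    if (( (8#$mode & ~8#$want) != 0 )); then
      echo "$f: mode $mode, expected $want or stricter"; problems=1
    fi
  done
  return "$problems"
}
```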

### SSH Service Issues

**Symptoms**: Connection refused, service not running

**Diagnosis**:

```bash
# Check SSH service status (the unit may be named "ssh" on Debian/Ubuntu)
systemctl status sshd
sudo ss -tlnp | grep :22

# Test port connectivity
nc -zv host 22
nmap -p 22 host
```

**Solutions**:

```bash
# Validate the configuration first (a syntax error keeps sshd from starting)
sudo sshd -t

# Restart and enable the SSH service
sudo systemctl restart sshd
sudo systemctl enable sshd

# Check the firewall
sudo ufw status
sudo ufw allow ssh

# Show the effective SSH settings
sudo sshd -T | grep -E "(passwordauth|pubkeyauth|permitroot)"
```

## Network Connectivity Problems

### Basic Network Troubleshooting

**Symptoms**: Cannot reach hosts, timeouts, routing issues

**Diagnosis**:

```bash
# Basic connectivity tests
ping host
traceroute host
mtr host

# Check local network configuration
ip addr show
ip route show
cat /etc/resolv.conf
```

**Solutions**:

```bash
# Restart networking (run from a console; this drops remote sessions)
sudo systemctl restart networking   # Debian (ifupdown)
sudo netplan apply                  # Ubuntu (netplan)

# Reset the network interface (replace eth0 with the name shown by `ip link`)
sudo ip link set eth0 down
sudo ip link set eth0 up

# Restore a missing default gateway (confirm with `ip route show default` first)
sudo ip route add default via 10.10.0.1
```
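
Working outward one layer at a time narrows down where a failure sits: first the gateway, then an outside IP, then name resolution. Below is a minimal sketch with hypothetical function names. `default_gateway` only parses `ip route show default` output. The `8.8.8.8` and `google.com` targets are the ones used elsewhere in this guide.

```bash
# default_gateway: read "ip route show default" output on stdin, print the first gateway
default_gateway() {
  awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}

# check_layers: hypothetical layered check that stops at the first failing layer
check_layers() {
  local gw
  gw=$(ip route show default | default_gateway)
  [[ -n "$gw" ]] || { echo "FAIL: no default route"; return 1; }
  ping -c1 -W2 "$gw" >/dev/null 2>&1 || { echo "FAIL: gateway $gw unreachable"; return 1; }
  ping -c1 -W2 8.8.8.8 >/dev/null 2>&1 || { echo "FAIL: nothing reachable beyond the gateway"; return 1; }
  getent hosts google.com >/dev/null || { echo "FAIL: DNS resolution"; return 1; }
  echo "OK: gateway $gw, internet and DNS reachable"
}
```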

### DNS Resolution Issues

**Symptoms**: Cannot resolve hostnames, slow resolution

**Diagnosis**:

```bash
# Test DNS resolution
nslookup google.com
dig google.com
host google.com

# Check DNS servers
resolvectl status   # replaces the deprecated `systemd-resolve --status`
cat /etc/resolv.conf
```
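
On systemd-resolved systems, `/etc/resolv.conf` usually points at the local stub listener `127.0.0.53`. The real upstream servers then only show up in `resolvectl status`. Below is a small sketch with a hypothetical function name. It lists the configured nameservers and flags the stub case.

```bash
# resolv_nameservers [FILE]
# Hypothetical helper: print nameserver entries from a resolv.conf-style file and
# warn on stderr when the only resolver is the systemd-resolved stub.
resolv_nameservers() {
  local file="${1:-/etc/resolv.conf}" servers
  servers=$(awk '$1 == "nameserver" { print $2 }' "$file") || return 2
  [[ -n "$servers" ]] || { echo "no nameserver lines in $file" >&2; return 1; }
  printf '%s\n' "$servers"
  if [[ "$servers" == "127.0.0.53" ]]; then
    echo "note: systemd-resolved stub; run 'resolvectl status' for upstream servers" >&2
  fi
}
```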

**Solutions**:

```bash
# Temporary DNS fix (lost when systemd-resolved, NetworkManager or DHCP rewrites the file)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

# Restart DNS services
sudo systemctl restart systemd-resolved

# Flush the DNS cache
sudo resolvectl flush-caches   # older releases: sudo systemd-resolve --flush-caches
```
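
The `/etc/resolv.conf` edit above does not survive when the file is regenerated. On hosts that use systemd-resolved, a drop-in file under `/etc/systemd/resolved.conf.d/` is a more durable option. Below is a minimal sketch with several assumptions: the function name, the file name `90-homelab-dns.conf`, and the choice of `10.10.0.16` as primary with `8.8.8.8` as fallback are all illustrative. The target directory is a parameter, so you can inspect the output before installing it.

```bash
# write_resolved_dropin DNS FALLBACK [DIR]
# Hypothetical helper: write a systemd-resolved drop-in that sets DNS= and FallbackDNS=.
# For the default DIR, run as root and then: systemctl restart systemd-resolved
write_resolved_dropin() {
  local dns="$1" fallback="$2" dir="${3:-/etc/systemd/resolved.conf.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/90-homelab-dns.conf" <<EOF
[Resolve]
DNS=$dns
FallbackDNS=$fallback
EOF
}

# Example: write_resolved_dropin 10.10.0.16 8.8.8.8
```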

## Reverse Proxy and Load Balancer Issues

### Nginx Configuration Problems

**Symptoms**: 502 Bad Gateway, 503 Service Unavailable, SSL errors

**Diagnosis**:

```bash
# Check Nginx status and logs
systemctl status nginx
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log

# Test Nginx configuration
sudo nginx -t
sudo nginx -T   # Show full configuration
```

**Solutions**:

```bash
# Reload Nginx only if the configuration test passes
sudo nginx -t && sudo nginx -s reload

# Check upstream servers (a 502 usually means Nginx cannot reach the backend)
curl -I http://backend-server:port
nc -zv backend-server port

# Fix common configuration issues
sudo nano /etc/nginx/sites-available/default
# Check proxy_pass URLs and upstream definitions
```
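
A 502 often traces back to one backend named in a `proxy_pass` directive. Below is a sketch built on a hypothetical `proxy_targets` helper. It pulls literal `proxy_pass` targets out of `nginx -T` output so that each one can be probed with `curl`. It only matches directives that start their own line. Targets that use variables or named `upstream` blocks are printed as-is and need checking by hand.

```bash
# proxy_targets: read nginx configuration text on stdin, print unique proxy_pass targets
proxy_targets() {
  sed -nE 's/^[[:space:]]*proxy_pass[[:space:]]+([^;[:space:]]+)[[:space:]]*;.*/\1/p' | sort -u
}

# probe_targets: hypothetical follow-up that prints the HTTP status per target
# (curl reports 000 when no connection could be made)
probe_targets() {
  local url code
  while read -r url; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}

# Usage: sudo nginx -T 2>/dev/null | proxy_targets | probe_targets
```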

### SSL/TLS Certificate Issues

**Symptoms**: Certificate warnings, expired certificates, connection errors

**Diagnosis**:

```bash
# Check the certificate a host serves, and a certificate file on disk
openssl s_client -connect host:443 -servername host </dev/null
openssl x509 -in /etc/ssl/certs/cert.pem -text -noout

# Check certificate expiry
openssl x509 -in /etc/ssl/certs/cert.pem -noout -dates
```
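
For a scheduled check, `openssl x509 -checkend N` is easier to script than reading `-dates`. It exits 0 if the certificate is still valid N seconds from now. Below is a minimal sketch with a hypothetical function name. The 14-day threshold in the usage line is an arbitrary example.

```bash
# cert_expires_within DAYS CERT_FILE
# Hypothetical helper: exit 0 if CERT_FILE expires within DAYS days, 1 if not, 2 on error.
cert_expires_within() {
  local days="$1" cert="$2"
  [[ -r "$cert" ]] || { echo "cannot read $cert" >&2; return 2; }
  openssl x509 -in "$cert" -noout 2>/dev/null || { echo "not a PEM certificate: $cert" >&2; return 2; }
  # -checkend exits 0 when the certificate will still be valid after the given seconds
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 1
  fi
  return 0
}

# Usage: cert_expires_within 14 /etc/ssl/certs/cert.pem && echo "renew soon"
# For a remote host, save the served certificate first:
#   openssl s_client -connect host:443 -servername host </dev/null 2>/dev/null \
#     | openssl x509 > /tmp/host.pem
```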

**Solutions**:

```bash
# Renew Let's Encrypt certificates (test with --dry-run first)
sudo certbot renew --dry-run
sudo certbot renew --force-renewal   # counts against Let's Encrypt rate limits

# Generate a self-signed certificate
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/ssl/private/selfsigned.key \
  -out /etc/ssl/certs/selfsigned.crt
```

## Network Storage Issues

### CIFS/SMB Mount Problems

**Symptoms**: Mount failures, connection timeouts, permission errors

**Diagnosis**:

```bash
# Test SMB connectivity and list the server's shares
smbclient -L //nas-server -U username
testparm   # on the Samba server: validate smb.conf

# Check mount status
mount | grep cifs
df -h -t cifs
```

**Solutions**:

```bash
# Mount by hand with verbose output (kernel CIFS errors are logged to dmesg)
sudo mount -v -t cifs //server/share /mnt/point -o credentials=/etc/cifs/credentials,vers=3.0
sudo dmesg | tail -n 20

# Example /etc/fstab entry (add this line to the file; it is not a shell command):
# //server/share /mnt/point cifs credentials=/etc/cifs/credentials,uid=1000,gid=1000,iocharset=utf8,file_mode=0644,dir_mode=0755,cache=strict,_netdev 0 0
sudo mount -a   # mount everything listed in /etc/fstab

# Check the credentials file: username=, password= and optionally domain=, one per line
sudo cat /etc/cifs/credentials
sudo chmod 600 /etc/cifs/credentials
```
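
A credentials file keeps the password out of `/etc/fstab`, shell history and the process list, but only if nobody else can read it. Below is a minimal sketch with a hypothetical helper name. It creates the file with mode 600 from the start via `umask`, rather than fixing permissions afterwards. Run it from a root shell when targeting `/etc/cifs`.

```bash
# write_cifs_credentials FILE USER PASSWORD [DOMAIN]
# Hypothetical helper: create a mount.cifs credentials file readable only by its owner.
write_cifs_credentials() {
  local file="$1" user="$2" pass="$3" domain="${4:-}"
  (
    umask 077   # new files get mode 600; the umask change stays inside this subshell
    mkdir -p "$(dirname "$file")"
    {
      printf 'username=%s\n' "$user"
      printf 'password=%s\n' "$pass"
      [[ -n "$domain" ]] && printf 'domain=%s\n' "$domain"
    } > "$file"
    chmod 600 "$file"   # also tightens a file that already existed
  )
}
```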

### NFS Mount Issues

**Symptoms**: Stale file handles, mount hangs, permission denied

**Diagnosis**:

```bash
# Check NFS client services and the server's exports
systemctl status nfs-client.target
showmount -e nfs-server   # may not answer on NFSv4-only servers

# Test NFS connectivity (lists the RPC services registered on the server)
rpcinfo -p nfs-server
```

**Solutions**:

```bash
# Restart NFS client services
sudo systemctl restart nfs-client.target

# Remount NFS shares
sudo umount /mnt/nfs-share
sudo mount -t nfs server:/path /mnt/nfs-share

# Fix stale file handles (-f forces the unmount; fall back to -l for a lazy unmount)
sudo umount -f /mnt/nfs-share
sudo mount /mnt/nfs-share   # requires an /etc/fstab entry for this mount point
```
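
A hung NFS mount can block any command that touches it, including `df` and `ls`. Below is a sketch with a hypothetical function name that assumes GNU coreutils `timeout`. It probes a mount point with `stat` under a time limit and classifies the result. A script can then choose between a normal remount and a forced or lazy unmount.

```bash
# mount_health PATH [SECONDS]
# Hypothetical helper: print ok, stale, hung or error for a mount point.
mount_health() {
  local path="$1" secs="${2:-5}" out rc
  # LC_ALL=C keeps the error text in English for the match below;
  # -k 2 sends SIGKILL if stat is still running 2 seconds after the initial SIGTERM
  out=$(LC_ALL=C timeout -k 2 "$secs" stat -t -- "$path" 2>&1)
  rc=$?
  if (( rc == 0 )); then
    echo ok
  elif (( rc == 124 || rc == 137 )); then
    echo hung   # timeout exits 124, or 137 if SIGKILL was needed
  elif [[ "$out" == *"Stale file handle"* ]]; then
    echo stale
  else
    echo error
  fi
}

# Usage: [[ $(mount_health /mnt/nfs-share) == stale ]] && sudo umount -f /mnt/nfs-share
```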

## Firewall and Security Issues

### Port Access Problems

**Symptoms**: Connection refused, filtered ports, blocked services

**Diagnosis**:

```bash
# Check firewall status
sudo ufw status verbose
sudo iptables -L -n -v

# Test port accessibility
nc -zv host port
nmap -p port host
```
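
When neither `nc` nor `nmap` is installed, bash's built-in `/dev/tcp` redirection can still test whether a TCP port accepts connections. Below is a minimal sketch with a hypothetical function name. It only distinguishes open from not open: refused, filtered and timed-out ports all count as not open.

```bash
# port_open HOST PORT [SECONDS]
# Hypothetical helper: exit 0 if a TCP connection to HOST:PORT succeeds within SECONDS.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # /dev/tcp/HOST/PORT is a bash feature, so run it in bash explicitly;
  # timeout guards against filtered ports that never answer
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage:
#   for p in 22 80 443; do
#     port_open 10.10.0.100 "$p" && echo "$p open" || echo "$p closed/filtered"
#   done
```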

**Solutions**:

```bash
# Open required ports
sudo ufw allow ssh
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw allow from 10.10.0.0/24   # all ports from the local subnet

# Reset the firewall if needed (a reset removes every rule, so re-allow SSH before enabling)
sudo ufw --force reset
sudo ufw allow ssh
sudo ufw enable
```

### Network Security Issues

**Symptoms**: Unauthorized access, suspicious traffic, security alerts

**Diagnosis**:

```bash
# List listening sockets and active connections
ss -tuln        # listening sockets
ss -tun         # active (non-listening) connections
netstat -tuln   # legacy net-tools equivalent of ss -tuln

# Review logs for security events
sudo tail -f /var/log/auth.log
sudo tail -f /var/log/syslog | grep -i security
```
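
Before blocking addresses, it helps to see which sources are actually failing SSH authentication and how often. Below is a sketch with a hypothetical helper name. It counts `Failed password` and `Failed publickey` lines per source IP in OpenSSH's usual log format. Pass `-` to read from stdin, for example from `journalctl` on systems without `/var/log/auth.log`.

```bash
# failed_ssh_ips [LOGFILE|-]
# Hypothetical helper: print "COUNT IP" for failed SSH logins, most frequent first.
failed_ssh_ips() {
  grep -E 'sshd(-session)?\[[0-9]+\]: Failed (password|publickey) for ' "${1:-/var/log/auth.log}" \
    | grep -oE ' from [0-9A-Fa-f:.]+ port ' \
    | awk '{ print $2 }' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage:
#   sudo cat /var/log/auth.log | failed_ssh_ips - | head
#   sudo journalctl -u ssh --no-pager | failed_ssh_ips -   # unit is "sshd" on some distros
```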
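
**Solutions**:

```bash
# Block a suspicious IP ("insert 1" places the deny ahead of existing allow rules)
sudo ufw insert 1 deny from suspicious-ip

# Update SSH security: set PasswordAuthentication no and PermitRootLogin no
sudo nano /etc/ssh/sshd_config
# Files in /etc/ssh/sshd_config.d/ can override these (sshd keeps the first value it reads)
# Validate first, and keep the current session open until a new login succeeds
sudo sshd -t && sudo systemctl restart sshd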
```
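
Editing `sshd_config` by hand is easy to get wrong when the option exists only as a commented-out default. Below is a sketch of a hypothetical `set_sshd_option` helper that uses GNU `sed -i`. It sets a keyword whether it is present, commented out, or missing. It edits a single file only. A drop-in under `/etc/ssh/sshd_config.d/` can still take precedence, so confirm the effective value with `sudo sshd -T`.

```bash
# set_sshd_option KEY VALUE [FILE]
# Hypothetical helper: set "KEY VALUE" in FILE (default /etc/ssh/sshd_config),
# rewriting active or commented-out lines, or appending if KEY is absent.
# Caveats: indented lines inside Match blocks are rewritten too, and an appended
# line belongs to the last Match block if the file ends with one.
set_sshd_option() {
  local key="$1" value="$2" file="${3:-/etc/ssh/sshd_config}"
  if grep -Eq "^[#[:space:]]*${key}[[:space:]]" "$file"; then
    # Rewrite every occurrence so only one value remains for this keyword
    sed -i -E "s|^[#[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
  else
    printf '%s %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (as root), then validate and restart:
#   set_sshd_option PasswordAuthentication no
#   set_sshd_option PermitRootLogin no
#   sshd -t && systemctl restart sshd
```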

## Service Discovery and DNS Issues

### Local DNS Problems

**Symptoms**: Services unreachable by hostname, DNS timeouts

**Diagnosis**:

```bash
# Test local DNS resolution (system resolver, then the local DNS server directly)
nslookup service.homelab.local
dig @10.10.0.16 service.homelab.local

# Check DNS server status
systemctl status bind9   # or named, depending on the distribution
```

**Solutions**:

```bash
# Add to /etc/hosts as a temporary fix
echo "10.10.0.100 service.homelab.local" | sudo tee -a /etc/hosts

# Restart DNS services
sudo systemctl restart bind9
sudo systemctl restart systemd-resolved
```
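
Appending with `tee -a` adds a duplicate line every time it is re-run. Below is a minimal idempotent variant, written as a hypothetical helper. It skips the append when any non-comment line already maps the hostname. Run it from a root shell for the real `/etc/hosts`; the file argument lets you try it on a copy first.

```bash
# hosts_add IP HOSTNAME [FILE]
# Hypothetical helper: append "IP<TAB>HOSTNAME" to FILE (default /etc/hosts) unless
# a non-comment line already lists HOSTNAME.
hosts_add() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  if awk -v n="$name" '
       $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }' "$file"; then
    echo "$name already in $file" >&2
    return 0
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```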

### Container Networking Issues

**Symptoms**: Containers cannot communicate, service discovery fails

**Diagnosis**:

```bash
# Check Docker networks
docker network ls
docker network inspect bridge

# Test container connectivity (the image must include ping and nslookup)
docker exec container1 ping -c 3 container2
docker exec container1 nslookup container2
```
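
**Solutions**:

```bash
# Containers on the default bridge cannot resolve each other by name;
# attach them to a user-defined network instead
docker network create --driver bridge app-network
docker run -d --name app --network app-network image
docker network connect app-network existing-container

# Override the DNS server used inside a container
docker run --dns 8.8.8.8 image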
```

## Performance Issues

### Network Latency Problems

**Symptoms**: Slow response times, timeouts, poor performance

**Diagnosis**:

```bash
# Measure network latency
ping -c 100 host
mtr --report host

# Check network interface statistics (look for errors and drops)
ip -s link show
cat /proc/net/dev
```
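
Comparing runs over time, before and after a change or across hosts, is easier when `ping` output is reduced to a single line. Below is a sketch with a hypothetical function name. It parses the summary printed by Linux iputils `ping`. Other ping implementations format the summary differently, and BusyBox omits `mdev`.

```bash
# ping_summary: read ping output on stdin, print packet loss, average RTT and mdev
ping_summary() {
  awk '
    /packet loss/ { for (i = 1; i <= NF; i++) if ($i ~ /%,?$/) { loss = $i; sub(/,$/, "", loss) } }
    /^(rtt|round-trip)/ { split($4, r, "/"); avg = r[2]; mdev = r[4] }
    END { printf "loss=%s avg_ms=%s mdev_ms=%s\n", loss, avg, mdev }
  '
}

# Usage: ping -c 100 host | ping_summary
```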

**Solutions**:

```bash
# Raise the maximum socket buffer sizes to 128 MiB (a drop-in file avoids duplicate lines on re-runs)
echo 'net.core.rmem_max = 134217728' | sudo tee /etc/sysctl.d/99-network-tuning.conf
echo 'net.core.wmem_max = 134217728' | sudo tee -a /etc/sysctl.d/99-network-tuning.conf
sudo sysctl --system

# Check for network congestion
sudo iftop
sudo nethogs
```

### Bandwidth Issues

**Symptoms**: Slow transfers, network congestion, dropped packets

**Diagnosis**:

```bash
# Test bandwidth between two hosts
iperf3 -s             # on the server
iperf3 -c server-ip   # on the client

# Check interface utilization
vnstat -i eth0
```
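
**Solutions**:

```bash
# Use the fq_codel queue discipline to reduce bufferbloat ("replace" works even if a root qdisc exists)
sudo tc qdisc replace dev eth0 root fq_codel

# Increase NIC ring buffers (check the supported maximums with `ethtool -g eth0` first)
sudo ethtool -G eth0 rx 4096 tx 4096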
```

## Emergency Recovery Procedures

### Network Emergency Recovery

**Complete network failure recovery** (run from a local or hypervisor console, because these commands drop all remote sessions):

```bash
# Reset all network configuration
sudo systemctl stop networking
sudo ip addr flush eth0
sudo ip route flush table main
sudo systemctl start networking

# If the interface does not come back, configure it manually (adjust address and interface name)
sudo ip addr add 10.10.0.100/24 dev eth0
sudo ip route add default via 10.10.0.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

### SSH Emergency Access

**When locked out of systems**:

```bash
# Use the emergency SSH key
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Otherwise use console access (hypervisor console or physical access)
# and run the commands below there

# Temporarily allow password authentication (also matches a commented-out setting)
sudo sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config
# A drop-in in /etc/ssh/sshd_config.d/ may override this; check with: sudo sshd -T | grep -i passwordauth
sudo sshd -t && sudo systemctl restart sshd
# Set it back to "no" once key-based access works again
```

### Service Recovery

**Critical service restoration**:

```bash
# Restart core network services
sudo systemctl restart networking
sudo systemctl restart nginx
sudo systemctl restart sshd

# Emergency firewall disable
sudo ufw disable   # CAUTION: only for troubleshooting; re-enable with `sudo ufw enable`

# Service-specific recovery
sudo systemctl restart docker
sudo systemctl restart systemd-resolved
```

## Monitoring and Prevention

### Network Health Monitoring

```bash
#!/bin/bash
# network-monitor.sh: log an alert to syslog for each unreachable host or service
CRITICAL_HOSTS=(10.10.0.1 10.10.0.16 nas.homelab.local)
CRITICAL_SERVICES=(https://homelab.local http://proxmox.homelab.local:8006)

for host in "${CRITICAL_HOSTS[@]}"; do
  if ! ping -c1 -W5 "$host" >/dev/null 2>&1; then
    logger -t network-monitor "ALERT: $host unreachable"
  fi
done

for service in "${CRITICAL_SERVICES[@]}"; do
  if ! curl -sSf --max-time 10 "$service" >/dev/null 2>&1; then
    logger -t network-monitor "ALERT: $service unavailable"
  fi
done
```
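
When the monitor above runs from cron every few minutes, it logs the same alert on every run for as long as a host stays down. One common refinement is to alert only on state changes. Below is a sketch of a hypothetical `alert_on_change` helper. It keeps the last known state per check in a state directory and logs only transitions; the default path `/var/lib/network-monitor` is an assumption.

```bash
# alert_on_change NAME STATE [STATE_DIR]
# Hypothetical helper: record STATE (e.g. up/down) for NAME and log via logger only
# when it differs from the previously recorded state. Also prints the message.
alert_on_change() {
  local name="$1" state="$2" dir="${3:-/var/lib/network-monitor}" file prev="" msg
  mkdir -p "$dir" || return 1
  # Turn the check name (host or URL) into a safe file name
  file="$dir/$(printf '%s' "$name" | tr -c 'A-Za-z0-9._-' '_')"
  [[ -f "$file" ]] && prev=$(<"$file")
  printf '%s\n' "$state" > "$file"
  if [[ "$state" != "$prev" ]]; then
    msg="$name changed ${prev:-unknown} -> $state"
    logger -t network-monitor "$msg" 2>/dev/null || true
    echo "$msg"
  fi
}

# Usage inside the host loop:
#   if ping -c1 -W5 "$host" >/dev/null 2>&1; then alert_on_change "$host" up
#   else alert_on_change "$host" down; fi
```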

### Automated Recovery Scripts

```bash
#!/bin/bash
# network-recovery.sh: restart networking if an external address is unreachable
if ! ping -c1 -W5 8.8.8.8 >/dev/null 2>&1; then
  echo "Network down, attempting recovery..."
  sudo systemctl restart networking
  sleep 10
  if ping -c1 -W5 8.8.8.8 >/dev/null 2>&1; then
    echo "Network recovered"
  else
    echo "Manual intervention required"
  fi
fi
```
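
A single restart followed by a fixed `sleep 10` can report failure just because DHCP or the link took a little longer to come up. Below is a sketch of a hypothetical `retry` helper with exponential backoff that could replace the second check. The attempt count and initial delay in the usage line are arbitrary examples.

```bash
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds or ATTEMPTS runs are used up,
# doubling the delay (in seconds) between attempts. Returns the last exit status.
retry() {
  local attempts="$1" delay="$2" n=1 rc
  shift 2
  while true; do
    "$@" && return 0
    rc=$?
    (( n >= attempts )) && return "$rc"
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
}

# Usage in network-recovery.sh, after restarting networking:
#   if retry 5 2 ping -c1 -W5 8.8.8.8 >/dev/null 2>&1; then echo "Network recovered"
#   else echo "Manual intervention required"; fi
```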

## Quick Reference Commands

### Network Diagnostics

```bash
# Connectivity tests
ping host
traceroute host
mtr host
nc -zv host port

# Service checks
systemctl status networking
systemctl status nginx
systemctl status sshd

# Network configuration
ip addr show
ip route show
ss -tuln
```

### Emergency Commands

```bash
# Network restart
sudo systemctl restart networking

# SSH emergency access
ssh -i ~/.ssh/emergency_homelab_rsa user@host

# Firewall quick disable (emergency only)
sudo ufw disable

# DNS quick fix (temporary; the file may be regenerated)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
```

This guide covers diagnosis and recovery for common networking issues in home lab environments: SSH access, connectivity and DNS, reverse proxies and TLS, network storage, firewalls, container networking, and performance.