claude-home/networking/examples/ssh-troubleshooting.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00


# SSH Troubleshooting Reference
## Common Configuration Issues
### UseKeychain Compatibility Error
**Error:** `Bad configuration option: usekeychain`
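**Cause:** `UseKeychain` is an Apple-specific option; stock OpenSSH on Linux does not recognize it and rejects the config
**Solution:** Remove or comment out the line from the SSH config. If the same config is shared with a Mac, add `IgnoreUnknown UseKeychain` above it instead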
```bash
# UseKeychain yes # macOS only - remove on Linux
```
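Where a config copied from a Mac lands on several Linux hosts, the edit above can be scripted. This is a minimal sketch, assuming `sed` with `-E` and `-i` support (GNU or BSD); the function name `disable_usekeychain` is hypothetical and not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out UseKeychain lines in an SSH client config.
disable_usekeychain() {
    local config="${1:-$HOME/.ssh/config}"
    # -i.bak edits in place and keeps the original as <config>.bak.
    # The character classes cover the usual capitalizations of the keyword.
    sed -i.bak -E 's/^([[:space:]]*)([Uu]se[Kk]eychain[[:space:]=].*)$/\1# \2/' "$config"
}
```

Calling `disable_usekeychain` with no argument targets `~/.ssh/config`; pass a path to edit a different file.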
### Port Forwarding Conflicts
**Error:** `bind [127.0.0.1]:8080: Address already in use`
**Cause:** Local port already in use by another service
**Solutions:**
1. Remove LocalForward line from SSH config
2. Change to different port: `LocalForward 8081 localhost:80`
3. Find conflicting service: `sudo ss -tulpn | grep :8080` (or `sudo netstat -tulpn | grep :8080` where net-tools is installed)
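Picking a replacement port can also be automated. The sketch below assumes bash (it relies on bash's `/dev/tcp` pseudo-device) and only detects listeners reachable on 127.0.0.1; `port_in_use` and `first_free_port` are hypothetical names.

```bash
#!/usr/bin/env bash
# Succeeds if something accepts TCP connections on 127.0.0.1:<port>.
port_in_use() {
    # /dev/tcp is a bash feature, not a real device.
    # The subshell closes fd 3 again when it exits.
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port at or above <start> that nothing is listening on.
first_free_port() {
    local port="$1"
    while port_in_use "$port"; do
        port=$((port + 1))
    done
    echo "$port"
}
```

For example, `ssh -L "$(first_free_port 8080):localhost:80" user@server` forwards from the first free local port at or above 8080.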
### Host Key Verification Loops
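**Issue:** Host key prompt or `Permanently added` warning on every connection
**Cause:** `UserKnownHostsFile /dev/null` discards host keys, so nothing is remembered between connections
**Solution:** Drop the `/dev/null` override so keys persist in `~/.ssh/known_hosts`, and set `StrictHostKeyChecking accept-new` (OpenSSH 7.6+), which accepts first-seen keys but still rejects changed ones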
```bash
# Instead of:
StrictHostKeyChecking no
UserKnownHostsFile /dev/null
# Use:
StrictHostKeyChecking accept-new
```
## Key Deployment Issues
### ssh-copy-id False Warnings
**Warning:** `All keys were skipped because they already exist on the remote system`
**Issue:** Warning appears even when keys aren't actually deployed
**Solution:** Force deployment with the `-f` flag, which skips the check for existing keys (and can therefore append a duplicate entry)
```bash
ssh-copy-id -f -i ~/.ssh/emergency_homelab_rsa.pub cal@10.10.0.42
```
### Permission Denied After Key Deployment
**Error:** `Permission denied (publickey)`
**Troubleshooting Steps:**
1. Check key permissions locally:
```bash
ls -la ~/.ssh/
# Private keys should be 600, public keys 644, and ~/.ssh itself 700
```
2. Check authorized_keys on remote server:
```bash
ssh user@server "ls -la ~/.ssh/authorized_keys"
# Should be 600 with correct ownership
```
3. Verify key is actually deployed:
```bash
ssh user@server "cat ~/.ssh/authorized_keys"
```
4. Test specific key file:
```bash
ssh -i ~/.ssh/specific_key user@server
```
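The permission checks in steps 1 and 2 can be rolled into one pass over a directory. A minimal sketch, assuming GNU `stat` (Linux) and bash; `check_ssh_perms` is a hypothetical name, and requiring owner-only access on `authorized_keys` follows the 600 guidance above rather than the minimum sshd enforces.

```bash
#!/usr/bin/env bash
# Report private keys and authorized_keys files that group/others can access.
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}" status=0 file mode
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # Private keys are recognized by their PEM/OpenSSH header line.
        if head -n 1 "$file" | grep -q 'PRIVATE KEY' ||
           [ "${file##*/}" = "authorized_keys" ]; then
            mode=$(stat -c '%a' "$file")
            # Any group or other permission bit set means the file is too open.
            if [ $(( 8#$mode & 8#077 )) -ne 0 ]; then
                echo "TOO OPEN: $file ($mode)"
                status=1
            fi
        fi
    done
    return "$status"
}
```

Run it locally as `check_ssh_perms`, or on the server against the remote user's `~/.ssh`; a non-zero exit status means at least one file needs `chmod 600`.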
### Key Authentication Not Working
**Debug connection issues:**
```bash
# Verbose SSH connection for debugging
ssh -v user@server
# Super verbose for detailed debugging
ssh -vvv user@server
# Test specific identity file
ssh -i ~/.ssh/homelab_rsa -v cal@10.10.0.42
```
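Verbose output is long, so it can help to filter it down to the authentication exchange. The filter below is a sketch: the matched phrases are ones OpenSSH clients commonly print, but exact wording varies between versions, and `summarize_ssh_debug` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Keep only the lines of `ssh -v` output that describe key authentication.
summarize_ssh_debug() {
    grep -E 'identity file|Offering .*public key|Server accepts key|Authentications that can continue|Permission denied'
}
```

Usage: `ssh -v user@server true 2>&1 | summarize_ssh_debug`. If no `Offering` line mentions the expected key, the client is not presenting it; if it is offered but never accepted, look at `authorized_keys` on the server.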
## Server-Side Issues
### SSH Service Not Running
**Check SSH service status:**
```bash
# The unit is named sshd on RHEL/CentOS/Fedora/Arch; on Debian/Ubuntu use ssh
sudo systemctl status sshd
sudo systemctl start sshd
sudo systemctl enable sshd
```
### Firewall Blocking SSH
**Check firewall rules:**
```bash
# Ubuntu/Debian
sudo ufw status
sudo ufw allow ssh
# CentOS/RHEL
sudo firewall-cmd --list-services
sudo firewall-cmd --add-service=ssh --permanent
sudo firewall-cmd --reload
```
### Wrong SSH Port
**Check SSH configuration:**
```bash
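sudo grep -i "^Port" /etc/ssh/sshd_config
# No output means the default (22) applies; print the effective value with:
sudo sshd -T | grep "^port"
# Update SSH client config accordingly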
```
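When only a copy of the config file is at hand (for example in a backup), the port can be read without running sshd. A rough sketch, assuming a plain `Port <n>` layout; it does not follow `Include` directives or `Match` blocks, and `sshd_ports` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Print the Port values set in an sshd_config file, or 22 if none are set.
sshd_ports() {
    local config="${1:-/etc/ssh/sshd_config}" ports
    # Commented lines such as "#Port 22" have "#Port" as field 1 and are skipped.
    ports=$(awk 'tolower($1) == "port" { print $2 }' "$config")
    echo "${ports:-22}"
}
```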
## Emergency Access Procedures
### Primary Keys Lost/Corrupted
1. **Use emergency keys:**
```bash
ssh -i ~/.ssh/emergency_homelab_rsa cal@10.10.0.16
```
2. **Restore from NAS backup:**
```bash
cp /mnt/NV2/ssh-keys/backup-*/homelab_rsa* ~/.ssh/
chmod 600 ~/.ssh/homelab_rsa
chmod 644 ~/.ssh/homelab_rsa.pub
```
3. **Generate new keys if needed:**
```bash
ssh-keygen -t rsa -b 4096 -f ~/.ssh/new_homelab_rsa
ssh-copy-id -i ~/.ssh/new_homelab_rsa.pub user@server
```
### Complete SSH Access Lost
1. **Physical/console access** (home servers)
2. **Cloud provider web console** (cloud servers)
3. **Recovery mode** if available
4. **Manual authorized_keys editing:**
```bash
# On the server via console:
mkdir -p ~/.ssh && chmod 700 ~/.ssh
echo "your-public-key-here" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```
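Repeating the manual append during a stressful recovery tends to leave duplicate entries. The sketch below wraps it so it is safe to run more than once; `add_authorized_key` is a hypothetical helper, and the optional second argument exists only so it can be pointed at a directory other than `~/.ssh`.

```bash
#!/usr/bin/env bash
# Append a public key to authorized_keys unless that exact line already exists.
add_authorized_key() {
    local key="$1" dir="${2:-$HOME/.ssh}"
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys"
    # -x matches whole lines, -F treats the key as a literal string.
    grep -qxF -- "$key" "$dir/authorized_keys" ||
        printf '%s\n' "$key" >> "$dir/authorized_keys"
}
```

Usage on the server console: `add_authorized_key "your-public-key-here"`.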
## Network Connectivity Issues
### Connection Timeouts
**Check network connectivity:**
```bash
# Basic connectivity test
ping 10.10.0.42
# Check if SSH port is open
telnet 10.10.0.42 22
# Or using nc
nc -zv 10.10.0.42 22
```
### DNS Resolution Issues
**Bypass DNS with IP addresses:**
```bash
# Instead of hostname
ssh server.local
# Use IP directly
ssh 10.10.0.42
```
### VPN/Network Routing
**Check routing to server:**
```bash
traceroute 10.10.0.42
ip route | grep 10.10.0.0
```
## Configuration Validation
### SSH Config Syntax Check
```bash
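# Parse the config for one host without connecting (OpenSSH 6.8+)
ssh -G server.local > /dev/null && echo "config OK"
# Or attempt a real connection; config errors are reported before it connects
ssh -F ~/.ssh/config -T git@github.com 2>&1 | head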
```
### Key Fingerprint Verification
```bash
# Local key fingerprint
ssh-keygen -lf ~/.ssh/homelab_rsa.pub
# Remote server's authorized keys fingerprints
ssh user@server "ssh-keygen -lf ~/.ssh/authorized_keys"
```
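Comparing fingerprints by eye is error-prone. As an alternative sketch, the base64 key blob (the second field of a `.pub` file) can be searched for directly in a copy of the server's `authorized_keys`; `key_is_authorized` is a hypothetical name, and the check assumes the `.pub` file has no options prefix.

```bash
#!/usr/bin/env bash
# Succeeds if the key in <pubfile> appears anywhere in <authfile>.
key_is_authorized() {
    local pubfile="$1" authfile="$2" blob
    # Field 2 of a .pub file is the base64-encoded key itself.
    blob=$(awk '{ print $2; exit }' "$pubfile")
    [ -n "$blob" ] && grep -qF -- "$blob" "$authfile"
}
```

For example, save the remote file with `ssh user@server 'cat ~/.ssh/authorized_keys' > /tmp/remote_keys`, then run `key_is_authorized ~/.ssh/homelab_rsa.pub /tmp/remote_keys && echo deployed`.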
### Connection Test Script
```bash
#!/bin/bash
# Test all configured SSH hosts
for host in strat-database pihole docker-home akamai vultr; do
    echo "Testing $host..."
    if ssh -o ConnectTimeout=5 -o BatchMode=yes "$host" 'echo "OK"' 2>/dev/null; then
        echo "✅ $host: Connected successfully"
    else
        echo "❌ $host: Connection failed"
    fi
done
```
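The host list in the script above is hard-coded. One way to keep it in sync is to read the aliases from the SSH config; this sketch assumes space-separated `Host alias ...` lines, skips wildcard and negated patterns, and uses the hypothetical name `list_ssh_hosts`.

```bash
#!/usr/bin/env bash
# Print every concrete Host alias defined in an SSH client config.
list_ssh_hosts() {
    local config="${1:-$HOME/.ssh/config}"
    # Patterns containing *, ? or ! are not connectable names, so skip them.
    awk 'tolower($1) == "host" {
             for (i = 2; i <= NF; i++)
                 if ($i !~ /[*?!]/) print $i
         }' "$config"
}
```

The loop header then becomes `for host in $(list_ssh_hosts); do`.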
## Maintenance Commands
### Clean Up Known Hosts
```bash
# Remove specific host key
ssh-keygen -R 10.10.0.42
# If the host is known by both hostname and IP, remove both entries
ssh-keygen -R server.local
ssh-keygen -R 10.10.0.42
```
### Key Rotation Process
```bash
# Generate new key
ssh-keygen -t rsa -b 4096 -f ~/.ssh/homelab_rsa_new
# Deploy new key alongside old one
ssh-copy-id -i ~/.ssh/homelab_rsa_new.pub user@server
# Test new key works
ssh -i ~/.ssh/homelab_rsa_new user@server
# Update SSH config to use new key
# Remove old public key from server authorized_keys
# Archive old key pair
```
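The "remove old public key" step is left as a comment above. One possible way to do it, run on the server, is sketched below; `remove_authorized_key` is a hypothetical helper that matches on the key blob and keeps a `.bak` copy first.

```bash
#!/usr/bin/env bash
# Remove every authorized_keys line containing the key from <pubfile>.
remove_authorized_key() {
    local pubfile="$1" authfile="$2" blob
    # Field 2 of a .pub file is the base64-encoded key itself.
    blob=$(awk '{ print $2; exit }' "$pubfile")
    [ -n "$blob" ] || { echo "no key found in $pubfile" >&2; return 1; }
    cp -p "$authfile" "$authfile.bak"
    # Rewrite in place so mode and ownership are kept.
    # grep exits 1 when no lines remain, which is not an error here.
    grep -vF -- "$blob" "$authfile.bak" > "$authfile" || true
}
```

Usage: copy the old `.pub` file to the server, then run `remove_authorized_key homelab_rsa.pub ~/.ssh/authorized_keys`. Confirm the new key still logs in before closing the current session.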
## Server-Specific Troubleshooting
### Home Lab Servers (10.10.0.x)
- **Physical access available** for recovery
- **Container hosts** may need different user contexts
- **Shared credentials** historically used (security risk)
### Cloud Servers
- **Provider console access** as fallback
- **Root user** typically used (create non-root users)
- **Different security contexts** than home network
## Related Documentation
- Patterns: `patterns/networking/ssh-key-management.md`
- Complete setup: `examples/networking/ssh-homelab-setup.md`