# n8n Troubleshooting Guide

Common issues and solutions for the n8n deployment at n8n.manticorum.com.

## Quick Diagnostics

**First steps when n8n isn't working:**

```bash
# Check container status
ssh root@10.10.0.210 "docker ps --filter name=n8n"

# Check logs for errors
ssh root@10.10.0.210 "cd /opt/n8n && docker compose logs --tail=50 n8n"

# Check if service is responding
curl -I http://10.10.0.210:5678

# Check NPM proxy status (if external access fails)
# Access NPM UI and check proxy host status
```

---

## Container Issues

### n8n Container Won't Start

**Symptoms:**
- Container exits immediately after starting
- `docker ps` shows no n8n container
- Error in logs: "database connection failed"

**Diagnosis:**
```bash
ssh root@10.10.0.210 "cd /opt/n8n && docker compose logs n8n | tail -30"
```

**Solutions:**

1. **Check PostgreSQL is healthy:**
   ```bash
   ssh root@10.10.0.210 "cd /opt/n8n && docker compose ps postgres"
   # Status should show "healthy"
   ```

2. **Verify database credentials in .env:**
   ```bash
   ssh root@10.10.0.210 "cat /opt/n8n/.env | grep POSTGRES"
   ```

3. **Restart both services:**
   ```bash
   ssh root@10.10.0.210 "cd /opt/n8n && docker compose down && docker compose up -d"
   ```

4. **Check database connectivity:**
   ```bash
   ssh root@10.10.0.210 "cd /opt/n8n && docker compose exec -T postgres psql -U n8n -d n8n -c 'SELECT 1;'"
   ```
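
A missing or empty variable in `.env` is easy to overlook when grepping by eye. The following is a minimal sketch of a check for that; `require_env_keys` is an illustrative helper (not part of n8n or Docker), and the key names in the usage note are assumptions that depend on your `docker-compose.yml`.

```bash
# Report which of the given keys are missing or empty in an env file.
# Usage: require_env_keys FILE KEY [KEY...]
# Prints one "MISSING: KEY" line per problem; returns 1 if any key is missing.
require_env_keys() {
    local file="$1"; shift
    local key missing=0
    for key in "$@"; do
        # Match "KEY=value" at the start of a line, with a non-empty value
        if ! grep -Eq "^${key}=.+" "$file"; then
            echo "MISSING: ${key}"
            missing=1
        fi
    done
    return "$missing"
}
```

On the n8n host this could be run as `require_env_keys /opt/n8n/.env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB N8N_ENCRYPTION_KEY`, adjusting the key list to whatever your compose file actually references.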

### PostgreSQL Container Issues

**Symptoms:**
- n8n fails to connect to database
- PostgreSQL container shows "unhealthy" status

**Diagnosis:**
```bash
ssh root@10.10.0.210 "cd /opt/n8n && docker compose logs postgres | tail -50"
```

**Common Causes:**

1. **Corrupted database:**
   ```bash
   # Check that PostgreSQL accepts connections (this does not verify data integrity)
   ssh root@10.10.0.210 "cd /opt/n8n && docker compose exec -T postgres pg_isready -U n8n"
   ```

2. **Disk space full:**
   ```bash
   ssh root@10.10.0.210 "df -h /"
   # Should have >10GB free
   ```

3. **Permission issues:**
   ```bash
   ssh root@10.10.0.210 "docker volume inspect n8n_postgres_data"
   ```
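
The free-space rule of thumb above can be turned into a scriptable check. This is a sketch only: `has_free_space` and `avail_kb` are hypothetical helper names, the 10 GB default mirrors the guideline in this section, and `avail_kb` assumes GNU `df` (for `--output`).

```bash
# Succeed if at least MIN_GB gigabytes are available.
# Usage: has_free_space AVAIL_KB [MIN_GB]   (AVAIL_KB = available space in 1K blocks)
has_free_space() {
    local avail="$1" min_gb="${2:-10}"
    # 1 GiB = 1024 * 1024 KiB
    [ "$avail" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Print the available space on a mount point in 1K blocks (GNU df assumed).
avail_kb() {
    df --output=avail -k "${1:-/}" | tail -n 1 | tr -d ' '
}
```

Run on the n8n host, for example: `has_free_space "$(avail_kb /)" 10 || echo "Low disk space on /"`.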

**Recovery:**
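```bash
# Restore from backup (DESTRUCTIVE: removes the existing database volume)
ssh root@10.10.0.210 "
cd /opt/n8n
docker compose down
docker volume rm n8n_postgres_data
docker compose up -d postgres
# Wait until PostgreSQL accepts TCP connections before restoring
until docker compose exec -T postgres pg_isready -U n8n -h localhost; do sleep 2; done
# Replace YYYYMMDD with the date of the backup to restore
docker compose exec -T postgres psql -U n8n n8n < /root/n8n-backup-YYYYMMDD.sql
docker compose up -d n8n
"
```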

---

## Access Issues

### Can't Access n8n.manticorum.com

**Symptoms:**
- Browser shows "Connection timed out" or "Can't reach this page"
- Works on http://10.10.0.210:5678 but not via domain

**Diagnosis Steps:**

1. **Check DNS resolution:**
   ```bash
   nslookup n8n.manticorum.com
   # Should return your public IP
   ```

2. **Test internal access:**
   ```bash
   curl -I http://10.10.0.210:5678
   # Should return HTTP 200
   ```

3. **Check NPM proxy host:**
   - Login to NPM UI
   - Verify proxy host for n8n.manticorum.com exists
   - Check if status shows "online"

4. **Test NPM connectivity:**
   ```bash
   # From NPM host
   curl -I http://10.10.0.210:5678
   ```

**Solutions:**

1. **DNS not configured:**
   - Add A record: `n8n.manticorum.com` → `[your-public-IP]`
   - Wait for DNS propagation (up to 48 hours)

2. **NPM proxy host misconfigured:**
   - Domain: `n8n.manticorum.com`
   - Scheme: `http` (not https)
   - Forward Host: `10.10.0.210`
   - Forward Port: `5678`
   - ✅ Enable WebSockets Support

3. **Firewall blocking:**
   - Ensure ports 80 and 443 open on firewall
   - Check Proxmox firewall rules
   - Check LXC firewall if enabled
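
The diagnosis steps above boil down to comparing the internal and external paths. A minimal sketch of that comparison follows; `classify_access` and `http_code` are illustrative helper names, and the messages are only hints about where to look first, not a definitive diagnosis.

```bash
# Map two HTTP status codes to the layer most likely at fault.
# Usage: classify_access INTERNAL_CODE EXTERNAL_CODE
# curl reports 000 when it gets no HTTP response at all.
classify_access() {
    local internal="$1" external="$2"
    if [ "$internal" = "000" ]; then
        echo "n8n unreachable internally: check the container first"
    elif [ "$external" = "000" ]; then
        echo "no response via domain: check DNS, firewall, or NPM"
    elif [ "$external" -ge 500 ]; then
        echo "proxy error ${external}: check the NPM proxy host settings"
    else
        echo "both paths respond (internal ${internal}, external ${external})"
    fi
}

# Fetch only the status code, giving up after 10 seconds.
http_code() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}
```

From a machine that can reach both addresses: `classify_access "$(http_code http://10.10.0.210:5678)" "$(http_code https://n8n.manticorum.com)"`.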

### SSL Certificate Issues

**Symptoms:**
- Browser shows "Your connection is not private"
- Certificate error in browser
- NPM shows "Certificate request failed"

**Diagnosis:**
```bash
# Test SSL
openssl s_client -connect n8n.manticorum.com:443 -servername n8n.manticorum.com

# Check certificate expiry
echo | openssl s_client -connect n8n.manticorum.com:443 2>/dev/null | openssl x509 -noout -dates
```
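
The expiry date printed by `openssl` is easier to act on as a number of days. The sketch below converts it; `cert_days_left` is a hypothetical helper, and it assumes GNU `date` (for `-d`), which is typical on Linux hosts.

```bash
# Days remaining until a certificate's notAfter date.
# Usage: cert_days_left "NOT_AFTER_DATE" [NOW_EPOCH]
# NOT_AFTER_DATE is the text after "notAfter=" in openssl's output.
# NOW_EPOCH defaults to the current time; it is a parameter mainly for testing.
cert_days_left() {
    local not_after="$1" now_epoch="${2:-$(date +%s)}"
    local end_epoch
    # GNU date parses openssl's "Mon DD HH:MM:SS YYYY GMT" format
    end_epoch=$(date -d "$not_after" +%s) || return 1
    echo $(( (end_epoch - now_epoch) / 86400 ))
}
```

For the live site: `cert_days_left "$(echo | openssl s_client -connect n8n.manticorum.com:443 -servername n8n.manticorum.com 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)"`. A negative result means the certificate has already expired.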

**Solutions:**

1. **Request new certificate in NPM:**
   - Edit proxy host
   - SSL tab → Request new SSL Certificate
   - Ensure email is correct
   - Check Let's Encrypt rate limits (e.g. 5 certificates per week for the same exact set of hostnames)

2. **HTTP validation failing:**
   - Verify domain points to correct IP
   - Ensure port 80 is accessible (Let's Encrypt uses HTTP validation)

3. **Use DNS challenge instead:**
   - If port 80 is blocked, use DNS challenge method in NPM
   - Requires API credentials for your DNS provider

### Login/Authentication Issues

**Symptoms:**
- Can access n8n but login fails
- "Invalid credentials" error
- Basic auth popup keeps appearing

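**Note:** The `N8N_BASIC_AUTH_*` variables below apply only to n8n releases before 1.0; later releases use built-in user accounts instead.

**Diagnosis:**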
```bash
# Check current credentials
ssh root@10.10.0.210 "cat /opt/n8n/.env | grep BASIC_AUTH"
```

**Solutions:**

1. **Reset admin password:**
   ```bash
   ssh root@10.10.0.210 "
   cd /opt/n8n
   # Generate new password
   NEW_PASS=\$(openssl rand -base64 16 | tr -d '/+=')
   echo \"New password: \$NEW_PASS\"
   # Update .env
   sed -i \"s/N8N_BASIC_AUTH_PASSWORD=.*/N8N_BASIC_AUTH_PASSWORD=\$NEW_PASS/\" .env
   # Recreate the container (a plain restart does not reload .env values)
   docker compose up -d --force-recreate n8n
   "
   ```

2. **Clear browser cache:**
   - Browser may cache old credentials
   - Try incognito/private window
   - Clear site data for n8n.manticorum.com

3. **Disable basic auth temporarily:**
   ```bash
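   ssh root@10.10.0.210 "
   cd /opt/n8n
   sed -i 's/N8N_BASIC_AUTH_ACTIVE=true/N8N_BASIC_AUTH_ACTIVE=false/' .env
   # Recreate the container (a plain restart does not reload .env values)
   docker compose up -d --force-recreate n8n
   "
   ```
   **Warning:** Only do this while troubleshooting, and re-enable basic auth immediately afterwards!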

---

## Workflow Issues

### Webhooks Not Working

**Symptoms:**
- External services can't trigger workflows
- Webhook URL returns 404 or timeout
- Test webhooks work but production ones don't

**Diagnosis:**
```bash
# Test webhook URL
curl -X POST https://n8n.manticorum.com/webhook/test

# Check n8n logs for incoming requests
ssh root@10.10.0.210 "cd /opt/n8n && docker compose logs -f n8n | grep webhook"
```

**Common Causes:**

1. **Incorrect WEBHOOK_URL in configuration:**
   ```bash
   ssh root@10.10.0.210 "cat /opt/n8n/.env | grep WEBHOOK_URL"
   # Should be: https://n8n.manticorum.com/
   ```

2. **Workflow not activated:**
   - Check workflow is toggled "Active" in n8n UI
   - Look for green indicator on workflow

3. **NPM WebSocket support not enabled:**
   - Edit proxy host in NPM
   - Details tab → ✅ WebSockets Support

4. **Firewall blocking webhooks:**
   - Ensure external services can reach your public IP on port 443

**Solutions:**

```bash
# Update webhook URL
ssh root@10.10.0.210 "
cd /opt/n8n
sed -i 's|WEBHOOK_URL=.*|WEBHOOK_URL=https://n8n.manticorum.com/|' .env
# Recreate the container (a plain restart does not reload .env values)
docker compose up -d --force-recreate n8n
"

# Test after restart
curl -X POST https://n8n.manticorum.com/webhook/test
```
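
Before editing `.env`, it can be worth checking whether the current value already has the expected shape. A minimal sketch, where `check_webhook_url` is an illustrative helper that only validates the form this guide expects (an `https://` URL with a trailing slash), not whether the URL is reachable:

```bash
# Validate the shape expected for WEBHOOK_URL.
# Usage: check_webhook_url URL
# Prints a short verdict; returns 1 when the value looks wrong.
check_webhook_url() {
    local url="$1"
    case "$url" in
        https://*/) echo "ok" ;;
        https://*)  echo "missing trailing slash"; return 1 ;;
        http://*)   echo "uses http, expected https"; return 1 ;;
        *)          echo "not a URL"; return 1 ;;
    esac
}
```

On the n8n host: `check_webhook_url "$(grep '^WEBHOOK_URL=' /opt/n8n/.env | cut -d= -f2-)"`.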

### Executions Failing or Timing Out

**Symptoms:**
- Workflows start but never complete
- Timeout errors in execution logs
- Memory errors

**Diagnosis:**
```bash
# Check resource usage
ssh root@10.10.0.210 "docker stats --no-stream n8n"

# Check execution logs
# Access n8n UI → Executions → View failed execution
```

**Solutions:**

1. **Increase timeout in NPM:**
   - NPM proxy host → Advanced tab
   - Add: `proxy_read_timeout 300;`

2. **Increase LXC resources:**
   ```bash
   # On Proxmox host
   ssh root@10.10.0.11 "
   pct set 210 --memory 16384  # Increase to 16GB
   pct set 210 --cores 8        # Increase to 8 cores
   pct reboot 210
   "
   ```

3. **Optimize workflow:**
   - Break large workflows into smaller ones
   - Use pagination for API calls
   - Add delay nodes between heavy operations

4. **Check external service timeouts:**
   - API you're calling may be slow
   - Increase timeout in HTTP Request nodes

### Database/Credential Issues

**Symptoms:**
- "Error loading credentials" in workflow
- Saved credentials not appearing
- "Credentials could not be decrypted"

**Critical Error - Encryption Key Changed:**

If you see "could not be decrypted," the encryption key was changed or is incorrect.

**This is UNRECOVERABLE without the original key!**

```bash
# Check current encryption key
ssh root@10.10.0.210 "cat /opt/n8n/.env | grep N8N_ENCRYPTION_KEY"

# If you have the old key, restore it:
ssh root@10.10.0.210 "
cd /opt/n8n
sed -i 's|^N8N_ENCRYPTION_KEY=.*|N8N_ENCRYPTION_KEY=YOUR_OLD_KEY|' .env
# Recreate the container (a plain restart does not reload .env values)
docker compose up -d --force-recreate n8n
"
```

**Prevention:**
- Backup `.env` file regularly
- Store encryption key in password manager
- Never regenerate encryption key after initial setup
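
One way to notice a changed key early, without printing the key itself, is to record a short fingerprint next to each backup and compare it later. A minimal sketch, assuming `sha256sum` is available; `key_fingerprint` is an illustrative helper, not an n8n feature.

```bash
# Print a short SHA-256 fingerprint of N8N_ENCRYPTION_KEY from an env file.
# Usage: key_fingerprint ENV_FILE
key_fingerprint() {
    local file="$1" key
    key=$(grep '^N8N_ENCRYPTION_KEY=' "$file" | head -n 1 | cut -d= -f2-)
    [ -n "$key" ] || { echo "no key found" >&2; return 1; }
    # Hash the key and keep only the first 12 hex characters
    printf '%s' "$key" | sha256sum | cut -c1-12
}
```

If `key_fingerprint /opt/n8n/.env` prints a different value than the one saved with the last backup, the key has changed since that backup was taken.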

---

## Performance Issues

### n8n Running Slow

**Symptoms:**
- UI sluggish or unresponsive
- Workflows take longer than expected
- High CPU/memory usage

**Diagnosis:**
```bash
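# Check resource usage
ssh root@10.10.0.210 "
docker stats --no-stream n8n n8n-postgres
df -h /
free -h
"

# Check PostgreSQL performance (requires the pg_stat_statements extension;
# on PostgreSQL 12 and older the columns are total_time and mean_time)
ssh root@10.10.0.210 "
cd /opt/n8n
docker compose exec -T postgres psql -U n8n -d n8n -c '
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;'
"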
```

**Solutions:**

1. **Clean up old executions:**
   ```bash
   # In n8n UI: Settings → Executions
   # Set: "Delete executions older than X days"
   ```

2. **Optimize database:**
   ```bash
   ssh root@10.10.0.210 "
   cd /opt/n8n
   docker compose exec -T postgres psql -U n8n -d n8n -c 'VACUUM ANALYZE;'
   "
   ```

3. **Increase LXC resources** (see above)

4. **Check disk I/O:**
   ```bash
   # iostat comes from the sysstat package, which may need to be installed first
   ssh root@10.10.0.210 "iostat -x 1 5"
   # If %util is consistently >80%, consider faster storage
   ```

### Database Growing Too Large

**Symptoms:**
- Disk space warning
- n8n slowing down over time
- Backup files becoming huge

**Diagnosis:**
```bash
# Check database size
ssh root@10.10.0.210 "
cd /opt/n8n
docker compose exec -T postgres psql -U n8n -d n8n -c '
SELECT pg_size_pretty(pg_database_size(current_database()));'
"

# Check table sizes
ssh root@10.10.0.210 "
cd /opt/n8n
docker compose exec -T postgres psql -U n8n -d n8n -c '
SELECT tablename, pg_size_pretty(pg_total_relation_size(tablename::text))
FROM pg_tables
WHERE schemaname = '\''public'\''
ORDER BY pg_total_relation_size(tablename::text) DESC;'
"
```

**Solutions:**

1. **Configure execution pruning:**
   - Settings → Executions
   - Enable: "Delete executions older than 7 days"
   - Set: "Max execution data to save"

2. **Manual cleanup:**
   ```bash
   ssh root@10.10.0.210 "
   cd /opt/n8n
   # VACUUM must run as its own command, not in the same -c string as DELETE
   docker compose exec -T postgres psql -U n8n -d n8n \
     -c 'DELETE FROM execution_entity WHERE \"startedAt\" < NOW() - INTERVAL '\''30 days'\'';' \
     -c 'VACUUM FULL;'
   "
   ```
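
n8n can also prune old executions on its own. Per the n8n documentation (verify against your version), this is driven by the `EXECUTIONS_DATA_PRUNE` and `EXECUTIONS_DATA_MAX_AGE` (in hours) environment variables, which your `docker-compose.yml` must pass through to the container. The sketch below shows one way to set them; `set_env_var` is a hypothetical helper that, unlike a bare `sed -i`, appends the key when it is not present yet.

```bash
# Set KEY=VALUE in an env file: update the line if KEY exists, append otherwise.
# Usage: set_env_var FILE KEY VALUE
# Assumes VALUE contains no "|" or "&" characters (they are special to sed).
set_env_var() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Make sure the file ends with a newline before appending
        if [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For a 7-day retention, run `set_env_var /opt/n8n/.env EXECUTIONS_DATA_PRUNE true` and `set_env_var /opt/n8n/.env EXECUTIONS_DATA_MAX_AGE 168` on the n8n host, then recreate the container with `docker compose up -d --force-recreate n8n` so the new values are picked up.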

---

## Emergency Procedures

### Complete Service Restart

```bash
ssh root@10.10.0.210 "cd /opt/n8n && docker compose down && docker compose up -d"
```

### Emergency Backup Before Changes

```bash
ssh root@10.10.0.210 "
cd /opt/n8n
# One timestamp, evaluated on the n8n host, shared by both files
TS=\$(date +%Y%m%d-%H%M%S)
# Create emergency backup
docker compose exec -T postgres pg_dump -U n8n n8n > /root/n8n-emergency-\$TS.sql
# Copy .env
cp .env /root/n8n-env-emergency-\$TS.env
"
```
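
Because the dump is written through a shell redirect, a failed `pg_dump` still leaves a file behind, often an empty one. A quick sanity check before making changes can catch that. This is a sketch: `verify_dump` is an illustrative helper, and it only checks for the comment header that plain-format dumps start with, not that the dump is complete.

```bash
# Sanity-check a plain-format pg_dump file before relying on it.
# Usage: verify_dump FILE
verify_dump() {
    local file="$1"
    [ -s "$file" ] || { echo "empty or missing: $file"; return 1; }
    # Plain-text dumps start with a "PostgreSQL database dump" comment header
    head -n 5 "$file" | grep -q 'PostgreSQL database dump' \
        || { echo "no pg_dump header: $file"; return 1; }
    echo "looks like a pg_dump file: $file"
}
```

On the n8n host, pass the path of the backup that was just created, for example `verify_dump /root/n8n-emergency-<timestamp>.sql`.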

### Complete Reset (DESTRUCTIVE)

**Only if all else fails and you're okay losing workflows:**

```bash
ssh root@10.10.0.210 "
cd /opt/n8n
docker compose down
docker volume rm n8n_data n8n_postgres_data
docker compose up -d
"
```

**Note:** This deletes everything. Restore from backup immediately after!

---

## Prevention & Monitoring

### Regular Maintenance

**Weekly:**
- Check disk space: `df -h /`
- Review failed executions in n8n UI
- Check logs for errors: `docker compose logs --since 168h` (run from `/opt/n8n`; `--since` takes hours, not days)

**Monthly:**
- Backup database and .env file
- Update n8n: `docker compose pull && docker compose up -d`
- Vacuum database: `VACUUM ANALYZE;`
- Review execution data retention settings

**Quarterly:**
- Test disaster recovery procedure
- Review and archive old workflows
- Audit credentials and remove unused ones
- Check for security updates

### Monitoring Setup

**Basic health check script:**
```bash
#!/bin/bash
# /opt/monitoring/check-n8n.sh

STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 http://10.10.0.210:5678)

if [ "$STATUS" != "200" ]; then
    echo "❌ n8n is down! Status: $STATUS"
    # Send alert (Discord, email, etc.)
else
    echo "✅ n8n is healthy"
fi
```

**Add to cron:**
```bash
*/5 * * * * /opt/monitoring/check-n8n.sh >> /var/log/n8n-health.log 2>&1
```
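
With a check every five minutes, a single slow response would trigger an alert. One option is to alert only after several consecutive failures. The sketch below keeps a counter in a state file; `record_check`, the state file path, and the default threshold of 3 are all assumptions for illustration.

```bash
# Track consecutive failed checks in a state file and decide when to alert.
# Usage: record_check STATE_FILE HTTP_STATUS [THRESHOLD]
# Prints "ok", "failing", or "alert" (only on the check that reaches THRESHOLD).
record_check() {
    local state_file="$1" status="$2" threshold="${3:-3}"
    local fails=0
    if [ -f "$state_file" ]; then fails=$(cat "$state_file"); fi
    if [ "$status" = "200" ]; then
        # Healthy again: reset the counter
        echo 0 > "$state_file"
        echo "ok"
        return 0
    fi
    fails=$(( fails + 1 ))
    echo "$fails" > "$state_file"
    if [ "$fails" -eq "$threshold" ]; then
        echo "alert"
    else
        echo "failing"
    fi
}
```

In the health check script, the alert step could then be guarded with `[ "$(record_check /var/tmp/n8n-health.state "$STATUS")" = "alert" ]`, so a notification goes out once per outage rather than every five minutes.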

---

## Getting Help

### Log Collection for Support

```bash
# Collect all relevant logs
ssh root@10.10.0.210 "
cd /opt/n8n
mkdir -p /tmp/n8n-debug
docker compose logs --tail=200 > /tmp/n8n-debug/docker-logs.txt
docker compose ps > /tmp/n8n-debug/container-status.txt
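# Redact passwords, keys (including N8N_ENCRYPTION_KEY), secrets, and tokens
sed -E 's/^([^=]*(PASSWORD|KEY|SECRET|TOKEN)[^=]*)=.*/\1=***/' .env > /tmp/n8n-debug/env-redacted.txt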
df -h > /tmp/n8n-debug/disk-space.txt
free -h > /tmp/n8n-debug/memory.txt
docker stats --no-stream > /tmp/n8n-debug/container-stats.txt
tar -czf /root/n8n-debug-\$(date +%Y%m%d-%H%M%S).tar.gz -C /tmp n8n-debug
"
```

### Resources

- **n8n Community Forum:** https://community.n8n.io/
- **Official Docs:** https://docs.n8n.io/
- **GitHub Issues:** https://github.com/n8n-io/n8n/issues
- **Discord:** https://discord.gg/n8n

### When to Escalate

Escalate to n8n community/support if:
- Database corruption suspected
- Consistent crashes with no clear cause
- Performance issues persist after optimization
- Security concerns
- Bug suspected in n8n itself

Always provide:
- n8n version: `docker inspect n8n | grep Image`
- Error messages from logs
- Steps to reproduce
- What you've already tried