# WebSocket Connection Troubleshooting Guide

**Last Updated**: 2025-01-29
**Issue Resolved**: Intermittent WebSocket connection failures through reverse proxy

## Problem Statement

WebSocket connections intermittently failed when the application was accessed through the production URL (`gameplay-demo.manticorum.com`). The debug UI showed "Socket exists: no", and the browser never even attempted a WebSocket connection. Adding debug code or making minor, unrelated changes would sometimes cause the connection to "randomly start working," which made the failure hard to pin down.

## Architecture Overview

```
Browser → Cloudflare → Nginx Proxy Manager → Backend (FastAPI + Socket.io)
                                                ↓
                                            Port 8000
```

## Root Causes Identified

### 1. Nginx Proxy Manager Configuration Issues

**Symptoms**:
- Direct connection to the backend (`localhost:8000`) returned HTTP 101 (success)
- Connection through Nginx Proxy Manager (NPM) returned HTTP 400 or 500

**Causes Found**:
- HTTP/2 stripping the WebSocket upgrade headers (the HTTP/1.1 `Upgrade` mechanism does not exist in HTTP/2, and Cloudflare enables HTTP/2 by default)
- Access lists blocking non-Cloudflare IPs when the Cloudflare proxy was disabled for testing
- Custom location blocks not inheriting WebSocket headers from the server-level config

**Solution**:
- Disable HTTP/2 in NPM for WebSocket hosts (or use the Cloudflare proxy, which handles this)
- Remove restrictive access lists during development
- Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
- Route `/socket.io` to port 8000 via an NPM GUI custom location

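The symptom pattern in this section (HTTP 101 directly, HTTP 400 or 500 through NPM) can be expressed as a small triage rule. The sketch below is illustrative only: `diagnoseHandshake` and its result labels are hypothetical names that do not exist in the project, and the mapping simply restates the observations listed above.

```typescript
// Hypothetical triage helper: compares the HTTP status of the WebSocket
// handshake made directly against the backend with the one made via the proxy.
type Diagnosis = "ok" | "proxy-misconfigured" | "backend-problem"

function diagnoseHandshake(directStatus: number, proxiedStatus: number): Diagnosis {
  const SWITCHING_PROTOCOLS = 101
  // If the backend itself refuses the upgrade, the proxy is not the first suspect.
  if (directStatus !== SWITCHING_PROTOCOLS) return "backend-problem"
  // Backend upgrades fine, but the proxied request does not: look at NPM/Cloudflare.
  if (proxiedStatus !== SWITCHING_PROTOCOLS) return "proxy-misconfigured"
  return "ok"
}

// The situation described in this section: 101 directly, 400 through NPM.
console.log(diagnoseHandshake(101, 400))
```

The two status codes would come from the direct and proxied `curl` checks in the Diagnostic Commands section.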
### 2. Conflicting Socket.io Plugin

**Symptoms**:
- "Socket exists: no" in UI debug output
- No `/socket.io` network requests appearing in logs
- Browser not even attempting a WebSocket connection

**Cause**:
The file `frontend-sba/plugins/socket.client.ts` used JWT token authentication, which conflicted with the cookie-based authentication in `useWebSocket.ts`. Both were trying to manage the socket connection.

**Solution**:
Deleted/disabled `plugins/socket.client.ts`. The `useWebSocket.ts` composable now handles all WebSocket management with cookie-based auth.

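To make the difference between the two auth styles concrete, here is a minimal sketch of what cookie-based client options might look like. `SocketOptions` and `buildCookieAuthOptions` are hypothetical stand-ins defined locally so the snippet runs without dependencies. The field names mirror options in socket.io-client v4 (`withCredentials`, `autoConnect`, `transports`, `auth`), but the values the project actually uses are not shown in this guide and are assumed here.

```typescript
// Local stand-in for the subset of socket.io-client options used in this sketch.
interface SocketOptions {
  withCredentials: boolean
  autoConnect: boolean
  transports: Array<"websocket" | "polling">
  auth?: { token: string }
}

// Cookie auth: the browser attaches the session cookie itself, so no token is
// placed in `auth` (a JWT-style plugin would populate that field instead).
function buildCookieAuthOptions(): SocketOptions {
  return {
    withCredentials: true, // send cookies with the handshake request
    autoConnect: false,    // assumed: connect explicitly, on the client only
    transports: ["websocket", "polling"], // assumed transport order
  }
}

console.log(buildCookieAuthOptions())
```

In a real composable an object like this would be passed as the second argument to `io(url, options)`. Building the options in exactly one place makes it harder for a second plugin to open a competing connection with different credentials.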
### 3. SSR/Hydration State Corruption (Primary Root Cause)

**Symptoms**:
- Connection would "randomly work" after adding debug code or making changes
- The code changes made no functional difference, yet behavior changed
- Intermittent failures that seemed timing-dependent

**Cause**:
Module-level singleton state in `useWebSocket.ts` was being initialized during SSR (server-side rendering) and then persisted in a corrupted state during client hydration. The state variables:

```typescript
// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null
```

These were initialized on the server during SSR and potentially set to intermediate values. The same state was then "reused" on the client during hydration instead of being freshly initialized.

**Solution**:
Refactored to use lazy client-only initialization:

```typescript
// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}
```

Added `import.meta.client` guards to all functions that interact with the socket.

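The lazy-initialization idea can be tried outside Nuxt with a small, self-contained sketch. It is not the project's implementation: `createStateAccessor` and `createFreshState` are hypothetical names, the `import.meta.client` check is replaced by an injected `isClient` predicate so the snippet runs in plain Node, and only the state fields named in this guide are included.

```typescript
interface ClientState {
  socketInstance: object | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

function createFreshState(initialized: boolean): ClientState {
  return { socketInstance: null, reconnectionAttempts: 0, reconnectionTimeout: null, initialized }
}

// `isClient` stands in for `import.meta.client` (assumption for testability).
function createStateAccessor(isClient: () => boolean): () => ClientState {
  let clientState: ClientState | null = null
  return function getClientState(): ClientState {
    // On the server, hand out a throwaway object so nothing is shared or kept.
    if (!isClient()) return createFreshState(false)
    // On the client, create the singleton lazily on first access.
    if (!clientState) clientState = createFreshState(true)
    return clientState
  }
}

const getServerState = createStateAccessor(() => false)
getServerState().reconnectionAttempts = 5 // mutation is discarded on the server

const getBrowserState = createStateAccessor(() => true)
getBrowserState().reconnectionAttempts = 2 // mutation persists on the client

console.log(getServerState().reconnectionAttempts, getBrowserState().reconnectionAttempts)
```

Returning a fresh fallback on the server, rather than a shared object, means a value written during one server render cannot leak into another render or into what the client sees.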
## Diagnostic Commands

### Test Direct Backend Connection
```bash
curl -v -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  "http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols
```

### Test Through Proxy
```bash
# --http1.1 stops curl from negotiating HTTP/2 over TLS, where the Upgrade header does not apply
curl -v --http1.1 -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  "https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# Should also return HTTP 101
```

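When a `curl` check returns 101, the response should also carry a `Sec-WebSocket-Accept` header derived from the key that was sent. The following sketch computes the expected value for the sample key used in the commands above, following the RFC 6455 handshake rule (SHA-1 of the key concatenated with a fixed GUID, then base64). `expectedAccept` is a hypothetical helper name, and the snippet assumes a Node.js runtime for `node:crypto`.

```typescript
import { createHash } from "node:crypto"

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// Returns the Sec-WebSocket-Accept value a compliant server should send
// in response to the given Sec-WebSocket-Key.
function expectedAccept(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64")
}

// Same key as in the curl commands in this section.
console.log(expectedAccept("dGhlIHNhbXBsZSBub25jZQ=="))
```

If the header in the `curl -v` output matches this value, the 101 came from a real WebSocket endpoint rather than from an intermediary echoing a status code.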
### Check NPM Configuration
```bash
ssh homelab  # or your NPM server
cat /data/nginx/proxy_host/*.conf | grep -A 20 "gameplay-demo"
```

### Monitor Backend Logs
```bash
tail -f /tmp/backend_live.log | grep -i socket
```

## NPM Configuration for WebSocket

Working configuration for `gameplay-demo.manticorum.com`:

1. **Proxy Host Settings**:
   - Forward Hostname: `10.10.0.16` (or your backend host)
   - Forward Port: `8000`
   - WebSocket Support: **ENABLED** (toggle in GUI)
   - HTTP/2 Support: **DISABLED** (for WebSocket compatibility)

2. **Custom Location** (via GUI, not Advanced config):
   - Location: `/socket.io`
   - Forward Host: `10.10.0.16`
   - Forward Port: `8000`

3. **No Access Lists** blocking during development

## Key Files

| File | Purpose |
|------|---------|
| `frontend-sba/composables/useWebSocket.ts` | Main WebSocket management with SSR-safe patterns |
| `frontend-sba/composables/useGameActions.ts` | Game action wrappers using the socket |
| `backend/app/websocket/handlers.py` | Backend Socket.io event handlers |
| `backend/app/websocket/auth.py` | Cookie-based authentication for WebSocket |

## SSR-Safe Patterns for WebSocket in Nuxt

### DO:
```typescript
// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null
```

### DON'T:
```typescript
// Module-level initialization (runs on server!)
let socket = io(url)  // BAD

// NodeJS types (not available in browser)
let timeout: NodeJS.Timeout  // BAD

// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true })  // BAD
```

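The "guard all socket operations" rule can be checked in isolation with a small sketch. Everything here is illustrative: `createConnector` and `SocketLike` are hypothetical names, the socket is created by an injected factory instead of `io(url)`, and `isClient` stands in for `import.meta.client` so the snippet runs outside Nuxt.

```typescript
// Hypothetical minimal socket shape; real code would use socket.io's Socket type.
interface SocketLike {
  connected: boolean
}

function createConnector(isClient: () => boolean, createSocket: () => SocketLike) {
  let socket: SocketLike | null = null
  return {
    connect(): void {
      if (!isClient()) return // guard: never touch the socket during SSR
      if (socket) return      // idempotent: a second call must not open a second socket
      socket = createSocket()
    },
    socketExists(): boolean {
      return socket !== null
    },
  }
}

// Count how many sockets the factory is asked to create.
let created = 0
const factory = (): SocketLike => {
  created += 1
  return { connected: true }
}

const onServer = createConnector(() => false, factory)
onServer.connect() // no-op on the server

const onClient = createConnector(() => true, factory)
onClient.connect()
onClient.connect() // second call is ignored

console.log({ created, server: onServer.socketExists(), client: onClient.socketExists() })
```

With the guard inside `connect()` itself, an immediate watcher that fires during SSR becomes harmless, because the call returns before any socket is created.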
## Cloudflare Considerations

- Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
- Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
- When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
- WebSocket connections through Cloudflare have a 100-second idle timeout

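The idle timeout matters mainly in relation to how often the connection carries traffic. As a rough sanity check, the sketch below compares a heartbeat interval against the 100-second figure quoted above. `heartbeatKeepsAlive` and the `safetyFactor` margin are hypothetical, and the 25-second example value is assumed to match the Socket.IO server's default `pingInterval`; verify the interval your backend actually configures.

```typescript
// Idle timeout quoted in this guide for WebSocket connections through Cloudflare.
const CLOUDFLARE_IDLE_TIMEOUT_MS = 100_000

// Returns true when a heartbeat every `pingIntervalMs` leaves a margin below the
// idle timeout. `safetyFactor` is an arbitrary margin chosen for this sketch.
function heartbeatKeepsAlive(
  pingIntervalMs: number,
  idleTimeoutMs: number = CLOUDFLARE_IDLE_TIMEOUT_MS,
  safetyFactor: number = 0.5,
): boolean {
  return pingIntervalMs > 0 && pingIntervalMs <= idleTimeoutMs * safetyFactor
}

console.log(heartbeatKeepsAlive(25_000))  // assumed Socket.IO default interval
console.log(heartbeatKeepsAlive(120_000)) // longer than the idle timeout
```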
## Debugging Tips

1. **Check if socket instance exists**: Add `console.log('[WebSocket] Socket exists:', !!state.socketInstance)`

2. **Verify SSR vs Client**: Add `console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')`

3. **Monitor hydration**: Log at module scope: `if (import.meta.client) { console.log('Module loaded on client') }`

4. **Check network tab**: Filter by "WS" to see WebSocket connections, look for 101 status

5. **Backend logs**: Look for "Socket.io connection" messages to confirm backend receives the connection

## Related Commits

- `CLAUDE: Improve service scripts and fix WebSocket plugin conflict` - Removed conflicting plugin
- `CLAUDE: Fix SSR/hydration issues in WebSocket composable` - Main SSR fix

## References

- [Nuxt 3 SSR Documentation](https://nuxt.com/docs/guide/concepts/rendering)
- [Socket.io Client Documentation](https://socket.io/docs/v4/client-initialization/)
- [Nginx Proxy Manager WebSocket Support](https://nginxproxymanager.com/advanced-config/)