WebSocket Connection Troubleshooting Guide
Last Updated: 2025-01-29
Issue Resolved: Intermittent WebSocket connection failures through reverse proxy
Problem Statement
WebSocket connections failed intermittently when the application was accessed through the production URL (gameplay-demo.manticorum.com). The UI debug output showed "Socket exists: no", and the browser did not even attempt a WebSocket connection. Adding debug code or making minor, unrelated changes would sometimes make the connection start working again, which made the failure look random.
Architecture Overview
Browser → Cloudflare → Nginx Proxy Manager → Backend (FastAPI + Socket.io)
                                                ↓
                                             Port 8000
Root Causes Identified
1. Nginx Proxy Manager Configuration Issues
Symptoms:
- Direct connection to backend (localhost:8000) returned HTTP 101 (success)
- Connection through NPM returned HTTP 400 or 500
Causes Found:
- HTTP/2 stripping WebSocket upgrade headers (Cloudflare enables HTTP/2 by default)
- Access lists blocking non-Cloudflare IPs when Cloudflare proxy was disabled for testing
- Custom location blocks not inheriting WebSocket headers from server-level config
Solution:
- Disable HTTP/2 in NPM for WebSocket hosts (or use Cloudflare proxy which handles this)
- Remove restrictive access lists during development
- Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
- Route /socket.io to port 8000 via NPM GUI custom location
2. Conflicting Socket.io Plugin
Symptoms:
- "Socket exists: no" in UI debug output
- No /socket.io network requests appearing in logs
- Browser not even attempting WebSocket connection
Cause:
File frontend-sba/plugins/socket.client.ts was using JWT token authentication, conflicting with the cookie-based authentication in useWebSocket.ts. Both were trying to manage the socket connection.
Solution:
Deleted/disabled plugins/socket.client.ts. The useWebSocket.ts composable handles all WebSocket management with cookie-based auth.
3. SSR/Hydration State Corruption (Primary Root Cause)
Symptoms:
- Connection would "randomly work" after adding debug code or making changes
- No functional difference in code changes, but behavior changed
- Intermittent failures that seemed timing-dependent
Cause:
Module-level singleton state in useWebSocket.ts was being initialized during SSR (server-side rendering), then persisting in a corrupted state during client hydration. The state variables:
// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null
These were initialized on the server during SSR, potentially set to intermediate values, then the same state object was "reused" on the client during hydration instead of being freshly initialized.
Solution: Refactored to use lazy client-only initialization:
// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}
Added import.meta.client guards to all functions that interact with the socket.
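The lazy-initialization pattern described above can be sketched end to end as follows. This is a minimal, dependency-free illustration, not the actual contents of useWebSocket.ts: SocketLike, freshState, createClientStateAccessor, and the isClient parameter are hypothetical names introduced so the logic can run outside Nuxt. In the real composable, isClient would be import.meta.client.

```typescript
// Minimal stand-in for the socket.io-client Socket type (assumption, illustration only).
interface SocketLike {
  connected: boolean
}

interface ClientState {
  socketInstance: SocketLike | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

// Every call returns a brand-new object, so nothing is shared by accident.
function freshState(initialized: boolean): ClientState {
  return {
    socketInstance: null,
    reconnectionAttempts: 0,
    reconnectionTimeout: null,
    initialized,
  }
}

// Hypothetical factory: `isClient` stands in for Nuxt's `import.meta.client`.
function createClientStateAccessor(isClient: boolean): () => ClientState {
  let clientState: ClientState | null = null

  return function getClientState(): ClientState {
    if (!isClient) {
      // On the server nothing is cached: callers get a throwaway fallback.
      return freshState(false)
    }
    // The first client-side call creates the singleton; later calls reuse it.
    const state = clientState ?? freshState(true)
    clientState = state
    return state
  }
}
```

Returning a throwaway object on the server means any write made during SSR is discarded with that object, while the client always starts from a freshly built state the first time the accessor is called.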
Diagnostic Commands
Test Direct Backend Connection
curl -v -H "Upgrade: websocket" \
-H "Connection: Upgrade" \
-H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
-H "Sec-WebSocket-Version: 13" \
"http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols
Test Through Proxy
curl -v -H "Upgrade: websocket" \
-H "Connection: Upgrade" \
-H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
-H "Sec-WebSocket-Version: 13" \
"https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# Should also return HTTP 101
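A 101 response to the requests above should also carry a Sec-WebSocket-Accept header derived from the Sec-WebSocket-Key that was sent. As a cross-check, the expected value can be computed with the formula from RFC 6455: SHA-1 of the key concatenated with a fixed GUID, then base64-encoded. The sketch below assumes a Node.js runtime (for node:crypto); the helper name expectedAcceptHeader is hypothetical.

```typescript
import { createHash } from "node:crypto"

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// Computes the Sec-WebSocket-Accept value a compliant server should return
// for a given Sec-WebSocket-Key request header.
function expectedAcceptHeader(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64")
}

// Key used in the curl commands above (the sample key from RFC 6455).
console.log(expectedAcceptHeader("dGhlIHNhbXBsZSBub25jZQ=="))
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

If the proxy returns 101 but the Sec-WebSocket-Accept value differs from this, something between the client and the backend is likely rewriting or answering the handshake itself.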
Check NPM Configuration
ssh homelab # or your NPM server
cat /data/nginx/proxy_host/*.conf | grep -A 20 "gameplay-demo"
Monitor Backend Logs
tail -f /tmp/backend_live.log | grep -i socket
NPM Configuration for WebSocket
Working configuration for gameplay-demo.manticorum.com:
- Proxy Host Settings:
  - Forward Hostname: 10.10.0.16 (or your backend host)
  - Forward Port: 8000
  - WebSocket Support: ENABLED (toggle in GUI)
  - HTTP/2 Support: DISABLED (for WebSocket compatibility)
- Custom Location (via GUI, not Advanced config):
  - Location: /socket.io
  - Forward Host: 10.10.0.16
  - Forward Port: 8000
- No Access Lists blocking during development
Key Files
| File | Purpose |
|---|---|
| frontend-sba/composables/useWebSocket.ts | Main WebSocket management with SSR-safe patterns |
| frontend-sba/composables/useGameActions.ts | Game action wrappers using the socket |
| backend/app/websocket/handlers.py | Backend Socket.io event handlers |
| backend/app/websocket/auth.py | Cookie-based authentication for WebSocket |
SSR-Safe Patterns for WebSocket in Nuxt
DO:
// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null
DON'T:
// Module-level initialization (runs on server!)
let socket = io(url) // BAD
// NodeJS types (not available in browser)
let timeout: NodeJS.Timeout // BAD
// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true }) // BAD
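To show how the guard and the portable timer type fit together, here is a small sketch of a reconnection scheduler. It is an illustration under stated assumptions, not the code in useWebSocket.ts: createReconnectScheduler, isClient, and delayMs are hypothetical names, and isClient stands in for import.meta.client so the example runs outside Nuxt.

```typescript
// Hypothetical helper: `isClient` stands in for Nuxt's `import.meta.client`.
function createReconnectScheduler(isClient: boolean, connect: () => void) {
  // ReturnType<typeof setTimeout> type-checks in both Node and browser builds.
  let timeout: ReturnType<typeof setTimeout> | null = null

  // Cancels the pending reconnect, if any.
  function cancel(): void {
    if (timeout !== null) {
      clearTimeout(timeout)
      timeout = null
    }
  }

  // Returns true if a reconnect was scheduled, false if the guard blocked it.
  function schedule(delayMs: number): boolean {
    if (!isClient) return false // never touch timers or sockets during SSR
    cancel() // keep at most one pending reconnect
    timeout = setTimeout(() => {
      timeout = null
      connect()
    }, delayMs)
    return true
  }

  return { schedule, cancel, isPending: () => timeout !== null }
}
```

Keeping the timer handle inside the closure, rather than at module scope, means each scheduler owns its own state and a server-side call can never leave a timer behind.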
Cloudflare Considerations
- Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
- Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
- When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
- WebSocket connections through Cloudflare have a 100-second idle timeout
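The idle timeout matters when choosing a heartbeat interval: if no frame crosses the connection for longer than the timeout, the connection may be dropped. The following sketch checks a candidate interval against the 100-second figure above. It assumes heartbeat frames count as traffic for the idle timer; heartbeatFitsIdleTimeout and the 10-second safety margin are hypothetical choices, not values taken from this project.

```typescript
// Idle timeout taken from the note above (100 seconds).
const CLOUDFLARE_IDLE_TIMEOUT_MS = 100000

// Hypothetical helper: returns true when heartbeats arrive often enough that
// the gap between frames, plus a safety margin, stays under the idle timeout.
function heartbeatFitsIdleTimeout(
  heartbeatIntervalMs: number,
  idleTimeoutMs: number = CLOUDFLARE_IDLE_TIMEOUT_MS,
  safetyMarginMs: number = 10000,
): boolean {
  if (heartbeatIntervalMs <= 0) return false
  return heartbeatIntervalMs + safetyMarginMs < idleTimeoutMs
}

console.log(heartbeatFitsIdleTimeout(25000))  // prints true
console.log(heartbeatFitsIdleTimeout(120000)) // prints false
```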
Debugging Tips
- Check if socket instance exists: Add console.log('[WebSocket] Socket exists:', !!state.socketInstance)
- Verify SSR vs Client: Add console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')
- Monitor hydration: Log in module scope with if (import.meta.client) { console.log('Module loaded on client') }
- Check network tab: Filter by "WS" to see WebSocket connections, look for 101 status
- Backend logs: Look for "Socket.io connection" messages to confirm backend receives the connection
Related Commits
- CLAUDE: Improve service scripts and fix WebSocket plugin conflict - Removed conflicting plugin
- CLAUDE: Fix SSR/hydration issues in WebSocket composable - Main SSR fix