# WebSocket Connection Troubleshooting Guide

**Last Updated**: 2025-01-29

**Issue Resolved**: Intermittent WebSocket connection failures through reverse proxy

## Problem Statement

WebSocket connections would intermittently fail when accessing the application through the production URL (`gameplay-demo.manticorum.com`). The connection would show "Socket exists: no" and the browser wouldn't even attempt to make a WebSocket connection. Mysteriously, adding debug code or making minor changes would cause the connection to "randomly start working."

## Architecture Overview

```
Browser → Cloudflare → Nginx Proxy Manager → Backend (FastAPI + Socket.io)
                                                ↓
                                             Port 8000
```

## Root Causes Identified

### 1. Nginx Proxy Manager Configuration Issues

**Symptoms**:
- Direct connection to backend (localhost:8000) returned HTTP 101 (success)
- Connection through NPM returned HTTP 400 or 500

**Causes Found**:
- HTTP/2 stripping WebSocket upgrade headers (Cloudflare enables HTTP/2 by default)
- Access lists blocking non-Cloudflare IPs when Cloudflare proxy was disabled for testing
- Custom location blocks not inheriting WebSocket headers from server-level config

**Solution**:
- Disable HTTP/2 in NPM for WebSocket hosts (or use Cloudflare proxy which handles this)
- Remove restrictive access lists during development
- Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
- Route `/socket.io` to port 8000 via NPM GUI custom location

### 2. Conflicting Socket.io Plugin

**Symptoms**:
- "Socket exists: no" in UI debug output
- No `/socket.io` network requests appearing in logs
- Browser not even attempting WebSocket connection

**Cause**: File `frontend-sba/plugins/socket.client.ts` was using JWT token authentication, conflicting with the cookie-based authentication in `useWebSocket.ts`. Both were trying to manage the socket connection.

**Solution**: Deleted/disabled `plugins/socket.client.ts`.
The `useWebSocket.ts` composable handles all WebSocket management with cookie-based auth.

### 3. SSR/Hydration State Corruption (Primary Root Cause)

**Symptoms**:
- Connection would "randomly work" after adding debug code or making changes
- No functional difference in code changes, but behavior changed
- Intermittent failures that seemed timing-dependent

**Cause**: Module-level singleton state in `useWebSocket.ts` was being initialized during SSR (server-side rendering), then persisting in a corrupted state during client hydration. The state variables:

```typescript
// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null
```

These were initialized on the server during SSR, potentially set to intermediate values, then the same state object was "reused" on the client during hydration instead of being freshly initialized.

**Solution**: Refactored to use lazy client-only initialization:

```typescript
// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  // ReturnType<typeof setTimeout> works in both Node and browser typings
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  // Create the real state only once, and only in the browser
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}
```

Added `import.meta.client` guards to all functions that interact with the socket.
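The lazy client-only initialization described above can be sketched in a form that runs outside Nuxt. This is a minimal illustration under stated assumptions, not the actual contents of `useWebSocket.ts`: `SocketLike`, `freshState`, and `createClientStateStore` are hypothetical names, and the `isClient` parameter stands in for Nuxt's `import.meta.client` so the behavior can be exercised for both environments.

```typescript
// Hypothetical stand-in for the socket.io-client Socket type.
interface SocketLike {
  connected: boolean
}

interface ClientState {
  socketInstance: SocketLike | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

// Builds a brand-new state object with default values.
function freshState(initialized: boolean): ClientState {
  return {
    socketInstance: null,
    reconnectionAttempts: 0,
    reconnectionTimeout: null,
    initialized,
  }
}

// isClient is an assumption for illustration; in Nuxt it would be import.meta.client.
function createClientStateStore(isClient: boolean): () => ClientState {
  let clientState: ClientState | null = null

  return function getClientState(): ClientState {
    if (!isClient) {
      // Server: never cache, so nothing written during SSR can persist.
      return freshState(false)
    }
    if (!clientState) {
      // Client: create the real state lazily, on first access.
      clientState = freshState(true)
    }
    return clientState
  }
}
```

On the client, repeated calls return the same cached object, so reconnection counters survive between calls. On the server, every call returns a throwaway object, so nothing written during SSR can leak into later use.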
## Diagnostic Commands

### Test Direct Backend Connection

```bash
curl -v -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols
```

### Test Through Proxy

```bash
# --http1.1: the Upgrade handshake only exists in HTTP/1.1, and curl may
# otherwise negotiate HTTP/2 over TLS
curl -v --http1.1 -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# Should also return HTTP 101
```

### Check NPM Configuration

```bash
ssh homelab  # or your NPM server
cat /data/nginx/proxy_host/*.conf | grep -A 20 "gameplay-demo"
```

### Monitor Backend Logs

```bash
tail -f /tmp/backend_live.log | grep -i socket
```

## NPM Configuration for WebSocket

Working configuration for `gameplay-demo.manticorum.com`:

1. **Proxy Host Settings**:
   - Forward Hostname: `10.10.0.16` (or your backend host)
   - Forward Port: `8000`
   - WebSocket Support: **ENABLED** (toggle in GUI)
   - HTTP/2 Support: **DISABLED** (for WebSocket compatibility)

2. **Custom Location** (via GUI, not Advanced config):
   - Location: `/socket.io`
   - Forward Host: `10.10.0.16`
   - Forward Port: `8000`

3. **No Access Lists** blocking during development

## Key Files

| File | Purpose |
|------|---------|
| `frontend-sba/composables/useWebSocket.ts` | Main WebSocket management with SSR-safe patterns |
| `frontend-sba/composables/useGameActions.ts` | Game action wrappers using the socket |
| `backend/app/websocket/handlers.py` | Backend Socket.io event handlers |
| `backend/app/websocket/auth.py` | Cookie-based authentication for WebSocket |

## SSR-Safe Patterns for WebSocket in Nuxt

### DO:

```typescript
// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType<typeof setTimeout> for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null
```

### DON'T:

```typescript
// Module-level initialization (runs on server!)
let socket = io(url)  // BAD

// NodeJS types (not available in browser)
let timeout: NodeJS.Timeout  // BAD

// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true })  // BAD
```

## Cloudflare Considerations

- Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
- Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
- When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
- WebSocket connections through Cloudflare have a 100-second idle timeout

## Debugging Tips

1. **Check if socket instance exists**: Add `console.log('[WebSocket] Socket exists:', !!state.socketInstance)`
2. **Verify SSR vs Client**: Add `console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')`
3. **Monitor hydration**: Log in module scope `if (import.meta.client) { console.log('Module loaded on client') }`
4. **Check network tab**: Filter by "WS" to see WebSocket connections, look for 101 status
5. **Backend logs**: Look for "Socket.io connection" messages to confirm backend receives the connection

## Related Commits

- `CLAUDE: Improve service scripts and fix WebSocket plugin conflict` - Removed conflicting plugin
- `CLAUDE: Fix SSR/hydration issues in WebSocket composable` - Main SSR fix

## References

- [Nuxt 3 SSR Documentation](https://nuxt.com/docs/guide/concepts/rendering)
- [Socket.io Client Documentation](https://socket.io/docs/v4/client-initialization/)
- [Nginx Proxy Manager WebSocket Support](https://nginxproxymanager.com/advanced-config/)