strat-gameplay-webapp/.claude/WEBSOCKET_TROUBLESHOOTING.md
Cal Corum db667965e6 CLAUDE: Add WebSocket troubleshooting documentation
Comprehensive guide documenting the investigation and resolution of
intermittent WebSocket connection failures:

- NPM configuration issues (HTTP/2, access lists, custom locations)
- Conflicting socket.io plugin (JWT vs cookie auth)
- SSR/hydration state corruption (primary root cause)
- Diagnostic commands and debugging tips
- SSR-safe patterns for WebSocket in Nuxt
- Cloudflare considerations

This document serves as future reference for similar WebSocket issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 15:32:30 -06:00


WebSocket Connection Troubleshooting Guide

Last Updated: 2025-01-29
Issue Resolved: Intermittent WebSocket connection failures through reverse proxy

Problem Statement

WebSocket connections would intermittently fail when the application was accessed through the production URL (gameplay-demo.manticorum.com). The UI debug output would show "Socket exists: no", and the browser would not even attempt a WebSocket connection. Mysteriously, adding debug code or making minor, non-functional changes would cause the connection to "randomly start working."

Architecture Overview

Browser → Cloudflare → Nginx Proxy Manager (NPM) → Backend (FastAPI + Socket.io, port 8000)
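Every hop in this chain carries the same Socket.io handshake request. As a rough sketch, assuming Engine.IO protocol v4 and the default /socket.io/ path (both match the curl tests under Diagnostic Commands), the WebSocket URL requested for a given page origin can be derived like this. socketIoWebSocketUrl is a hypothetical helper for illustration, not part of the app:

```typescript
// Hypothetical helper: derive the Socket.io WebSocket URL from a page origin.
// Assumes Engine.IO protocol v4 (EIO=4) and the default "/socket.io/" path.
export function socketIoWebSocketUrl(origin: string): string {
  const url = new URL('/socket.io/', origin)
  // The WebSocket scheme mirrors the page scheme: https -> wss, http -> ws
  url.protocol = url.protocol === 'https:' ? 'wss:' : 'ws:'
  url.searchParams.set('EIO', '4')
  url.searchParams.set('transport', 'websocket')
  return url.toString()
}

// Through Cloudflare and NPM (public hostname, default port 443)
console.log(socketIoWebSocketUrl('https://gameplay-demo.manticorum.com'))
// → wss://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket

// Directly to the backend on port 8000
console.log(socketIoWebSocketUrl('http://localhost:8000'))
// → ws://localhost:8000/socket.io/?EIO=4&transport=websocket
```

Comparing the two URLs makes it clear which hops a failing connection has to pass through: only the first one involves Cloudflare and NPM.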

Root Causes Identified

1. Nginx Proxy Manager Configuration Issues

Symptoms:

  • Direct connection to backend (localhost:8000) returned HTTP 101 (success)
  • Connection through NPM returned HTTP 400 or 500

Causes Found:

  • HTTP/2 stripping WebSocket upgrade headers (Cloudflare enables HTTP/2 by default)
  • Access lists blocking non-Cloudflare IPs when Cloudflare proxy was disabled for testing
  • Custom location blocks not inheriting WebSocket headers from server-level config

Solution:

  • Disable HTTP/2 in NPM for WebSocket hosts (or keep the Cloudflare proxy enabled, which handles this)
  • Remove restrictive access lists during development
  • Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
  • Route /socket.io to port 8000 via NPM GUI custom location
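Upgrade and Connection are hop-by-hop headers, so nginx does not pass them to the upstream unless it is configured to, which is what NPM's WebSocket Support toggle takes care of. The sketch below shows, roughly, the RFC 6455 checks a WebSocket server applies to the handshake, and why a request that lost those headers on the way in can never become a 101. It is illustrative only: isValidUpgrade and the HeaderMap shape are assumptions, not the backend's actual code.

```typescript
// Illustrative sketch of the RFC 6455 opening-handshake checks.
// Header names are assumed lower-cased, as Node's IncomingMessage provides them.
type HeaderMap = Record<string, string | undefined>

export function isValidUpgrade(headers: HeaderMap): boolean {
  // "Connection" may list several tokens, e.g. "keep-alive, Upgrade"
  const connectionTokens = (headers['connection'] ?? '')
    .split(',')
    .map((token) => token.trim().toLowerCase())
  return (
    connectionTokens.includes('upgrade') &&
    (headers['upgrade'] ?? '').toLowerCase() === 'websocket' &&
    headers['sec-websocket-version'] === '13' &&
    Boolean(headers['sec-websocket-key'])
  )
}

// Request as sent by the browser (and by the curl tests in this guide)
const forwarded: HeaderMap = {
  connection: 'Upgrade',
  upgrade: 'websocket',
  'sec-websocket-version': '13',
  'sec-websocket-key': 'dGhlIHNhbXBsZSBub25jZQ==',
}
// The same request after a proxy dropped the hop-by-hop headers
const stripped: HeaderMap = { ...forwarded, connection: undefined, upgrade: undefined }

console.log(isValidUpgrade(forwarded), isValidUpgrade(stripped)) // → true false
```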

2. Conflicting Socket.io Plugin

Symptoms:

  • "Socket exists: no" in UI debug output
  • No /socket.io network requests appearing in logs
  • Browser not even attempting WebSocket connection

Cause: The file frontend-sba/plugins/socket.client.ts used JWT token authentication, which conflicted with the cookie-based authentication in useWebSocket.ts. Both modules were trying to manage the same socket connection.

Solution: Deleted/disabled plugins/socket.client.ts. The useWebSocket.ts composable handles all WebSocket management with cookie-based auth.
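Deleting the plugin enforced a "one owner per socket" rule by hand. If a guard against regressions is wanted, one option is to route socket creation through a single owner that refuses a second caller. This is a hypothetical sketch: acquireSocket, releaseSocket, and the SocketLike shape are not from this codebase, and SocketLike merely stands in for a socket.io-client Socket so the example is self-contained.

```typescript
// Hypothetical single-owner guard for the shared socket.
interface SocketLike {
  disconnect(): void
}

let ownerName: string | null = null
let sharedSocket: SocketLike | null = null

export function acquireSocket(owner: string, create: () => SocketLike): SocketLike {
  if (sharedSocket && ownerName !== owner) {
    // Fail loudly instead of letting two modules manage competing connections
    throw new Error(`Socket already owned by ${ownerName}; ${owner} must not create another`)
  }
  if (!sharedSocket) {
    sharedSocket = create()
    ownerName = owner
  }
  return sharedSocket
}

export function releaseSocket(owner: string): void {
  // Only the current owner may tear the socket down
  if (sharedSocket && ownerName === owner) {
    sharedSocket.disconnect()
    sharedSocket = null
    ownerName = null
  }
}
```

With this in place, a leftover plugin calling acquireSocket under a different owner name would throw at startup rather than silently competing with useWebSocket.ts.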

3. SSR/Hydration State Corruption (Primary Root Cause)

Symptoms:

  • Connection would "randomly work" after adding debug code or making changes
  • No functional difference in code changes, but behavior changed
  • Intermittent failures that seemed timing-dependent

Cause: Module-level singleton state in useWebSocket.ts was being initialized during SSR (server-side rendering), then persisting in a corrupted state during client hydration. The state variables:

// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null

These variables were initialized on the server during SSR, where they could be left at intermediate values. The same state was then effectively "reused" on the client during hydration instead of being freshly initialized.

Solution: Refactored to use lazy client-only initialization:

// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}

Added import.meta.client guards to all functions that interact with the socket.
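The lazy-initialization pattern can be exercised outside Nuxt by injecting the environment check. In this sketch isClient stands in for import.meta.client, createStateStore is a hypothetical factory, and the socket field is left out so the code runs without socket.io-client. It shows the intended behavior: on the server every call gets a throwaway object, while the client gets one shared instance.

```typescript
// Sketch of the lazy, client-only singleton with the environment check injected.
// `isClient` stands in for Nuxt's import.meta.client so this runs in plain Node.
interface ClientState {
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

function freshState(): ClientState {
  return { reconnectionAttempts: 0, reconnectionTimeout: null, initialized: false }
}

export function createStateStore(isClient: () => boolean) {
  let clientState: ClientState | null = null
  return function getClientState(): ClientState {
    if (isClient() && !clientState) {
      clientState = freshState()
    }
    // On the server nothing is cached: each call returns a throwaway object,
    // so values written during SSR are never kept around
    return clientState ?? freshState()
  }
}

// Server side: every call is independent
const serverGet = createStateStore(() => false)
serverGet().reconnectionAttempts = 3
console.log(serverGet().reconnectionAttempts) // → 0

// Client side: one shared instance for the lifetime of the page
const clientGet = createStateStore(() => true)
clientGet().reconnectionAttempts = 3
console.log(clientGet().reconnectionAttempts) // → 3
```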

Diagnostic Commands

Test Direct Backend Connection

curl -v -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols

Test Through Proxy

curl -v --http1.1 \
     -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# Should also return HTTP 101
# --http1.1 matters here: over HTTPS curl may negotiate HTTP/2, which has no Upgrade mechanism
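In the verbose output, a genuine 101 response should include a Sec-WebSocket-Accept header derived from the Sec-WebSocket-Key that curl sent. RFC 6455 defines it as the base64-encoded SHA-1 digest of the key concatenated with a fixed GUID, so the expected value can be computed and compared. A small sketch using Node's crypto module; expectedAccept is a hypothetical helper name:

```typescript
import { createHash } from 'node:crypto'

// Fixed GUID defined by RFC 6455 for the opening handshake
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

// Compute the Sec-WebSocket-Accept value a server must return for a given key
export function expectedAccept(secWebSocketKey: string): string {
  return createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64')
}

// The key used in the curl commands above (it is also the sample key from RFC 6455)
console.log(expectedAccept('dGhlIHNhbXBsZSBub25jZQ==')) // → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the 101 response carries a different Accept value, the handshake was not answered for the request curl sent.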

Check NPM Configuration

ssh homelab  # or your NPM server
cat /data/nginx/proxy_host/*.conf | grep -A 20 "gameplay-demo"

Monitor Backend Logs

tail -f /tmp/backend_live.log | grep -i socket

NPM Configuration for WebSocket

Working configuration for gameplay-demo.manticorum.com:

  1. Proxy Host Settings:

    • Forward Hostname: 10.10.0.16 (or your backend host)
    • Forward Port: 8000
    • WebSocket Support: ENABLED (toggle in GUI)
    • HTTP/2 Support: DISABLED (for WebSocket compatibility)
  2. Custom Location (via GUI, not Advanced config):

    • Location: /socket.io
    • Forward Host: 10.10.0.16
    • Forward Port: 8000
  3. No access list blocking requests during development

Key Files

File                                          Purpose
frontend-sba/composables/useWebSocket.ts      Main WebSocket management with SSR-safe patterns
frontend-sba/composables/useGameActions.ts    Game action wrappers using the socket
backend/app/websocket/handlers.py             Backend Socket.io event handlers
backend/app/websocket/auth.py                 Cookie-based authentication for WebSocket

SSR-Safe Patterns for WebSocket in Nuxt

DO:

// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null

DON'T:

// Module-level initialization (runs on server!)
let socket = io(url)  // BAD

// NodeJS namespace types (only present with Node typings; the browser's setTimeout returns a number)
let timeout: NodeJS.Timeout  // BAD

// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true })  // BAD
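The timer advice in the DO list matters most in reconnection logic, where a pending timeout has to be cleared before a new one is scheduled. A hedged sketch, assuming a capped exponential backoff: the ReconnectScheduler class and its delays and limits are illustrative, not the composable's actual values.

```typescript
// Illustrative reconnection scheduler using an SSR-compatible timer type.
// The base delay, cap, and attempt limit are assumptions for the example.
export class ReconnectScheduler {
  attempts = 0
  private timeout: ReturnType<typeof setTimeout> | null = null
  private readonly reconnect: () => void
  private readonly baseMs: number
  private readonly maxMs: number
  private readonly maxAttempts: number

  constructor(reconnect: () => void, baseMs = 1000, maxMs = 30000, maxAttempts = 5) {
    this.reconnect = reconnect
    this.baseMs = baseMs
    this.maxMs = maxMs
    this.maxAttempts = maxAttempts
  }

  // Capped exponential backoff: base, 2x base, 4x base, ... up to maxMs
  nextDelay(): number {
    return Math.min(this.baseMs * 2 ** this.attempts, this.maxMs)
  }

  // Schedules one reconnect; returns false once the attempt limit is reached
  schedule(): boolean {
    if (this.attempts >= this.maxAttempts) return false
    this.cancel() // never leave two pending timers
    const delay = this.nextDelay()
    this.attempts++
    this.timeout = setTimeout(() => {
      this.timeout = null
      this.reconnect()
    }, delay)
    return true
  }

  cancel(): void {
    if (this.timeout !== null) {
      clearTimeout(this.timeout)
      this.timeout = null
    }
  }

  // Call after a successful connection
  reset(): void {
    this.cancel()
    this.attempts = 0
  }
}
```

Because the timer handle is typed as ReturnType<typeof setTimeout>, the same class type-checks whether the compiler sees Node's or the browser's setTimeout.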

Cloudflare Considerations

  • Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
  • Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
  • When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
  • WebSocket connections through Cloudflare have a 100-second idle timeout
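Socket.io's heartbeat is what keeps an otherwise quiet connection from reaching that idle limit. A quick arithmetic check, assuming the backend keeps python-socketio's default ping_interval of 25 seconds (verify against the server configuration actually in use); heartbeatsPerIdleWindow is a hypothetical helper:

```typescript
// Hypothetical check: how many heartbeats fit inside a proxy's idle window?
// All values are in milliseconds.
export function heartbeatsPerIdleWindow(pingIntervalMs: number, idleTimeoutMs: number): number {
  return Math.floor(idleTimeoutMs / pingIntervalMs)
}

// Assumed values: 25 s ping interval, 100 s Cloudflare idle timeout
const CLOUDFLARE_IDLE_MS = 100_000
console.log(heartbeatsPerIdleWindow(25_000, CLOUDFLARE_IDLE_MS)) // → 4

// A ping interval longer than the idle timeout leaves no heartbeat inside the window
console.log(heartbeatsPerIdleWindow(120_000, CLOUDFLARE_IDLE_MS)) // → 0
```

Anything comfortably above 1 means the connection sees traffic well before Cloudflare would consider it idle.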

Debugging Tips

  1. Check if socket instance exists: Add console.log('[WebSocket] Socket exists:', !!state.socketInstance)

  2. Verify SSR vs Client: Add console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')

  3. Monitor hydration: Log in module scope if (import.meta.client) { console.log('Module loaded on client') }

  4. Check the Network tab: Filter by "WS" to see WebSocket connections and look for a 101 status

  5. Backend logs: Look for "Socket.io connection" messages to confirm backend receives the connection
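Tips 1 and 2 can be folded into one helper so that every debug line carries the same [WebSocket] prefix. This is a sketch only: describeSocketState and the DebuggableState shape are assumptions, and isClient is passed in (in the composable it would be import.meta.client) so the example runs outside Nuxt.

```typescript
// Hypothetical debug helper combining the "socket exists" and "client vs server" checks.
interface DebuggableState {
  socketInstance: unknown
}

export function describeSocketState(state: DebuggableState, isClient: boolean): string[] {
  return [
    `[WebSocket] Running on: ${isClient ? 'client' : 'server'}`,
    `[WebSocket] Socket exists: ${state.socketInstance ? 'yes' : 'no'}`,
  ]
}

// Example: the state seen during the original failure, logged from the browser
for (const line of describeSocketState({ socketInstance: null }, true)) {
  console.log(line)
}
```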

Related Commits

  • CLAUDE: Improve service scripts and fix WebSocket plugin conflict - Removed conflicting plugin
  • CLAUDE: Fix SSR/hydration issues in WebSocket composable - Main SSR fix

References