strat-gameplay-webapp/.claude/WEBSOCKET_TROUBLESHOOTING.md
Cal Corum db667965e6 CLAUDE: Add WebSocket troubleshooting documentation
Comprehensive guide documenting the investigation and resolution of
intermittent WebSocket connection failures:

- NPM configuration issues (HTTP/2, access lists, custom locations)
- Conflicting socket.io plugin (JWT vs cookie auth)
- SSR/hydration state corruption (primary root cause)
- Diagnostic commands and debugging tips
- SSR-safe patterns for WebSocket in Nuxt
- Cloudflare considerations

This document serves as future reference for similar WebSocket issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-29 15:32:30 -06:00


# WebSocket Connection Troubleshooting Guide
**Last Updated**: 2025-11-29
**Issue Resolved**: Intermittent WebSocket connection failures through reverse proxy
## Problem Statement
WebSocket connections failed intermittently when the application was accessed through the production URL (`gameplay-demo.manticorum.com`). The UI debug output showed "Socket exists: no", and the browser never attempted a WebSocket connection. Adding debug code or making unrelated minor changes would make the connection appear to "randomly start working."
## Architecture Overview
```
Browser → Cloudflare → Nginx Proxy Manager → Backend (FastAPI + Socket.io, port 8000)
```
## Root Causes Identified
### 1. Nginx Proxy Manager Configuration Issues
**Symptoms**:
- Direct connection to backend (localhost:8000) returned HTTP 101 (success)
- Connection through NPM returned HTTP 400 or 500
**Causes Found**:
- HTTP/2 stripping WebSocket upgrade headers (Cloudflare enables HTTP/2 by default)
- Access lists blocking non-Cloudflare IPs when Cloudflare proxy was disabled for testing
- Custom location blocks not inheriting WebSocket headers from server-level config
**Solution**:
- Disable HTTP/2 in NPM for WebSocket hosts (or use Cloudflare proxy which handles this)
- Remove restrictive access lists during development
- Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
- Route `/socket.io` to port 8000 via NPM GUI custom location
### 2. Conflicting Socket.io Plugin
**Symptoms**:
- "Socket exists: no" in UI debug output
- No `/socket.io` network requests appearing in logs
- Browser not even attempting WebSocket connection
**Cause**:
File `frontend-sba/plugins/socket.client.ts` was using JWT token authentication, conflicting with the cookie-based authentication in `useWebSocket.ts`. Both were trying to manage the socket connection.
**Solution**:
Deleted/disabled `plugins/socket.client.ts`. The `useWebSocket.ts` composable handles all WebSocket management with cookie-based auth.
### 3. SSR/Hydration State Corruption (Primary Root Cause)
**Symptoms**:
- Connection would "randomly work" after adding debug code or making changes
- No functional difference in code changes, but behavior changed
- Intermittent failures that seemed timing-dependent
**Cause**:
Module-level singleton state in `useWebSocket.ts` was being initialized during SSR (server-side rendering), then persisting in a corrupted state during client hydration. The state variables:
```typescript
// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null
```
These variables were initialized on the server during SSR and potentially set to intermediate values. The same state then appeared to be "reused" on the client during hydration instead of being freshly initialized.
**Solution**:
Refactored to use lazy client-only initialization:
```typescript
// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}
```
Added `import.meta.client` guards to all functions that interact with the socket.
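The lazy-initialization pattern can be tried outside Nuxt with a small, dependency-free sketch. It is not the composable's actual code: `isClient` is a hypothetical parameter standing in for `import.meta.client`, `createClientStateStore` and `freshState` are illustrative names, and the socket type is reduced to `object` so nothing needs to be installed.

```typescript
// Minimal sketch of lazy, client-only singleton state.
// Assumption: `isClient` stands in for Nuxt's `import.meta.client`.
interface ClientState {
  socketInstance: object | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

// Builds a brand-new state object with default values.
function freshState(initialized: boolean): ClientState {
  return {
    socketInstance: null,
    reconnectionAttempts: 0,
    reconnectionTimeout: null,
    initialized,
  }
}

// Returns a getter. On the client the state object is created once and reused;
// on the server every call receives a throwaway object, so nothing is shared.
function createClientStateStore(isClient: boolean): () => ClientState {
  let clientState: ClientState | null = null
  return () => {
    if (!isClient) return freshState(false)
    if (!clientState) clientState = freshState(true)
    return clientState
  }
}

const getServerState = createClientStateStore(false)
const getBrowserState = createClientStateStore(true)

getServerState().reconnectionAttempts = 5 // mutation is discarded
getBrowserState().reconnectionAttempts = 2 // mutation persists

console.log(getServerState().reconnectionAttempts) // 0
console.log(getBrowserState().reconnectionAttempts) // 2
```

Returning a fresh fallback on the server, rather than a shared empty object, means a server-side write can never leak into a later call.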
## Diagnostic Commands
### Test Direct Backend Connection
```bash
curl -v \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  "http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols
```
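The URL in the command above follows the Socket.io handshake shape: the `/socket.io/` path plus `EIO` and `transport` query parameters. A small sketch for building it against any origin, assuming a hypothetical helper named `socketIoHandshakeUrl` (not part of the project):

```typescript
// Sketch: build the Socket.io handshake URL used by the curl tests.
// `socketIoHandshakeUrl` is a hypothetical helper, not project code.
function socketIoHandshakeUrl(
  origin: string,
  transport: "polling" | "websocket" = "websocket",
): string {
  const url = new URL("/socket.io/", origin)
  url.searchParams.set("EIO", "4") // Engine.IO protocol version used in this guide
  url.searchParams.set("transport", transport)
  return url.toString()
}

console.log(socketIoHandshakeUrl("http://localhost:8000"))
// http://localhost:8000/socket.io/?EIO=4&transport=websocket
```

Requesting the same URL with `transport=polling` can help separate general routing problems from upgrade-specific ones, since the polling transport is a plain HTTP request with no `Upgrade` header.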
### Test Through Proxy
```bash
curl -v --http1.1 \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  "https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# --http1.1 stops curl from negotiating HTTP/2 over TLS, where the Upgrade handshake does not apply
# Should also return HTTP 101
```
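Both curl commands send the sample key `dGhlIHNhbXBsZSBub25jZQ==`. A valid 101 response carries a `Sec-WebSocket-Accept` header derived from that key as defined in RFC 6455, so comparing the header against the expected value is one more way to check that the handshake reached a real WebSocket server intact. A sketch for Node.js; `expectedAccept` is a hypothetical helper name:

```typescript
import { createHash } from "node:crypto"

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// Expected `Sec-WebSocket-Accept` value for a given `Sec-WebSocket-Key`:
// base64(SHA-1(key + GUID)).
function expectedAccept(secWebSocketKey: string): string {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64")
}

console.log(expectedAccept("dGhlIHNhbXBsZSBub25jZQ=="))
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If `curl -v` shows a 101 whose `Sec-WebSocket-Accept` differs from this value, something between the client and the backend is likely rewriting the handshake.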
### Check NPM Configuration
```bash
ssh homelab # or your NPM server
grep -A 20 "gameplay-demo" /data/nginx/proxy_host/*.conf
```
### Monitor Backend Logs
```bash
tail -f /tmp/backend_live.log | grep -i socket
```
## NPM Configuration for WebSocket
Working configuration for `gameplay-demo.manticorum.com`:
1. **Proxy Host Settings**:
   - Forward Hostname: `10.10.0.16` (or your backend host)
   - Forward Port: `8000`
   - WebSocket Support: **ENABLED** (toggle in GUI)
   - HTTP/2 Support: **DISABLED** (for WebSocket compatibility)
2. **Custom Location** (via GUI, not Advanced config):
   - Location: `/socket.io`
   - Forward Host: `10.10.0.16`
   - Forward Port: `8000`
3. **No Access Lists** applied to the host during development
## Key Files
| File | Purpose |
|------|---------|
| `frontend-sba/composables/useWebSocket.ts` | Main WebSocket management with SSR-safe patterns |
| `frontend-sba/composables/useGameActions.ts` | Game action wrappers using the socket |
| `backend/app/websocket/handlers.py` | Backend Socket.io event handlers |
| `backend/app/websocket/auth.py` | Cookie-based authentication for WebSocket |
## SSR-Safe Patterns for WebSocket in Nuxt
### DO:
```typescript
// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null
```
### DON'T:
```typescript
// Module-level initialization (runs on server!)
let socket = io(url) // BAD
// NodeJS types (not available in browser)
let timeout: NodeJS.Timeout // BAD
// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true }) // BAD
```
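Repeating the `import.meta.client` guard in every function is easy to forget. One possible way to centralize it is a small wrapper that turns any function into a no-op on the server. This is a sketch, not code from the composable: `clientOnly` is a hypothetical helper, and `isClient` is passed in as a stand-in for `import.meta.client` so the block runs outside Nuxt.

```typescript
// Sketch: wrap a function so it does nothing outside the browser.
// Assumption: `isClient` stands in for `import.meta.client`.
function clientOnly<Args extends unknown[]>(
  isClient: boolean,
  fn: (...args: Args) => void,
): (...args: Args) => void {
  return (...args: Args) => {
    if (!isClient) return // skip silently during SSR
    fn(...args)
  }
}

const calls: string[] = []
const connectOnServer = clientOnly(false, (url: string) => { calls.push(url) })
const connectOnClient = clientOnly(true, (url: string) => { calls.push(url) })

connectOnServer("wss://example.invalid") // ignored
connectOnClient("wss://example.invalid") // runs

console.log(calls.length) // 1
```

A callback wrapped this way can be handed to an immediate watcher without opening a connection during server rendering.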
## Cloudflare Considerations
- Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
- Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
- When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
- WebSocket connections through Cloudflare have a 100-second idle timeout
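The idle timeout matters only if no traffic crosses the connection for that long, so the heartbeat interval needs to sit comfortably below it. A rough check, assuming the 100-second figure above; the 25-second interval is only an example value, and `heartbeatFitsIdleTimeout` and `safetyFactor` are hypothetical names:

```typescript
// Sketch: check that a heartbeat interval keeps traffic flowing inside an idle timeout.
// Assumption: 100 seconds is the Cloudflare idle timeout quoted in this guide.
const CLOUDFLARE_IDLE_TIMEOUT_MS = 100000

function heartbeatFitsIdleTimeout(
  pingIntervalMs: number,
  idleTimeoutMs: number = CLOUDFLARE_IDLE_TIMEOUT_MS,
  safetyFactor = 0.5, // hypothetical margin: stay under half the timeout
): boolean {
  return pingIntervalMs > 0 && pingIntervalMs <= idleTimeoutMs * safetyFactor
}

console.log(heartbeatFitsIdleTimeout(25000)) // true
console.log(heartbeatFitsIdleTimeout(120000)) // false
```

Check the backend's actual ping interval before relying on this; the value used here is not taken from the project configuration.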
## Debugging Tips
1. **Check if socket instance exists**: Add `console.log('[WebSocket] Socket exists:', !!state.socketInstance)`
2. **Verify SSR vs Client**: Add `console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')`
3. **Monitor hydration**: Log at module scope with `if (import.meta.client) { console.log('Module loaded on client') }`
4. **Check network tab**: Filter by "WS" to see WebSocket connections, look for 101 status
5. **Backend logs**: Look for "Socket.io connection" messages to confirm backend receives the connection
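Tips 1 and 2 can be combined into a single helper so every log line reports both the environment and the socket state. A sketch, assuming a hypothetical `describeSocketState` function and an `isClient` flag standing in for `import.meta.client`:

```typescript
// Sketch: build one debug line covering tips 1 and 2.
// `describeSocketState` is a hypothetical helper, not project code.
function describeSocketState(
  state: { socketInstance: unknown } | null,
  isClient: boolean,
): string {
  const where = isClient ? "client" : "server"
  const exists = state?.socketInstance ? "yes" : "no"
  return `[WebSocket] Running on: ${where} | Socket exists: ${exists}`
}

console.log(describeSocketState(null, false))
// [WebSocket] Running on: server | Socket exists: no
console.log(describeSocketState({ socketInstance: {} }, true))
// [WebSocket] Running on: client | Socket exists: yes
```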
## Related Commits
- `CLAUDE: Improve service scripts and fix WebSocket plugin conflict` - Removed conflicting plugin
- `CLAUDE: Fix SSR/hydration issues in WebSocket composable` - Main SSR fix
## References
- [Nuxt 3 SSR Documentation](https://nuxt.com/docs/guide/concepts/rendering)
- [Socket.io Client Documentation](https://socket.io/docs/v4/client-initialization/)
- [Nginx Proxy Manager WebSocket Support](https://nginxproxymanager.com/advanced-config/)