CLAUDE: Add WebSocket troubleshooting documentation

Comprehensive guide documenting the investigation and resolution of intermittent WebSocket connection failures:

- NPM configuration issues (HTTP/2, access lists, custom locations)
- Conflicting socket.io plugin (JWT vs cookie auth)
- SSR/hydration state corruption (primary root cause)
- Diagnostic commands and debugging tips
- SSR-safe patterns for WebSocket in Nuxt
- Cloudflare considerations

This document serves as future reference for similar WebSocket issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Parent: 751dcaf972
Commit: db667965e6
File: .claude/WEBSOCKET_TROUBLESHOOTING.md (new file, 227 lines added)

# WebSocket Connection Troubleshooting Guide

**Last Updated**: 2025-01-29
**Issue Resolved**: Intermittent WebSocket connection failures through reverse proxy

## Problem Statement

WebSocket connections would intermittently fail when accessing the application through the production URL (`gameplay-demo.manticorum.com`). The UI debug output showed "Socket exists: no", and the browser never attempted a WebSocket connection. Adding debug code or making minor, functionally irrelevant changes would make the connection start working again, seemingly at random.

## Architecture Overview

```
Browser → Cloudflare → Nginx Proxy Manager → Backend (FastAPI + Socket.io)
                                                ↓
                                            Port 8000
```

## Root Causes Identified

### 1. Nginx Proxy Manager (NPM) Configuration Issues

**Symptoms**:
- Direct connection to backend (localhost:8000) returned HTTP 101 (success)
- Connection through NPM returned HTTP 400 or 500

**Causes Found**:
- HTTP/2 stripping the WebSocket upgrade headers: HTTP/2 does not carry the HTTP/1.1 `Upgrade`/`Connection` handshake headers, and Cloudflare enables HTTP/2 by default
- Access lists blocking non-Cloudflare IPs when Cloudflare proxy was disabled for testing
- Custom location blocks not inheriting WebSocket headers from server-level config

**Solution**:
- Disable HTTP/2 in NPM for WebSocket hosts (or use Cloudflare proxy, which handles this)
- Remove restrictive access lists during development
- Use NPM's built-in "WebSocket Support" toggle rather than manual header configuration
- Route `/socket.io` to port 8000 via an NPM GUI custom location

### 2. Conflicting Socket.io Plugin

**Symptoms**:
- "Socket exists: no" in UI debug output
- No `/socket.io` network requests appearing in logs
- Browser not even attempting a WebSocket connection

**Cause**:
The file `frontend-sba/plugins/socket.client.ts` used JWT token authentication, which conflicted with the cookie-based authentication in `frontend-sba/composables/useWebSocket.ts`. Both were trying to manage the socket connection.

**Solution**:
Deleted/disabled `frontend-sba/plugins/socket.client.ts`. The `useWebSocket.ts` composable handles all WebSocket management with cookie-based auth.
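
One way to keep two pieces of code from competing for the same connection is to give the socket a single owner and a single place where its options are built. The sketch below is illustrative only and is not the project's code: `claimSocketOwnership` and `buildCookieAuthOptions` are hypothetical helpers, and `SocketOptionsSketch` is a local type mirroring a few socket.io-client options (`path`, `transports`, `withCredentials`, `autoConnect`) so the example runs without the package installed.

```typescript
// Local mirror of a few socket.io-client options (assumption: only these
// four are needed for the illustration).
interface SocketOptionsSketch {
  path: string
  transports: Array<'websocket' | 'polling'>
  withCredentials: boolean
  autoConnect: boolean
}

// Hypothetical registry: remembers which module owns the socket.
let socketOwner: string | null = null

// Returns true if `name` is (or becomes) the single owner, false if some
// other module already claimed the socket.
function claimSocketOwnership(name: string): boolean {
  if (socketOwner !== null && socketOwner !== name) {
    return false
  }
  socketOwner = name
  return true
}

// Hypothetical helper: the one place where connection options are assembled.
// Cookie-based auth means no token appears in the options.
function buildCookieAuthOptions(): SocketOptionsSketch {
  return {
    path: '/socket.io/',                  // matches the proxied location
    transports: ['websocket', 'polling'], // order is an assumption
    withCredentials: true,                // allow cookies on cross-site requests
    autoConnect: false,                   // the owner decides when to connect
  }
}
```

With a guard like this, a second module that tries to set up its own connection finds out immediately instead of silently fighting over the socket.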

### 3. SSR/Hydration State Corruption (Primary Root Cause)

**Symptoms**:
- Connection would "randomly work" after adding debug code or making changes
- The code changes made no functional difference, yet behavior changed
- Intermittent failures that seemed timing-dependent

**Cause**:
Module-level singleton state in `useWebSocket.ts` was being initialized during SSR (server-side rendering), then persisting in a corrupted state during client hydration. The affected state variables:

```typescript
// BEFORE (problematic)
let socketInstance: Socket | null = null
let reconnectionAttempts = 0
let reconnectionTimeout: NodeJS.Timeout | null = null
```

These were initialized on the server during SSR and potentially set to intermediate values. The same state was then "reused" on the client during hydration instead of being freshly initialized.

**Solution**:
Refactored to use lazy client-only initialization:

```typescript
// AFTER (fixed)
interface ClientState {
  socketInstance: Socket | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  // ... other state
  initialized: boolean
}

let clientState: ClientState | null = null

function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* fresh initialization */ }
  }
  return clientState || { /* empty fallback for SSR */ }
}

// Reset reactive state on client hydration
if (import.meta.client) {
  isConnected.value = false
  isConnecting.value = false
  // ...
}
```

Added `import.meta.client` guards to all functions that interact with the socket.
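
The lazy-initialization idea can be tried outside Nuxt with a small stand-alone sketch. This is not the code from `useWebSocket.ts`: `createStateAccessor` and `createEmptyState` are hypothetical names, the `isClient` parameter stands in for Nuxt's `import.meta.client`, and the socket is typed as a plain `object` so no socket.io import is needed.

```typescript
interface ClientState {
  socketInstance: object | null
  reconnectionAttempts: number
  reconnectionTimeout: ReturnType<typeof setTimeout> | null
  initialized: boolean
}

// Builds a blank state object; called for every server-side access and
// once for the client singleton.
function createEmptyState(): ClientState {
  return {
    socketInstance: null,
    reconnectionAttempts: 0,
    reconnectionTimeout: null,
    initialized: false,
  }
}

// `isClient` is an assumption standing in for `import.meta.client`.
function createStateAccessor(isClient: boolean): () => ClientState {
  let clientState: ClientState | null = null

  return function getClientState(): ClientState {
    if (!isClient) {
      // Server: hand back a throwaway object and never cache it, so
      // nothing written during SSR can linger.
      return createEmptyState()
    }
    if (clientState === null) {
      // Client: create the singleton on first use only.
      clientState = { ...createEmptyState(), initialized: true }
    }
    return clientState
  }
}

// Usage: server-side writes are discarded, client-side writes persist.
const getServerState = createStateAccessor(false)
getServerState().reconnectionAttempts = 5
console.log(getServerState().reconnectionAttempts) // 0

const getBrowserState = createStateAccessor(true)
getBrowserState().reconnectionAttempts = 2
console.log(getBrowserState().reconnectionAttempts) // 2
```

The design choice is that the server path never stores anything: each call returns a new object, so there is no state that could be carried into a later render.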

## Diagnostic Commands

### Test Direct Backend Connection
```bash
curl -v -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "http://localhost:8000/socket.io/?EIO=4&transport=websocket"
# Should return HTTP 101 Switching Protocols
```

### Test Through Proxy
```bash
# --http1.1 forces the HTTP/1.1 Upgrade handshake; over HTTPS curl may
# otherwise negotiate HTTP/2, which cannot carry these headers
curl -v --http1.1 -H "Upgrade: websocket" \
     -H "Connection: Upgrade" \
     -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
     -H "Sec-WebSocket-Version: 13" \
     "https://gameplay-demo.manticorum.com/socket.io/?EIO=4&transport=websocket"
# Should also return HTTP 101
```
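
When reading the `curl -v` output, a genuine upgrade has three parts: status 101, an `Upgrade: websocket` header, and a `Sec-WebSocket-Accept` value derived from the key that was sent. The sketch below shows that check in TypeScript using Node's built-in `node:crypto`. `expectedAcceptValue`, `describeHandshake`, and the `HandshakeResponse` shape are hypothetical names for illustration; the GUID and the sample key/accept pair come from RFC 6455.

```typescript
import { createHash } from 'node:crypto'

// Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11'

// Accept value = base64(SHA-1(key + GUID)).
function expectedAcceptValue(secWebSocketKey: string): string {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64')
}

// Assumed shape: status code plus lower-cased response header names.
interface HandshakeResponse {
  statusCode: number
  headers: Record<string, string | undefined>
}

// Returns 'ok' or a short description of the first failed check.
function describeHandshake(res: HandshakeResponse, key: string): string {
  if (res.statusCode !== 101) {
    return `failed: expected 101, got ${res.statusCode}`
  }
  if ((res.headers['upgrade'] ?? '').toLowerCase() !== 'websocket') {
    return 'failed: missing Upgrade: websocket'
  }
  if (res.headers['sec-websocket-accept'] !== expectedAcceptValue(key)) {
    return 'failed: Sec-WebSocket-Accept mismatch'
  }
  return 'ok'
}

// The key used in the curl commands is the sample nonce from RFC 6455.
console.log(expectedAcceptValue('dGhlIHNhbXBsZSBub25jZQ=='))
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

A 400 or 500 from the proxy fails the first check, which matches the symptom described for the NPM misconfiguration.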

### Check NPM Configuration
```bash
ssh homelab  # or your NPM server
grep -A 20 "gameplay-demo" /data/nginx/proxy_host/*.conf
```

### Monitor Backend Logs
```bash
tail -f /tmp/backend_live.log | grep -i socket
```

## NPM Configuration for WebSocket

Working configuration for `gameplay-demo.manticorum.com`:

1. **Proxy Host Settings**:
   - Forward Hostname: `10.10.0.16` (or your backend host)
   - Forward Port: `8000`
   - WebSocket Support: **ENABLED** (toggle in GUI)
   - HTTP/2 Support: **DISABLED** (for WebSocket compatibility)

2. **Custom Location** (via GUI, not Advanced config):
   - Location: `/socket.io`
   - Forward Host: `10.10.0.16`
   - Forward Port: `8000`

3. **No Access Lists** blocking during development

## Key Files

| File | Purpose |
|------|---------|
| `frontend-sba/composables/useWebSocket.ts` | Main WebSocket management with SSR-safe patterns |
| `frontend-sba/composables/useGameActions.ts` | Game action wrappers using the socket |
| `backend/app/websocket/handlers.py` | Backend Socket.io event handlers |
| `backend/app/websocket/auth.py` | Cookie-based authentication for WebSocket |

## SSR-Safe Patterns for WebSocket in Nuxt

### DO:
```typescript
// Lazy initialization only on client
function getClientState(): ClientState {
  if (import.meta.client && !clientState) {
    clientState = { /* init */ }
  }
  return clientState || { /* fallback */ }
}

// Guard all socket operations
function connect() {
  if (!import.meta.client) return
  // ...
}

// Reset state on hydration
if (import.meta.client) {
  isConnected.value = false
}

// Use ReturnType for timer types (SSR compatible)
let timeout: ReturnType<typeof setTimeout> | null = null
```

### DON'T:
```typescript
// Module-level initialization (runs on server!)
let socket = io(url)  // BAD

// NodeJS types (not available in browser)
let timeout: NodeJS.Timeout  // BAD

// Immediate watchers without guards
watch(() => auth, () => connect(), { immediate: true })  // BAD
```
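
The guard and timer-type rules can be combined in one small helper. The following is a sketch under stated assumptions rather than the composable's actual reconnection logic: `scheduleReconnect` and `cancelReconnect` are hypothetical names, `isClient` stands in for `import.meta.client`, and the exponential backoff (1 s base, 30 s cap) is an illustrative choice that the guide does not specify.

```typescript
interface ReconnectState {
  attempts: number
  // ReturnType keeps the type valid in both browser and Node builds.
  timeout: ReturnType<typeof setTimeout> | null
}

// Schedules `connect` after a backoff delay and returns that delay in ms,
// or null when called outside the client (no timer is created).
function scheduleReconnect(
  state: ReconnectState,
  isClient: boolean,
  connect: () => void,
  baseDelayMs: number = 1000,
  maxDelayMs: number = 30000,
): number | null {
  if (!isClient) return null // never start timers during SSR

  // Replace any pending timer so only one reconnect is queued.
  if (state.timeout !== null) clearTimeout(state.timeout)

  const delay = Math.min(baseDelayMs * Math.pow(2, state.attempts), maxDelayMs)
  state.attempts += 1
  state.timeout = setTimeout(() => {
    state.timeout = null
    connect()
  }, delay)
  return delay
}

// Clears a pending reconnect, if any.
function cancelReconnect(state: ReconnectState): void {
  if (state.timeout !== null) {
    clearTimeout(state.timeout)
    state.timeout = null
  }
}
```

Because the server path returns before touching the state, nothing created during SSR needs cleaning up afterwards.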

## Cloudflare Considerations

- Cloudflare Pro/Business/Enterprise can disable HTTP/2 per-domain
- Free plan cannot disable HTTP/2, but Cloudflare proxy handles WebSocket upgrade correctly
- When testing without Cloudflare proxy (orange cloud off), ensure NPM handles HTTP/2 → HTTP/1.1 downgrade
- WebSocket connections through Cloudflare have a 100-second idle timeout
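
The idle timeout matters mainly in relation to the heartbeat interval: if pings arrive less often than the timeout, an otherwise healthy connection can be dropped. A minimal sketch of that comparison follows. `heartbeatKeepsConnectionAlive` is a hypothetical helper, the 10-second safety margin is an arbitrary assumption, and the 25000 ms value in the usage lines is only an example, so check the ping interval actually configured on your Socket.io server.

```typescript
// Idle timeout quoted in this guide for WebSocket connections via Cloudflare.
const CLOUDFLARE_IDLE_TIMEOUT_MS = 100000

// True when a ping every `pingIntervalMs` leaves at least `safetyMarginMs`
// of headroom before the idle timeout would fire.
function heartbeatKeepsConnectionAlive(
  pingIntervalMs: number,
  idleTimeoutMs: number = CLOUDFLARE_IDLE_TIMEOUT_MS,
  safetyMarginMs: number = 10000,
): boolean {
  return pingIntervalMs > 0 && pingIntervalMs + safetyMarginMs <= idleTimeoutMs
}

console.log(heartbeatKeepsConnectionAlive(25000))  // true
console.log(heartbeatKeepsConnectionAlive(120000)) // false
```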

## Debugging Tips

1. **Check if socket instance exists**: Add `console.log('[WebSocket] Socket exists:', !!state.socketInstance)`

2. **Verify SSR vs Client**: Add `console.log('[WebSocket] Running on:', import.meta.client ? 'client' : 'server')`

3. **Monitor hydration**: Log in module scope: `if (import.meta.client) { console.log('Module loaded on client') }`

4. **Check network tab**: Filter by "WS" to see WebSocket connections and look for a 101 status

5. **Backend logs**: Look for "Socket.io connection" messages to confirm the backend receives the connection

## Related Commits

- `CLAUDE: Improve service scripts and fix WebSocket plugin conflict` - Removed conflicting plugin
- `CLAUDE: Fix SSR/hydration issues in WebSocket composable` - Main SSR fix

## References

- [Nuxt 3 SSR Documentation](https://nuxt.com/docs/guide/concepts/rendering)
- [Socket.io Client Documentation](https://socket.io/docs/v4/client-initialization/)
- [Nginx Proxy Manager WebSocket Support](https://nginxproxymanager.com/advanced-config/)