# Plan 002: WebSocket Handler Locking **Priority**: CRITICAL **Effort**: 2-3 hours **Status**: NOT STARTED **Risk Level**: HIGH - Data corruption --- ## Problem Statement WebSocket handlers mutate `GameState` without acquiring per-game locks, allowing concurrent handlers to corrupt game state. **Example Race Condition**: ``` Time Client A (roll_dice) Client B (roll_dice) ---- -------------------- -------------------- T1 Read pending_roll = None T2 Read pending_roll = None T3 Set pending_roll = RollA T4 Set pending_roll = RollB (overwrites!) T5 Process with RollA T6 Process with RollB (uses wrong roll!) ``` ## Impact - **Data Corruption**: Invalid game states - **Invalid Outcomes**: Wrong play resolutions - **User Trust**: Inconsistent game behavior ## Files to Modify | File | Changes | |------|---------| | `backend/app/websocket/handlers.py` | Add lock acquisition to handlers | | `backend/app/core/state_manager.py` | Expose lock acquisition method | ## Current Lock Implementation The `StateManager` already has per-game locks: ```python # backend/app/core/state_manager.py class StateManager: def __init__(self): self._game_locks: dict[UUID, asyncio.Lock] = {} def _get_game_lock(self, game_id: UUID) -> asyncio.Lock: if game_id not in self._game_locks: self._game_locks[game_id] = asyncio.Lock() return self._game_locks[game_id] ``` **Problem**: Locks exist but handlers don't use them! ## Implementation Steps ### Step 1: Expose Lock Context Manager (15 min) Update `backend/app/core/state_manager.py`: ```python from contextlib import asynccontextmanager class StateManager: # ... existing code ... @asynccontextmanager async def game_lock(self, game_id: UUID): """ Acquire exclusive lock for game operations. Use this for any handler that modifies game state. Usage: async with state_manager.game_lock(game_id): # Perform state modifications """ lock = self._get_game_lock(game_id) async with lock: yield ``` ### Step 2: Identify Handlers Requiring Locks (15 min) Handlers that modify state (MUST lock): - [x] `submit_defensive_decision` - modifies pending decisions - [x] `submit_offensive_decision` - modifies pending decisions - [x] `roll_dice` - modifies pending_manual_roll - [x] `submit_manual_outcome` - modifies game state - [x] `request_pinch_hitter` - modifies lineup - [x] `request_defensive_replacement` - modifies lineup - [x] `request_pitching_change` - modifies lineup Handlers that only read (no lock needed): - [ ] `join_game` - read only - [ ] `leave_game` - read only - [ ] `get_lineup` - read only - [ ] `get_box_score` - read only ### Step 3: Update Decision Handlers (30 min) ```python @sio.event async def submit_defensive_decision(sid, data): game_id = UUID(data.get("game_id")) async with state_manager.game_lock(game_id): # All state modifications inside lock game_state = state_manager.get_game_state(game_id) if not game_state: await sio.emit("error", {"message": "Game not found"}, to=sid) return # Process decision decision = DefensiveDecision(**data.get("decision")) game_state.pending_defensive_decision = decision # Check if both decisions received if game_state.pending_offensive_decision: # Resolve play within lock result = await game_engine.resolve_play(game_state) await manager.broadcast_to_game(game_id, "play_resolved", result) ``` ### Step 4: Update Roll/Outcome Handlers (30 min) ```python @sio.event async def roll_dice(sid, data): game_id = UUID(data.get("game_id")) roll_type = data.get("roll_type") async with state_manager.game_lock(game_id): game_state = state_manager.get_game_state(game_id) # Validate no pending roll if game_state.pending_manual_roll: await sio.emit("error", {"message": "Roll already pending"}, to=sid) return # Perform roll and set pending roll = dice.roll(roll_type, game_state.league_id) game_state.pending_manual_roll = roll await manager.broadcast_to_game(game_id, "dice_rolled", roll.to_dict()) ``` ### Step 5: Update Substitution Handlers (30 min) ```python @sio.event async def request_pinch_hitter(sid, data): game_id = UUID(data.get("game_id")) async with state_manager.game_lock(game_id): game_state = state_manager.get_game_state(game_id) # Validate and process substitution result = await substitution_manager.process_pinch_hitter( game_state, entering_player_id=data.get("entering_player_id"), exiting_player_id=data.get("exiting_player_id") ) if result.success: await manager.broadcast_to_game(game_id, "player_substituted", result.to_dict()) else: await sio.emit("error", {"message": result.error}, to=sid) ``` ### Step 6: Add Lock Timeout (15 min) Prevent deadlocks with timeout: ```python @asynccontextmanager async def game_lock(self, game_id: UUID, timeout: float = 30.0): """Acquire game lock with timeout.""" lock = self._get_game_lock(game_id) try: await asyncio.wait_for(lock.acquire(), timeout=timeout) yield except asyncio.TimeoutError: logger.error(f"Lock timeout for game {game_id}") raise RuntimeError(f"Could not acquire lock for game {game_id}") finally: if lock.locked(): lock.release() ``` ### Step 7: Write Concurrency Tests (30 min) Create `backend/tests/unit/websocket/test_handler_locking.py`: ```python import pytest import asyncio from unittest.mock import AsyncMock, patch class TestHandlerLocking: """Tests for WebSocket handler concurrency.""" @pytest.mark.asyncio async def test_concurrent_rolls_serialized(self): """Two concurrent roll_dice calls should be serialized.""" results = [] async def mock_roll_dice(sid, data): async with state_manager.game_lock(game_id): results.append(f"start_{sid}") await asyncio.sleep(0.1) # Simulate work results.append(f"end_{sid}") # Launch concurrent handlers await asyncio.gather( mock_roll_dice("sid1", {}), mock_roll_dice("sid2", {}) ) # Should be serialized: start_X, end_X, start_Y, end_Y assert results[0].startswith("start") assert results[1].startswith("end") assert results[0].split("_")[1] == results[1].split("_")[1] @pytest.mark.asyncio async def test_lock_timeout_raises_error(self): """Lock acquisition should timeout if held too long.""" @pytest.mark.asyncio async def test_different_games_not_blocked(self): """Locks for different games should not block each other.""" ``` ## Verification Checklist - [ ] All state-modifying handlers use `game_lock` - [ ] Lock timeout prevents deadlocks - [ ] Different games can proceed concurrently - [ ] Same game operations are serialized - [ ] Concurrency tests pass - [ ] Manual test: rapid-fire decisions don't corrupt state ## Performance Considerations - Lock contention should be minimal (turn-based game) - 30-second timeout is generous for any operation - Per-game locks allow parallel games to proceed ## Rollback Plan If issues arise: 1. Remove lock calls from handlers 2. Add rate limiting as temporary mitigation 3. Investigate specific race condition scenarios ## Dependencies - None (can be implemented independently) - Recommended: Implement after Plan 001 (Authorization) ## Notes - Consider adding lock metrics for monitoring - May want to add retry logic for timeout scenarios - Future: Consider optimistic locking for read-heavy operations