Frontend UX improvements: - Single-click Discord OAuth from home page (no intermediate /auth page) - Auto-redirect authenticated users from home to /games - Fixed Nuxt layout system - app.vue now wraps NuxtPage with NuxtLayout - Games page now has proper card container with shadow/border styling - Layout header includes working logout with API cookie clearing Games list enhancements: - Display team names (lname) instead of just team IDs - Show current score for each team - Show inning indicator (Top/Bot X) for active games - Responsive header with wrapped buttons on mobile Backend improvements: - Added team caching to SbaApiClient (1-hour TTL) - Enhanced GameListItem with team names, scores, inning data - Games endpoint now enriches response with SBA API team data Docker optimizations: - Optimized Dockerfile using --chown flag on COPY (faster than chown -R) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
7.9 KiB
7.9 KiB
Plan 002: WebSocket Handler Locking
Priority: CRITICAL Effort: 2-3 hours Status: NOT STARTED Risk Level: HIGH - Data corruption
Problem Statement
WebSocket handlers mutate GameState without acquiring per-game locks, allowing concurrent handlers to corrupt game state.
Example Race Condition:
Time Client A (roll_dice) Client B (roll_dice)
---- -------------------- --------------------
T1 Read pending_roll = None
T2 Read pending_roll = None
T3 Set pending_roll = RollA
T4 Set pending_roll = RollB (overwrites!)
T5 Process with RollA
T6 Process with RollB (uses wrong roll!)
Impact
- Data Corruption: Invalid game states
- Invalid Outcomes: Wrong play resolutions
- User Trust: Inconsistent game behavior
Files to Modify
| File | Changes |
|---|---|
backend/app/websocket/handlers.py |
Add lock acquisition to handlers |
backend/app/core/state_manager.py |
Expose lock acquisition method |
Current Lock Implementation
The StateManager already has per-game locks:
# backend/app/core/state_manager.py
class StateManager:
def __init__(self):
self._game_locks: dict[UUID, asyncio.Lock] = {}
def _get_game_lock(self, game_id: UUID) -> asyncio.Lock:
if game_id not in self._game_locks:
self._game_locks[game_id] = asyncio.Lock()
return self._game_locks[game_id]
Problem: Locks exist but handlers don't use them!
Implementation Steps
Step 1: Expose Lock Context Manager (15 min)
Update backend/app/core/state_manager.py:
from contextlib import asynccontextmanager
class StateManager:
# ... existing code ...
@asynccontextmanager
async def game_lock(self, game_id: UUID):
"""
Acquire exclusive lock for game operations.
Use this for any handler that modifies game state.
Usage:
async with state_manager.game_lock(game_id):
# Perform state modifications
"""
lock = self._get_game_lock(game_id)
async with lock:
yield
Step 2: Identify Handlers Requiring Locks (15 min)
Handlers that modify state (MUST lock):
submit_defensive_decision- modifies pending decisionssubmit_offensive_decision- modifies pending decisionsroll_dice- modifies pending_manual_rollsubmit_manual_outcome- modifies game staterequest_pinch_hitter- modifies lineuprequest_defensive_replacement- modifies lineuprequest_pitching_change- modifies lineup
Handlers that only read (no lock needed):
join_game- read onlyleave_game- read onlyget_lineup- read onlyget_box_score- read only
Step 3: Update Decision Handlers (30 min)
@sio.event
async def submit_defensive_decision(sid, data):
game_id = UUID(data.get("game_id"))
async with state_manager.game_lock(game_id):
# All state modifications inside lock
game_state = state_manager.get_game_state(game_id)
if not game_state:
await sio.emit("error", {"message": "Game not found"}, to=sid)
return
# Process decision
decision = DefensiveDecision(**data.get("decision"))
game_state.pending_defensive_decision = decision
# Check if both decisions received
if game_state.pending_offensive_decision:
# Resolve play within lock
result = await game_engine.resolve_play(game_state)
await manager.broadcast_to_game(game_id, "play_resolved", result)
Step 4: Update Roll/Outcome Handlers (30 min)
@sio.event
async def roll_dice(sid, data):
game_id = UUID(data.get("game_id"))
roll_type = data.get("roll_type")
async with state_manager.game_lock(game_id):
game_state = state_manager.get_game_state(game_id)
# Validate no pending roll
if game_state.pending_manual_roll:
await sio.emit("error", {"message": "Roll already pending"}, to=sid)
return
# Perform roll and set pending
roll = dice.roll(roll_type, game_state.league_id)
game_state.pending_manual_roll = roll
await manager.broadcast_to_game(game_id, "dice_rolled", roll.to_dict())
Step 5: Update Substitution Handlers (30 min)
@sio.event
async def request_pinch_hitter(sid, data):
game_id = UUID(data.get("game_id"))
async with state_manager.game_lock(game_id):
game_state = state_manager.get_game_state(game_id)
# Validate and process substitution
result = await substitution_manager.process_pinch_hitter(
game_state,
entering_player_id=data.get("entering_player_id"),
exiting_player_id=data.get("exiting_player_id")
)
if result.success:
await manager.broadcast_to_game(game_id, "player_substituted", result.to_dict())
else:
await sio.emit("error", {"message": result.error}, to=sid)
Step 6: Add Lock Timeout (15 min)
Prevent deadlocks with timeout:
@asynccontextmanager
async def game_lock(self, game_id: UUID, timeout: float = 30.0):
"""Acquire game lock with timeout."""
lock = self._get_game_lock(game_id)
try:
await asyncio.wait_for(lock.acquire(), timeout=timeout)
yield
except asyncio.TimeoutError:
logger.error(f"Lock timeout for game {game_id}")
raise RuntimeError(f"Could not acquire lock for game {game_id}")
finally:
if lock.locked():
lock.release()
Step 7: Write Concurrency Tests (30 min)
Create backend/tests/unit/websocket/test_handler_locking.py:
import pytest
import asyncio
from unittest.mock import AsyncMock, patch
class TestHandlerLocking:
"""Tests for WebSocket handler concurrency."""
@pytest.mark.asyncio
async def test_concurrent_rolls_serialized(self):
"""Two concurrent roll_dice calls should be serialized."""
results = []
async def mock_roll_dice(sid, data):
async with state_manager.game_lock(game_id):
results.append(f"start_{sid}")
await asyncio.sleep(0.1) # Simulate work
results.append(f"end_{sid}")
# Launch concurrent handlers
await asyncio.gather(
mock_roll_dice("sid1", {}),
mock_roll_dice("sid2", {})
)
# Should be serialized: start_X, end_X, start_Y, end_Y
assert results[0].startswith("start")
assert results[1].startswith("end")
assert results[0].split("_")[1] == results[1].split("_")[1]
@pytest.mark.asyncio
async def test_lock_timeout_raises_error(self):
"""Lock acquisition should timeout if held too long."""
@pytest.mark.asyncio
async def test_different_games_not_blocked(self):
"""Locks for different games should not block each other."""
Verification Checklist
- All state-modifying handlers use
game_lock - Lock timeout prevents deadlocks
- Different games can proceed concurrently
- Same game operations are serialized
- Concurrency tests pass
- Manual test: rapid-fire decisions don't corrupt state
Performance Considerations
- Lock contention should be minimal (turn-based game)
- 30-second timeout is generous for any operation
- Per-game locks allow parallel games to proceed
Rollback Plan
If issues arise:
- Remove lock calls from handlers
- Add rate limiting as temporary mitigation
- Investigate specific race condition scenarios
Dependencies
- None (can be implemented independently)
- Recommended: Implement after Plan 001 (Authorization)
Notes
- Consider adding lock metrics for monitoring
- May want to add retry logic for timeout scenarios
- Future: Consider optimistic locking for read-heavy operations