strat-gameplay-webapp/.claude/plans/002-websocket-locking.md
Cal Corum e0c12467b0 CLAUDE: Improve UX with single-click OAuth, enhanced games list, and layout fix
Frontend UX improvements:
- Single-click Discord OAuth from home page (no intermediate /auth page)
- Auto-redirect authenticated users from home to /games
- Fixed Nuxt layout system - app.vue now wraps NuxtPage with NuxtLayout
- Games page now has proper card container with shadow/border styling
- Layout header includes working logout with API cookie clearing

Games list enhancements:
- Display team names (lname) instead of just team IDs
- Show current score for each team
- Show inning indicator (Top/Bot X) for active games
- Responsive header with wrapped buttons on mobile

Backend improvements:
- Added team caching to SbaApiClient (1-hour TTL)
- Enhanced GameListItem with team names, scores, inning data
- Games endpoint now enriches response with SBA API team data

Docker optimizations:
- Optimized Dockerfile using --chown flag on COPY (faster than chown -R)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 16:14:00 -06:00

7.9 KiB

Plan 002: WebSocket Handler Locking

Priority: CRITICAL Effort: 2-3 hours Status: NOT STARTED Risk Level: HIGH - Data corruption


Problem Statement

WebSocket handlers mutate GameState without acquiring per-game locks, allowing concurrent handlers to corrupt game state.

Example Race Condition:

Time    Client A (roll_dice)              Client B (roll_dice)
----    --------------------              --------------------
T1      Read pending_roll = None
T2                                        Read pending_roll = None
T3      Set pending_roll = RollA
T4                                        Set pending_roll = RollB  (overwrites!)
T5      Process with RollA
T6                                        Process with RollB (uses wrong roll!)

Impact

  • Data Corruption: Invalid game states
  • Invalid Outcomes: Wrong play resolutions
  • User Trust: Inconsistent game behavior

Files to Modify

File Changes
backend/app/websocket/handlers.py Add lock acquisition to handlers
backend/app/core/state_manager.py Expose lock acquisition method

Current Lock Implementation

The StateManager already has per-game locks:

# backend/app/core/state_manager.py
class StateManager:
    def __init__(self):
        self._game_locks: dict[UUID, asyncio.Lock] = {}

    def _get_game_lock(self, game_id: UUID) -> asyncio.Lock:
        if game_id not in self._game_locks:
            self._game_locks[game_id] = asyncio.Lock()
        return self._game_locks[game_id]

Problem: Locks exist but handlers don't use them!

Implementation Steps

Step 1: Expose Lock Context Manager (15 min)

Update backend/app/core/state_manager.py:

from contextlib import asynccontextmanager

class StateManager:
    # ... existing code ...

    @asynccontextmanager
    async def game_lock(self, game_id: UUID):
        """
        Acquire exclusive lock for game operations.
        Use this for any handler that modifies game state.

        Usage:
            async with state_manager.game_lock(game_id):
                # Perform state modifications
        """
        lock = self._get_game_lock(game_id)
        async with lock:
            yield

Step 2: Identify Handlers Requiring Locks (15 min)

Handlers that modify state (MUST lock):

  • submit_defensive_decision - modifies pending decisions
  • submit_offensive_decision - modifies pending decisions
  • roll_dice - modifies pending_manual_roll
  • submit_manual_outcome - modifies game state
  • request_pinch_hitter - modifies lineup
  • request_defensive_replacement - modifies lineup
  • request_pitching_change - modifies lineup

Handlers that only read (no lock needed):

  • join_game - read only
  • leave_game - read only
  • get_lineup - read only
  • get_box_score - read only

Step 3: Update Decision Handlers (30 min)

@sio.event
async def submit_defensive_decision(sid, data):
    game_id = UUID(data.get("game_id"))

    async with state_manager.game_lock(game_id):
        # All state modifications inside lock
        game_state = state_manager.get_game_state(game_id)
        if not game_state:
            await sio.emit("error", {"message": "Game not found"}, to=sid)
            return

        # Process decision
        decision = DefensiveDecision(**data.get("decision"))
        game_state.pending_defensive_decision = decision

        # Check if both decisions received
        if game_state.pending_offensive_decision:
            # Resolve play within lock
            result = await game_engine.resolve_play(game_state)
            await manager.broadcast_to_game(game_id, "play_resolved", result)

Step 4: Update Roll/Outcome Handlers (30 min)

@sio.event
async def roll_dice(sid, data):
    game_id = UUID(data.get("game_id"))
    roll_type = data.get("roll_type")

    async with state_manager.game_lock(game_id):
        game_state = state_manager.get_game_state(game_id)

        # Validate no pending roll
        if game_state.pending_manual_roll:
            await sio.emit("error", {"message": "Roll already pending"}, to=sid)
            return

        # Perform roll and set pending
        roll = dice.roll(roll_type, game_state.league_id)
        game_state.pending_manual_roll = roll

        await manager.broadcast_to_game(game_id, "dice_rolled", roll.to_dict())

Step 5: Update Substitution Handlers (30 min)

@sio.event
async def request_pinch_hitter(sid, data):
    game_id = UUID(data.get("game_id"))

    async with state_manager.game_lock(game_id):
        game_state = state_manager.get_game_state(game_id)

        # Validate and process substitution
        result = await substitution_manager.process_pinch_hitter(
            game_state,
            entering_player_id=data.get("entering_player_id"),
            exiting_player_id=data.get("exiting_player_id")
        )

        if result.success:
            await manager.broadcast_to_game(game_id, "player_substituted", result.to_dict())
        else:
            await sio.emit("error", {"message": result.error}, to=sid)

Step 6: Add Lock Timeout (15 min)

Prevent deadlocks with timeout:

@asynccontextmanager
async def game_lock(self, game_id: UUID, timeout: float = 30.0):
    """Acquire game lock with timeout."""
    lock = self._get_game_lock(game_id)
    try:
        await asyncio.wait_for(lock.acquire(), timeout=timeout)
        yield
    except asyncio.TimeoutError:
        logger.error(f"Lock timeout for game {game_id}")
        raise RuntimeError(f"Could not acquire lock for game {game_id}")
    finally:
        if lock.locked():
            lock.release()

Step 7: Write Concurrency Tests (30 min)

Create backend/tests/unit/websocket/test_handler_locking.py:

import pytest
import asyncio
from unittest.mock import AsyncMock, patch

class TestHandlerLocking:
    """Tests for WebSocket handler concurrency."""

    @pytest.mark.asyncio
    async def test_concurrent_rolls_serialized(self):
        """Two concurrent roll_dice calls should be serialized."""
        results = []

        async def mock_roll_dice(sid, data):
            async with state_manager.game_lock(game_id):
                results.append(f"start_{sid}")
                await asyncio.sleep(0.1)  # Simulate work
                results.append(f"end_{sid}")

        # Launch concurrent handlers
        await asyncio.gather(
            mock_roll_dice("sid1", {}),
            mock_roll_dice("sid2", {})
        )

        # Should be serialized: start_X, end_X, start_Y, end_Y
        assert results[0].startswith("start")
        assert results[1].startswith("end")
        assert results[0].split("_")[1] == results[1].split("_")[1]

    @pytest.mark.asyncio
    async def test_lock_timeout_raises_error(self):
        """Lock acquisition should timeout if held too long."""

    @pytest.mark.asyncio
    async def test_different_games_not_blocked(self):
        """Locks for different games should not block each other."""

Verification Checklist

  • All state-modifying handlers use game_lock
  • Lock timeout prevents deadlocks
  • Different games can proceed concurrently
  • Same game operations are serialized
  • Concurrency tests pass
  • Manual test: rapid-fire decisions don't corrupt state

Performance Considerations

  • Lock contention should be minimal (turn-based game)
  • 30-second timeout is generous for any operation
  • Per-game locks allow parallel games to proceed

Rollback Plan

If issues arise:

  1. Remove lock calls from handlers
  2. Add rate limiting as temporary mitigation
  3. Investigate specific race condition scenarios

Dependencies

  • None (can be implemented independently)
  • Recommended: Implement after Plan 001 (Authorization)

Notes

  • Consider adding lock metrics for monitoring
  • May want to add retry logic for timeout scenarios
  • Future: Consider optimistic locking for read-heavy operations