strat-gameplay-webapp/.claude/plans/003-idle-game-eviction.md
Cal Corum e0c12467b0 CLAUDE: Improve UX with single-click OAuth, enhanced games list, and layout fix
Frontend UX improvements:
- Single-click Discord OAuth from home page (no intermediate /auth page)
- Auto-redirect authenticated users from home to /games
- Fixed Nuxt layout system - app.vue now wraps NuxtPage with NuxtLayout
- Games page now has proper card container with shadow/border styling
- Layout header includes working logout with API cookie clearing

Games list enhancements:
- Display team names (lname) instead of just team IDs
- Show current score for each team
- Show inning indicator (Top/Bot X) for active games
- Responsive header with wrapped buttons on mobile

Backend improvements:
- Added team caching to SbaApiClient (1-hour TTL)
- Enhanced GameListItem with team names, scores, inning data
- Games endpoint now enriches response with SBA API team data

Docker optimizations:
- Optimized Dockerfile using --chown flag on COPY (faster than chown -R)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-05 16:14:00 -06:00

339 lines
9.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Plan 003: Idle Game Eviction
**Priority**: CRITICAL
**Effort**: 1-2 hours
**Status**: NOT STARTED
**Risk Level**: HIGH - Memory leak / OOM
---
## Problem Statement
The `StateManager` tracks `_last_access[game_id]` for each game but **never uses it for eviction**. Games remain in memory indefinitely, causing unbounded memory growth.
```python
# backend/app/core/state_manager.py
self._last_access[game_id] = pendulum.now("UTC") # Tracked but never used!
```
## Impact
- **Memory**: ~50KB per game × 1000 games = 50MB+ after a month
- **Stability**: OOM crash after days/weeks of operation
- **Performance**: Degraded performance as dictionaries grow
## Files to Modify
| File | Changes |
|------|---------|
| `backend/app/core/state_manager.py` | Add eviction logic |
| `backend/app/main.py` | Start background eviction task |
| `backend/app/config.py` | Add eviction configuration |
## Implementation Steps
### Step 1: Add Configuration (10 min)
Update `backend/app/config.py`:
```python
class Settings(BaseSettings):
# ... existing settings ...
# Game eviction settings
game_idle_timeout_hours: int = 24 # Evict games idle > 24 hours
game_eviction_interval_minutes: int = 60 # Check every hour
game_max_in_memory: int = 500 # Hard limit on in-memory games
```
### Step 2: Implement Eviction Logic (30 min)
Update `backend/app/core/state_manager.py`:
```python
import pendulum
from app.config import settings
class StateManager:
# ... existing code ...
async def evict_idle_games(self) -> list[UUID]:
"""
Remove games that have been idle beyond the timeout threshold.
Returns list of evicted game IDs.
"""
now = pendulum.now("UTC")
timeout_seconds = settings.game_idle_timeout_hours * 3600
evicted = []
# Find idle games
for game_id, last_access in list(self._last_access.items()):
idle_seconds = (now - last_access).total_seconds()
if idle_seconds > timeout_seconds:
evicted.append(game_id)
# Evict them
for game_id in evicted:
await self._evict_game(game_id)
logger.info(f"Evicted idle game {game_id} (idle {idle_seconds/3600:.1f} hours)")
if evicted:
logger.info(f"Evicted {len(evicted)} idle games. Active: {len(self._games)}")
return evicted
async def _evict_game(self, game_id: UUID) -> None:
"""
Remove a single game from memory.
Persists final state to database before removal.
"""
# Persist final state
if game_id in self._games:
game_state = self._games[game_id]
try:
await db_ops.save_game_state(game_id, game_state)
logger.debug(f"Persisted game {game_id} before eviction")
except Exception as e:
logger.error(f"Failed to persist game {game_id}: {e}")
# Remove from all tracking dictionaries
self._games.pop(game_id, None)
self._lineups.pop(game_id, None)
self._last_access.pop(game_id, None)
self._game_locks.pop(game_id, None)
async def enforce_memory_limit(self) -> list[UUID]:
"""
Enforce hard limit on in-memory games.
Evicts oldest games if limit exceeded.
"""
if len(self._games) <= settings.game_max_in_memory:
return []
# Sort by last access time (oldest first)
sorted_games = sorted(
self._last_access.items(),
key=lambda x: x[1]
)
# Evict oldest until under limit
to_evict = len(self._games) - settings.game_max_in_memory
evicted = []
for game_id, _ in sorted_games[:to_evict]:
await self._evict_game(game_id)
evicted.append(game_id)
logger.warning(f"Force-evicted game {game_id} (memory limit)")
return evicted
def get_memory_stats(self) -> dict:
"""Return memory usage statistics."""
return {
"active_games": len(self._games),
"max_games": settings.game_max_in_memory,
"oldest_game_hours": self._get_oldest_game_age_hours(),
"total_lineups_cached": sum(len(l) for l in self._lineups.values())
}
def _get_oldest_game_age_hours(self) -> float:
if not self._last_access:
return 0.0
oldest = min(self._last_access.values())
return (pendulum.now("UTC") - oldest).total_seconds() / 3600
```
### Step 3: Create Background Task (30 min)
Update `backend/app/main.py`:
```python
import asyncio
from contextlib import asynccontextmanager
from app.core.state_manager import state_manager
from app.config import settings
# Background task handle
eviction_task: asyncio.Task | None = None
async def periodic_eviction():
"""Background task to periodically evict idle games."""
interval = settings.game_eviction_interval_minutes * 60
while True:
try:
await asyncio.sleep(interval)
# Run eviction
evicted = await state_manager.evict_idle_games()
# Enforce memory limit
force_evicted = await state_manager.enforce_memory_limit()
# Log stats
stats = state_manager.get_memory_stats()
logger.info(f"Memory stats: {stats}")
except asyncio.CancelledError:
logger.info("Eviction task cancelled")
break
except Exception as e:
logger.error(f"Eviction task error: {e}")
# Continue running despite errors
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan handler."""
global eviction_task
# Startup
logger.info("Starting background eviction task")
eviction_task = asyncio.create_task(periodic_eviction())
yield
# Shutdown
logger.info("Stopping background eviction task")
if eviction_task:
eviction_task.cancel()
try:
await eviction_task
except asyncio.CancelledError:
pass
app = FastAPI(lifespan=lifespan)
```
### Step 4: Add Health Endpoint (15 min)
Add to `backend/app/api/routes.py`:
```python
@router.get("/health/memory")
async def memory_health():
"""Return memory usage statistics."""
stats = state_manager.get_memory_stats()
# Determine health status
usage_pct = stats["active_games"] / stats["max_games"] * 100
if usage_pct > 90:
status = "critical"
elif usage_pct > 75:
status = "warning"
else:
status = "healthy"
return {
"status": status,
"usage_percent": round(usage_pct, 1),
**stats
}
```
### Step 5: Write Tests (30 min)
Create `backend/tests/unit/core/test_game_eviction.py`:
```python
import pytest
import pendulum
from uuid import uuid4
from unittest.mock import patch, AsyncMock
class TestGameEviction:
"""Tests for idle game eviction."""
@pytest.fixture
def state_manager(self):
from app.core.state_manager import StateManager
return StateManager()
@pytest.mark.asyncio
async def test_evict_idle_games_removes_old_games(self, state_manager):
"""Games idle beyond threshold are evicted."""
game_id = uuid4()
# Create game with old timestamp
state_manager._games[game_id] = MockGameState()
state_manager._last_access[game_id] = pendulum.now("UTC").subtract(hours=25)
with patch.object(state_manager, '_evict_game', new_callable=AsyncMock) as mock:
evicted = await state_manager.evict_idle_games()
assert game_id in evicted
mock.assert_called_once_with(game_id)
@pytest.mark.asyncio
async def test_evict_idle_games_keeps_active_games(self, state_manager):
"""Recently accessed games are not evicted."""
game_id = uuid4()
state_manager._games[game_id] = MockGameState()
state_manager._last_access[game_id] = pendulum.now("UTC").subtract(hours=1)
evicted = await state_manager.evict_idle_games()
assert game_id not in evicted
assert game_id in state_manager._games
@pytest.mark.asyncio
async def test_enforce_memory_limit_evicts_oldest(self, state_manager):
"""Oldest games evicted when memory limit exceeded."""
# Create games at different times
for i in range(10):
game_id = uuid4()
state_manager._games[game_id] = MockGameState()
state_manager._last_access[game_id] = pendulum.now("UTC").subtract(hours=i)
with patch.object(settings, 'game_max_in_memory', 5):
evicted = await state_manager.enforce_memory_limit()
assert len(evicted) == 5
assert len(state_manager._games) == 5
@pytest.mark.asyncio
async def test_evict_game_persists_state(self, state_manager):
"""Game state is persisted before eviction."""
game_id = uuid4()
game_state = MockGameState()
state_manager._games[game_id] = game_state
with patch('app.database.operations.db_ops.save_game_state', new_callable=AsyncMock) as mock:
await state_manager._evict_game(game_id)
mock.assert_called_once_with(game_id, game_state)
```
## Verification Checklist
- [ ] Idle games are evicted after 24 hours
- [ ] Memory limit is enforced
- [ ] Game state is persisted before eviction
- [ ] Background task runs without errors
- [ ] Health endpoint shows accurate stats
- [ ] Tests pass
## Monitoring
After deployment, monitor:
- `/health/memory` endpoint
- Log messages for eviction events
- Memory usage of the process
## Rollback Plan
If issues arise:
1. Increase `game_idle_timeout_hours` to reduce evictions
2. Increase `game_max_in_memory` limit
3. Disable eviction task (comment out in lifespan)
## Dependencies
- None (can be implemented independently)
## Notes
- Consider adding WebSocket notification before eviction
- May want to add "extend session" API for active users
- Future: Add Redis-backed state for horizontal scaling