# Statistics System: Materialized Views Approach **Date**: 2025-11-06 **Approach**: PostgreSQL Materialized Views (following legacy API pattern) --- ## Why Materialized Views? ### Legacy API Pattern The existing major-domo API already does this: - **`/plays/batting`** - Aggregates plays on-the-fly with SQL SUM/COUNT - **`/plays/pitching`** - Aggregates plays on-the-fly with SQL SUM/COUNT - **BattingStat/PitchingStat tables** - Deprecated in favor of play aggregation ### Benefits 1. ✅ **Single source of truth**: Plays table is the only data we write 2. ✅ **No stat tracking code**: PostgreSQL does the aggregation 3. ✅ **Always consistent**: Stats are derived, not stored separately 4. ✅ **Fast queries**: Materialized views are indexed and cached 5. ✅ **Refresh on demand**: Update views when needed (after games, on schedule) 6. ✅ **No sync issues**: Can't have play/stat mismatches ### Drawbacks - ⚠️ **Refresh required**: Views must be refreshed to show latest data - ⚠️ **PostgreSQL specific**: Can't easily port to other databases - ⚠️ **Migration complexity**: Need to manage view schema changes **Verdict**: Benefits far outweigh drawbacks for this use case --- ## Proposed Schema ### Existing Tables (Already Have) ```sql -- We already have these in the web app CREATE TABLE games ( id UUID PRIMARY KEY, league_id VARCHAR(50), home_team_id INT, away_team_id INT, status VARCHAR(20), current_inning INT, current_half VARCHAR(3), home_score INT, away_score INT, -- ... other game fields ); CREATE TABLE lineups ( id SERIAL PRIMARY KEY, game_id UUID REFERENCES games(id), card_id INT, -- Player card ID team_id INT, position VARCHAR(3), batting_order INT, is_active BOOLEAN, is_starter BOOLEAN, -- ... other lineup fields ); CREATE TABLE plays ( id SERIAL PRIMARY KEY, game_id UUID REFERENCES games(id), play_num INT, inning INT, half VARCHAR(3), batter_lineup_id INT REFERENCES lineups(id), pitcher_lineup_id INT REFERENCES lineups(id), play_result VARCHAR(50), -- PlayOutcome enum value -- ... other play fields (runners, outs, scores, etc.) ); ``` **Need to add to Plays table** (currently missing): ```sql ALTER TABLE plays ADD COLUMN IF NOT EXISTS pa INT DEFAULT 0; -- Plate appearance (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS ab INT DEFAULT 0; -- At bat (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS hit INT DEFAULT 0; -- Hit (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS run INT DEFAULT 0; -- Runs scored this play ALTER TABLE plays ADD COLUMN IF NOT EXISTS rbi INT DEFAULT 0; -- RBIs this play ALTER TABLE plays ADD COLUMN IF NOT EXISTS double INT DEFAULT 0; -- Double (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS triple INT DEFAULT 0; -- Triple (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS hr INT DEFAULT 0; -- Home run (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS bb INT DEFAULT 0; -- Walk (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS so INT DEFAULT 0; -- Strikeout (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS hbp INT DEFAULT 0; -- Hit by pitch (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS sac INT DEFAULT 0; -- Sacrifice (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS sb INT DEFAULT 0; -- Stolen base (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS cs INT DEFAULT 0; -- Caught stealing (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS gidp INT DEFAULT 0; -- Ground into DP (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS error INT DEFAULT 0; -- Error this play ALTER TABLE plays ADD COLUMN IF NOT EXISTS wp INT DEFAULT 0; -- Wild pitch (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS pb INT DEFAULT 0; -- Passed ball (0 or 1) ALTER TABLE plays ADD COLUMN IF NOT EXISTS outs_recorded INT DEFAULT 0; -- Outs recorded this play ``` **Why add stat fields to plays?** - Each play records the statistical events that occurred - Makes aggregation queries simple and fast - Matches legacy StratPlay model exactly - Can be calculated from PlayOutcome when play is recorded --- ### New Materialized Views #### 1. Batting Stats by Player-Game ```sql CREATE MATERIALIZED VIEW batting_game_stats AS SELECT l.game_id, l.id AS lineup_id, l.card_id AS player_card_id, l.team_id, -- Aggregate batting stats COUNT(*) FILTER (WHERE p.pa = 1) AS pa, COUNT(*) FILTER (WHERE p.ab = 1) AS ab, SUM(p.run) AS run, SUM(p.hit) AS hit, SUM(p.rbi) AS rbi, SUM(p.double) AS double, SUM(p.triple) AS triple, SUM(p.hr) AS hr, SUM(p.bb) AS bb, SUM(p.so) AS so, SUM(p.hbp) AS hbp, SUM(p.sac) AS sac, SUM(p.sb) AS sb, SUM(p.cs) AS cs, SUM(p.gidp) AS gidp FROM lineups l JOIN plays p ON p.batter_lineup_id = l.id WHERE l.is_active = TRUE OR l.is_starter = TRUE -- Include all who played GROUP BY l.game_id, l.id, l.card_id, l.team_id; -- Indexes for fast lookups CREATE UNIQUE INDEX idx_batting_game_stats_lineup ON batting_game_stats(lineup_id); CREATE INDEX idx_batting_game_stats_game ON batting_game_stats(game_id); CREATE INDEX idx_batting_game_stats_player ON batting_game_stats(player_card_id); ``` **Usage**: `SELECT * FROM batting_game_stats WHERE game_id = ?` --- #### 2. Pitching Stats by Player-Game ```sql CREATE MATERIALIZED VIEW pitching_game_stats AS SELECT l.game_id, l.id AS lineup_id, l.card_id AS player_card_id, l.team_id, -- Aggregate pitching stats COUNT(*) FILTER (WHERE p.pa = 1) AS batters_faced, SUM(p.hit) AS hit_allowed, SUM(p.run) AS run_allowed, -- TODO: Calculate earned runs (need error tracking) SUM(p.run) AS erun, -- Placeholder: all runs as earned for now SUM(p.bb) AS bb, SUM(p.so) AS so, SUM(p.hbp) AS hbp, SUM(p.hr) AS hr_allowed, SUM(p.wp) AS wp, -- Calculate innings pitched from outs -- IP = outs / 3.0 (e.g., 16 outs = 5.1 IP) (SUM(p.outs_recorded)::float / 3.0) AS ip FROM lineups l JOIN plays p ON p.pitcher_lineup_id = l.id WHERE l.position = 'P' AND (l.is_active = TRUE OR l.is_starter = TRUE) GROUP BY l.game_id, l.id, l.card_id, l.team_id; -- Indexes CREATE UNIQUE INDEX idx_pitching_game_stats_lineup ON pitching_game_stats(lineup_id); CREATE INDEX idx_pitching_game_stats_game ON pitching_game_stats(game_id); CREATE INDEX idx_pitching_game_stats_player ON pitching_game_stats(player_card_id); ``` **Usage**: `SELECT * FROM pitching_game_stats WHERE game_id = ?` --- #### 3. Team Game Stats (Summary + Linescore) For box scores, we need team-level aggregates and linescore (runs by inning): ```sql CREATE MATERIALIZED VIEW game_stats AS WITH inning_scores AS ( -- Calculate runs scored per team per inning SELECT game_id, inning, half, SUM(run) AS runs_scored FROM plays GROUP BY game_id, inning, half ), linescores AS ( -- Build linescore JSON arrays SELECT game_id, json_agg(runs_scored ORDER BY inning) FILTER (WHERE half = 'top') AS away_linescore, json_agg(runs_scored ORDER BY inning) FILTER (WHERE half = 'bot') AS home_linescore FROM inning_scores GROUP BY game_id ), batting_totals AS ( -- Team batting totals SELECT game_id, l.team_id, SUM(p.hit) AS hits, SUM(p.error) AS errors FROM plays p JOIN lineups l ON p.batter_lineup_id = l.id GROUP BY game_id, l.team_id ) SELECT g.id AS game_id, g.home_score AS home_runs, g.away_score AS away_runs, -- Linescore COALESCE(ls.home_linescore, '[]'::json) AS home_linescore, COALESCE(ls.away_linescore, '[]'::json) AS away_linescore, -- Home team totals MAX(bth.hits) AS home_hits, MAX(bth.errors) AS home_errors, -- Away team totals MAX(bta.hits) AS away_hits, MAX(bta.errors) AS away_errors FROM games g LEFT JOIN linescores ls ON g.id = ls.game_id LEFT JOIN batting_totals bth ON g.id = bth.game_id AND bth.team_id = g.home_team_id LEFT JOIN batting_totals bta ON g.id = bta.game_id AND bta.team_id = g.away_team_id GROUP BY g.id, ls.home_linescore, ls.away_linescore; -- Index CREATE UNIQUE INDEX idx_game_stats_game ON game_stats(game_id); ``` **Usage**: `SELECT * FROM game_stats WHERE game_id = ?` **Linescore Format**: `[0, 1, 0, 3, 0, 0, 2, 1, 0]` (9 innings) --- ## Implementation Strategy ### Phase 3.5 Changes #### 1. Add Stat Fields to Plays Table (Migration) **File**: `backend/alembic/versions/004_add_play_stat_fields.py` ```python """Add statistical fields to plays table. Revision ID: 004 Revises: 003 """ def upgrade(): # Add batting stat fields op.add_column('plays', sa.Column('pa', sa.Integer(), default=0)) op.add_column('plays', sa.Column('ab', sa.Integer(), default=0)) op.add_column('plays', sa.Column('hit', sa.Integer(), default=0)) op.add_column('plays', sa.Column('run', sa.Integer(), default=0)) op.add_column('plays', sa.Column('rbi', sa.Integer(), default=0)) op.add_column('plays', sa.Column('double', sa.Integer(), default=0)) op.add_column('plays', sa.Column('triple', sa.Integer(), default=0)) op.add_column('plays', sa.Column('hr', sa.Integer(), default=0)) op.add_column('plays', sa.Column('bb', sa.Integer(), default=0)) op.add_column('plays', sa.Column('so', sa.Integer(), default=0)) op.add_column('plays', sa.Column('hbp', sa.Integer(), default=0)) op.add_column('plays', sa.Column('sac', sa.Integer(), default=0)) op.add_column('plays', sa.Column('sb', sa.Integer(), default=0)) op.add_column('plays', sa.Column('cs', sa.Integer(), default=0)) op.add_column('plays', sa.Column('gidp', sa.Integer(), default=0)) op.add_column('plays', sa.Column('error', sa.Integer(), default=0)) op.add_column('plays', sa.Column('wp', sa.Integer(), default=0)) op.add_column('plays', sa.Column('pb', sa.Integer(), default=0)) op.add_column('plays', sa.Column('outs_recorded', sa.Integer(), default=0)) def downgrade(): # Remove all stat columns op.drop_column('plays', 'pa') # ... etc ``` #### 2. Update Play Model **File**: `backend/app/models/db_models.py` ```python class Play(Base): __tablename__ = "plays" # Existing fields id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id")) # ... etc # NEW: Statistical fields (added in migration 004) pa = Column(Integer, default=0) ab = Column(Integer, default=0) hit = Column(Integer, default=0) run = Column(Integer, default=0) rbi = Column(Integer, default=0) double = Column(Integer, default=0) triple = Column(Integer, default=0) hr = Column(Integer, default=0) bb = Column(Integer, default=0) so = Column(Integer, default=0) hbp = Column(Integer, default=0) sac = Column(Integer, default=0) sb = Column(Integer, default=0) cs = Column(Integer, default=0) gidp = Column(Integer, default=0) error = Column(Integer, default=0) wp = Column(Integer, default=0) pb = Column(Integer, default=0) outs_recorded = Column(Integer, default=0) ``` #### 3. Create Stat Calculator **File**: `backend/app/services/play_stat_calculator.py` (NEW) ```python """ Calculate statistical fields from PlayOutcome. Called when recording a play to populate stat fields. """ from app.config.play_outcome import PlayOutcome from app.models.game_models import PlayResult, GameState class PlayStatCalculator: """ Converts play outcome to statistical fields for database. """ @staticmethod def calculate_stats( outcome: PlayOutcome, result: PlayResult, state_before: GameState, state_after: GameState ) -> dict: """ Calculate all stat fields for a play. Returns: dict with stat fields (pa, ab, hit, run, etc.) """ stats = { 'pa': 0, 'ab': 0, 'hit': 0, 'run': 0, 'rbi': 0, 'double': 0, 'triple': 0, 'hr': 0, 'bb': 0, 'so': 0, 'hbp': 0, 'sac': 0, 'sb': 0, 'cs': 0, 'gidp': 0, 'error': 0, 'wp': 0, 'pb': 0, 'outs_recorded': 0 } # Plate appearance (almost always 1, except...) if outcome not in [PlayOutcome.STOLEN_BASE_SUCCESS, PlayOutcome.CAUGHT_STEALING, PlayOutcome.WILD_PITCH, PlayOutcome.PASSED_BALL]: stats['pa'] = 1 # At bat (PA minus walks, HBP, sac) if stats['pa'] == 1 and outcome not in [ PlayOutcome.WALK, PlayOutcome.HBP, PlayOutcome.SACRIFICE ]: stats['ab'] = 1 # Hits if outcome in PlayOutcome.get_hit_outcomes(): stats['hit'] = 1 if outcome in PlayOutcome.get_double_outcomes(): stats['double'] = 1 elif outcome in PlayOutcome.get_triple_outcomes(): stats['triple'] = 1 elif outcome in PlayOutcome.get_homerun_outcomes(): stats['hr'] = 1 # Other outcomes if outcome == PlayOutcome.WALK: stats['bb'] = 1 elif outcome == PlayOutcome.STRIKEOUT: stats['so'] = 1 elif outcome == PlayOutcome.HBP: stats['hbp'] = 1 elif outcome == PlayOutcome.SACRIFICE: stats['sac'] = 1 elif outcome == PlayOutcome.STOLEN_BASE_SUCCESS: stats['sb'] = 1 elif outcome == PlayOutcome.CAUGHT_STEALING: stats['cs'] = 1 elif outcome in [PlayOutcome.GROUNDBALL_GDP]: stats['gidp'] = 1 elif outcome == PlayOutcome.WILD_PITCH: stats['wp'] = 1 elif outcome == PlayOutcome.PASSED_BALL: stats['pb'] = 1 # Calculate runs and RBIs from state change stats['run'] = state_after.home_score - state_before.home_score if state_after.away_score > state_before.away_score: stats['run'] = state_after.away_score - state_before.away_score # RBI logic (runs scored minus errors) if not result.error_occurred: stats['rbi'] = stats['run'] # Outs recorded stats['outs_recorded'] = state_after.outs - state_before.outs return stats ``` #### 4. Update GameEngine to Calculate Stats **File**: `backend/app/core/game_engine.py` ```python from app.services.play_stat_calculator import PlayStatCalculator class GameEngine: async def record_play( self, game_id: UUID, outcome: PlayOutcome, ... ) -> PlayResult: """Record play and calculate stats.""" state_before = self.state_manager.get_state(game_id) # Resolve play (existing logic) result = self._resolve_play(...) state_after = self.state_manager.get_state(game_id) # NEW: Calculate stat fields stats = PlayStatCalculator.calculate_stats( outcome, result, state_before, state_after ) # Save play to database with stats await self.db_ops.create_play( game_id=game_id, play_num=state_after.play_count, inning=state_after.current_inning, half=state_after.current_half, batter_lineup_id=state_before.current_batter.lineup_id, pitcher_lineup_id=state_before.current_pitcher.lineup_id, play_result=outcome.value, # NEW: Add stat fields **stats ) return result ``` #### 5. Create Materialized Views (Migration) **File**: `backend/alembic/versions/005_create_stat_views.py` ```python """Create materialized views for statistics. Revision ID: 005 Revises: 004 """ def upgrade(): # Create batting_game_stats view op.execute(""" CREATE MATERIALIZED VIEW batting_game_stats AS SELECT l.game_id, l.id AS lineup_id, l.card_id AS player_card_id, l.team_id, COUNT(*) FILTER (WHERE p.pa = 1) AS pa, COUNT(*) FILTER (WHERE p.ab = 1) AS ab, SUM(p.run) AS run, SUM(p.hit) AS hit, SUM(p.rbi) AS rbi, SUM(p.double) AS double, SUM(p.triple) AS triple, SUM(p.hr) AS hr, SUM(p.bb) AS bb, SUM(p.so) AS so, SUM(p.hbp) AS hbp, SUM(p.sac) AS sac, SUM(p.sb) AS sb, SUM(p.cs) AS cs, SUM(p.gidp) AS gidp FROM lineups l JOIN plays p ON p.batter_lineup_id = l.id GROUP BY l.game_id, l.id, l.card_id, l.team_id """) op.execute(""" CREATE UNIQUE INDEX idx_batting_game_stats_lineup ON batting_game_stats(lineup_id) """) # Create pitching_game_stats view op.execute(""" CREATE MATERIALIZED VIEW pitching_game_stats AS ... (full SQL from above) """) # Create game_stats view op.execute(""" CREATE MATERIALIZED VIEW game_stats AS ... (full SQL from above) """) def downgrade(): op.execute("DROP MATERIALIZED VIEW IF EXISTS batting_game_stats") op.execute("DROP MATERIALIZED VIEW IF EXISTS pitching_game_stats") op.execute("DROP MATERIALIZED VIEW IF EXISTS game_stats") ``` #### 6. Create View Refresh Service **File**: `backend/app/services/stat_view_refresher.py` (NEW) ```python """ Service to refresh materialized views. """ from sqlalchemy import text from app.database.database import async_session class StatViewRefresher: """Manages refreshing of statistical materialized views.""" @staticmethod async def refresh_game_stats(game_id: UUID) -> None: """ Refresh stats for a specific game (partial refresh). Note: PostgreSQL doesn't support partial refresh of materialized views. We'd need to implement this with regular views or partitioning. For now, refresh all views after game completion. """ await StatViewRefresher.refresh_all() @staticmethod async def refresh_all() -> None: """Refresh all statistical views.""" async with async_session() as session: await session.execute(text("REFRESH MATERIALIZED VIEW CONCURRENTLY batting_game_stats")) await session.execute(text("REFRESH MATERIALIZED VIEW CONCURRENTLY pitching_game_stats")) await session.execute(text("REFRESH MATERIALIZED VIEW CONCURRENTLY game_stats")) await session.commit() ``` #### 7. Box Score Retrieval **File**: `backend/app/services/box_score_service.py` (NEW) ```python """ Service to retrieve formatted box scores. """ from sqlalchemy import text from app.database.database import async_session class BoxScoreService: """Retrieves box score data from materialized views.""" @staticmethod async def get_box_score(game_id: UUID) -> dict: """ Get complete box score for a game. Returns dict with: - game_stats: Team totals and linescore - batting_stats: List of batting lines - pitching_stats: List of pitching lines """ async with async_session() as session: # Get game stats game_query = text(""" SELECT * FROM game_stats WHERE game_id = :game_id """) game_result = await session.execute(game_query, {"game_id": str(game_id)}) game_stats = game_result.fetchone()._asdict() if game_result else {} # Get batting stats batting_query = text(""" SELECT * FROM batting_game_stats WHERE game_id = :game_id ORDER BY lineup_id """) batting_result = await session.execute(batting_query, {"game_id": str(game_id)}) batting_stats = [row._asdict() for row in batting_result] # Get pitching stats pitching_query = text(""" SELECT * FROM pitching_game_stats WHERE game_id = :game_id ORDER BY lineup_id """) pitching_result = await session.execute(pitching_query, {"game_id": str(game_id)}) pitching_stats = [row._asdict() for row in pitching_result] return { 'game_stats': game_stats, 'batting_stats': batting_stats, 'pitching_stats': pitching_stats } ``` --- ## Refresh Strategy ### When to Refresh Views? **Option 1: After Each Play** (Real-time) - Pros: Always up-to-date - Cons: Performance impact, too frequent **Option 2: After Game Completion** (Recommended for MVP) - Pros: Good balance of freshness and performance - Cons: Stats not visible until game ends **Option 3: On Demand** (Terminal client) - Pros: User controls refresh - Cons: Stale data between refreshes **Option 4: Scheduled** (Every 5 minutes) - Pros: Predictable, low impact - Cons: 5-minute staleness **Recommended**: Option 2 (game completion) + Option 3 (on-demand for testing) --- ## Benefits of This Approach 1. **No Stat Tracker Needed**: Eliminated ~400 lines of stat tracking code 2. **Single Source of Truth**: Plays table is the only data we write 3. **Consistency Guaranteed**: Stats are always derived from plays 4. **Simple Logic**: PlayStatCalculator is straightforward field mapping 5. **SQL Optimized**: PostgreSQL handles aggregation efficiently 6. **Flexible Queries**: Can create any stat aggregation we want with new views 7. **Audit Trail**: Play-by-play history is complete record --- ## Migration Path 1. Add stat fields to plays table (migration 004) 2. Update Play model 3. Create PlayStatCalculator 4. Modify GameEngine to populate stat fields when recording plays 5. Create materialized views (migration 005) 6. Create BoxScoreService to query views 7. Add refresh logic (after game completion) 8. Update terminal client to show box scores **Estimated Time**: 4-5 hours (much less than original 6-8 hours!) --- ## Questions for Approval 1. ✅ Approve materialized view approach (vs separate stat tables)? 2. ✅ Add stat fields to plays table? 3. ✅ Refresh strategy: after game completion + on-demand? 4. ✅ Use legacy field names (pa, ab, run, hit) for compatibility? 5. ✅ Skip experimental fields (bphr, xba, etc.) for Phase 3.5? Once approved, I'll update `phase-3.5-polish-stats.md` with this approach.