# Statistics System Analysis - Existing vs Proposed **Date**: 2025-11-06 **Context**: Phase 3.5 planning - game statistics tracking --- ## Existing Schema (major-domo/database) ### Current Tables (Peewee/SQLite) #### 1. **Result** - Game-level results ```python class Result(BaseModel): week = IntegerField() game = IntegerField() awayteam = ForeignKeyField(Team) hometeam = ForeignKeyField(Team) awayscore = IntegerField() homescore = IntegerField() season = IntegerField() scorecard_url = CharField(null=True) ``` **Purpose**: Tracks final game scores and results **Scope**: High-level game outcomes only (no inning-by-inning breakdown) --- #### 2. **BattingStat** - Per-game batting statistics ```python class BattingStat(BaseModel): player = ForeignKeyField(Player) team = ForeignKeyField(Team) pos = CharField() # Position # Standard batting stats pa = IntegerField() # Plate appearances ab = IntegerField() # At bats run = IntegerField() # Runs scored hit = IntegerField() # Hits rbi = IntegerField() # RBIs double = IntegerField() triple = IntegerField() hr = IntegerField() # Home runs bb = IntegerField() # Walks so = IntegerField() # Strikeouts hbp = IntegerField() # Hit by pitch sac = IntegerField() # Sacrifices ibb = IntegerField() # Intentional walks gidp = IntegerField() # Ground into double play # Baserunning stats sb = IntegerField() # Stolen bases cs = IntegerField() # Caught stealing # Batter-pitcher matchup stats (unclear what these are) bphr = IntegerField() bpfo = IntegerField() bp1b = IntegerField() bplo = IntegerField() # X-Check related? (commented out in queries) xba = IntegerField() xbt = IntegerField() xch = IntegerField() xhit = IntegerField() # Fielding stats (also tracked here) error = IntegerField() pb = IntegerField() # Passed balls (catcher) sbc = IntegerField() # Stolen base chances (catcher) csc = IntegerField() # Caught stealing by catcher roba = IntegerField() # (unknown - commented out) robs = IntegerField() # (unknown - commented out) raa = IntegerField() # (unknown - commented out) rto = IntegerField() # (unknown - commented out) # Game context week = IntegerField() game = IntegerField() season = IntegerField() ``` **Purpose**: Complete batting stats per player per game **Key Features**: - Very comprehensive (40+ fields!) - Includes fielding stats (errors, passed balls, caught stealing) - Has unknown/experimental fields (xba, xbt, roba, etc.) - Designed for per-game granularity --- #### 3. **PitchingStat** - Per-game pitching statistics ```python class PitchingStat(BaseModel): player = ForeignKeyField(Player) team = ForeignKeyField(Team) # Standard pitching stats ip = FloatField() # Innings pitched (5.1 = 5 1/3 innings) hit = FloatField() # Hits allowed run = FloatField() # Runs allowed erun = FloatField() # Earned runs allowed so = FloatField() # Strikeouts bb = FloatField() # Walks hbp = FloatField() # Hit batters wp = FloatField() # Wild pitches balk = FloatField() # Balks hr = FloatField() # Home runs allowed # Relief pitcher stats ir = FloatField() # Inherited runners irs = FloatField() # Inherited runners scored # Game results gs = FloatField() # Games started win = FloatField() # Wins loss = FloatField() # Losses hold = FloatField() # Holds sv = FloatField() # Saves bsv = FloatField() # Blown saves # Game context week = IntegerField() game = IntegerField() season = IntegerField() ``` **Purpose**: Complete pitching stats per player per game **Key Features**: - Standard pitching metrics - Relief pitcher tracking (inherited runners) - Win/loss/save tracking - Uses FloatField for most stats (fractional values) --- ## New Web App Architecture (FastAPI/PostgreSQL) ### Current Database (SQLAlchemy) We already have these tables: - `games` - Game metadata (game_id, league_id, teams, status) - `lineups` - Player lineup entries (with substitution support) - `plays` - Individual play records (play_result, dice_rolls, etc.) ### Gap Analysis **What we DON'T have:** 1. ❌ Per-game player statistics aggregation 2. ❌ Team-level game statistics 3. ❌ Linescore (runs per inning) storage 4. ❌ Easy box score retrieval **What we DO have:** - ✅ All raw play data (we can reconstruct stats from plays) - ✅ Player lineup tracking - ✅ Game metadata --- ## Options for Phase 3.5 ### Option 1: Adapt Existing Schema (RECOMMENDED) **Approach**: Create SQLAlchemy versions of existing tables with modern improvements **New Tables**: ```python # Option 1A: Direct port with minimal changes class PlayerGameStats(Base): """ Mirrors BattingStat/PitchingStat but combines both. Uses existing field names for compatibility. """ __tablename__ = "player_game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), nullable=False) lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False) # Standard batting (from BattingStat) pa = Column(Integer, default=0) ab = Column(Integer, default=0) run = Column(Integer, default=0) # Note: 'run' not 'runs' hit = Column(Integer, default=0) # Note: 'hit' not 'hits' rbi = Column(Integer, default=0) double = Column(Integer, default=0) triple = Column(Integer, default=0) hr = Column(Integer, default=0) bb = Column(Integer, default=0) so = Column(Integer, default=0) hbp = Column(Integer, default=0) sac = Column(Integer, default=0) sb = Column(Integer, default=0) cs = Column(Integer, default=0) # Pitching stats (from PitchingStat) ip = Column(Float, default=0.0) # ... (all pitching fields) # Fielding (if needed) error = Column(Integer, default=0) pb = Column(Integer, default=0) class GameStats(Base): """ Mirrors Result table but adds linescore and more detail. """ __tablename__ = "game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), unique=True) # Team totals home_runs = Column(Integer, default=0) away_runs = Column(Integer, default=0) home_hits = Column(Integer, default=0) away_hits = Column(Integer, default=0) home_errors = Column(Integer, default=0) away_errors = Column(Integer, default=0) # NEW: Linescore (not in legacy schema) home_linescore = Column(JSON) # [0, 1, 0, 3, ...] away_linescore = Column(JSON) # [1, 0, 2, 0, ...] ``` **Benefits**: - ✅ Familiar field names for migration - ✅ Can reuse existing query patterns - ✅ Easy to submit to legacy REST API - ✅ Compatible with existing league tooling **Drawbacks**: - ⚠️ Some weird field names ('run' vs 'runs', 'hit' vs 'hits') - ⚠️ Lots of fields we may not use (bphr, bpfo, xba, etc.) --- ### Option 2: Modern Schema from Scratch **Approach**: Design clean schema optimized for web app **New Tables**: ```python class PlayerGameStats(Base): """Modern schema with clear naming.""" __tablename__ = "player_game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), nullable=False) lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False) # Batting (clearer names) plate_appearances = Column(Integer, default=0) # Not 'pa' at_bats = Column(Integer, default=0) # Not 'ab' runs = Column(Integer, default=0) # Not 'run' hits = Column(Integer, default=0) # Not 'hit' rbis = Column(Integer, default=0) # Not 'rbi' # ... etc # Pitching innings_pitched = Column(Float, default=0.0) # Not 'ip' batters_faced = Column(Integer, default=0) # NEW field! # ... etc ``` **Benefits**: - ✅ Clean, readable field names - ✅ Only fields we actually use - ✅ Modern best practices **Drawbacks**: - ❌ Need field mapping for legacy API submission - ❌ Different from existing league patterns - ❌ More work to integrate with existing tooling --- ### Option 3: Hybrid Approach (RECOMMENDED) **Approach**: Use existing field names but omit unused experimental fields **New Tables**: ```python class PlayerGameStats(Base): """ Hybrid: existing field names for core stats, skip experimental/unused fields. """ __tablename__ = "player_game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), nullable=False) lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False) # Batting - use legacy names for core stats pa = Column(Integer, default=0) ab = Column(Integer, default=0) run = Column(Integer, default=0) hit = Column(Integer, default=0) rbi = Column(Integer, default=0) double = Column(Integer, default=0) triple = Column(Integer, default=0) hr = Column(Integer, default=0) bb = Column(Integer, default=0) so = Column(Integer, default=0) hbp = Column(Integer, default=0) sb = Column(Integer, default=0) cs = Column(Integer, default=0) # Pitching - use legacy names ip = Column(Float, default=0.0) # hits_allowed = Column(Integer, default=0) # 'hit' field reused? runs_allowed = Column(Integer, default=0) earned_runs = Column(Integer, default=0) walks_allowed = Column(Integer, default=0) strikeouts_pitched = Column(Integer, default=0) # ... etc # SKIP: bphr, bpfo, bp1b, bplo (unclear purpose) # SKIP: xba, xbt, xch, xhit (commented out in queries anyway) # SKIP: roba, robs, raa, rto (unknown, commented out) ``` **Benefits**: - ✅ Familiar core field names - ✅ Cleaner (no experimental cruft) - ✅ Easy legacy API submission - ✅ Room to add new fields as needed **Drawbacks**: - ⚠️ Still has some odd naming ('run' vs 'runs') --- ## Questions for Discussion ### 1. **Field Naming Convention** - Should we use legacy names (`pa`, `ab`, `run`, `hit`) for compatibility? - Or modernize (`plate_appearances`, `at_bats`, `runs`, `hits`) for clarity? - **Recommendation**: Hybrid - legacy names for core stats, modern for new fields ### 2. **Experimental Fields** - What are `bphr`, `bpfo`, `bp1b`, `bplo`? (batter-pitcher matchup stats?) - What are `xba`, `xbt`, `xch`, `xhit`? (X-Check related?) - Do we need these or can we skip? - **Recommendation**: Skip for Phase 3.5 MVP, add later if needed ### 3. **Pitching Stats Field Overlap** - Legacy schema uses `hit` for both batting hits and pitching hits allowed - How do we handle this in a combined table? - **Options**: - A) Separate `batting_hit` and `pitching_hit` columns - B) Reuse `hit` field (batting or pitching based on position) - C) Separate BattingGameStats and PitchingGameStats tables - **Recommendation**: Option C - separate tables for clarity ### 4. **Linescore Storage** - Legacy has no linescore (runs by inning) - We need this for box score display - JSON array format: `[0, 1, 0, 3, ...]` per team? - **Recommendation**: Add to GameStats table as JSON ### 5. **Integration with Legacy API** - Do we need to submit stats to existing league REST API? - If yes, what format does it expect? - Can we provide a mapping layer? - **Recommendation**: Yes, create mapping function for API submission ### 6. **Stat Calculation Approach** - **Option A**: Real-time updates (update stats after each play) - **Option B**: Post-game aggregation (calculate from plays table) - **Option C**: Hybrid (real-time updates + verification from plays) - **Recommendation**: Option A (real-time) for performance, with Option B as backup/verification --- ## Proposed Schema (Final Recommendation) ### Recommended Approach: Hybrid with Separate Tables ```python class GameStats(Base): """Game-level statistics and linescore.""" __tablename__ = "game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), unique=True) created_at = Column(DateTime, default=func.now()) # Team totals home_runs = Column(Integer, default=0) away_runs = Column(Integer, default=0) home_hits = Column(Integer, default=0) away_hits = Column(Integer, default=0) home_errors = Column(Integer, default=0) away_errors = Column(Integer, default=0) # Linescore (NEW - not in legacy) home_linescore = Column(JSON) # [0, 1, 0, 3, ...] away_linescore = Column(JSON) # [1, 0, 2, 0, ...] class BattingGameStats(Base): """Batting statistics per player per game.""" __tablename__ = "batting_game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), nullable=False) lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False) # Use legacy field names for compatibility pa = Column(Integer, default=0) ab = Column(Integer, default=0) run = Column(Integer, default=0) hit = Column(Integer, default=0) rbi = Column(Integer, default=0) double = Column(Integer, default=0) triple = Column(Integer, default=0) hr = Column(Integer, default=0) bb = Column(Integer, default=0) so = Column(Integer, default=0) hbp = Column(Integer, default=0) sac = Column(Integer, default=0) sb = Column(Integer, default=0) cs = Column(Integer, default=0) gidp = Column(Integer, default=0) __table_args__ = ( Index('idx_batting_game_stats_game', 'game_id'), Index('idx_batting_game_stats_lineup', 'lineup_id'), ) class PitchingGameStats(Base): """Pitching statistics per player per game.""" __tablename__ = "pitching_game_stats" id = Column(Integer, primary_key=True) game_id = Column(UUID, ForeignKey("games.id"), nullable=False) lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False) # Use legacy field names for compatibility ip = Column(Float, default=0.0) hit = Column(Integer, default=0) # Hits allowed run = Column(Integer, default=0) # Runs allowed erun = Column(Integer, default=0) # Earned runs (legacy: 'erun') so = Column(Integer, default=0) # Strikeouts bb = Column(Integer, default=0) # Walks hbp = Column(Integer, default=0) # Hit batters hr = Column(Integer, default=0) # Home runs allowed wp = Column(Integer, default=0) # Wild pitches # NEW: Batters faced (not in legacy, but useful) batters_faced = Column(Integer, default=0) __table_args__ = ( Index('idx_pitching_game_stats_game', 'game_id'), Index('idx_pitching_game_stats_lineup', 'lineup_id'), ) ``` **Why Separate Tables?** 1. Clearer intent (batting vs pitching) 2. No field name conflicts (`hit` means different things) 3. Easier queries (no need to filter by position) 4. Better indexing (separate indexes per table) 5. Matches real-world mental model (batters bat, pitchers pitch) --- ## Implementation Notes 1. **Dual Position Players** (e.g., Shohei Ohtani): - Two records: one in BattingGameStats, one in PitchingGameStats - Both reference same lineup_id - Aggregated separately 2. **Legacy API Submission**: - Create mapping function: `web_stats_to_legacy_format()` - Convert our stats to legacy BattingStat/PitchingStat format - Submit to existing REST API endpoints 3. **Stat Tracker Service**: - Maintains in-memory cache for current games - Async writes to database - Provides `get_box_score()` method with formatted output --- ## Next Steps **Please advise on**: 1. ✅ Approve hybrid approach with separate batting/pitching tables? 2. ✅ Use legacy field names for core stats? 3. ✅ Skip experimental fields (bphr, xba, etc.) for now? 4. ✅ Add linescore to GameStats? 5. ✅ Plan for legacy API submission? Once approved, I'll update `phase-3.5-polish-stats.md` with the corrected schema.