Completed comprehensive planning for Phase 3.5 with focus on production readiness through materialized views approach for statistics. Planning Documents Created: - STAT_SYSTEM_ANALYSIS.md: Analysis of existing major-domo schema * Reviewed legacy BattingStat/PitchingStat tables (deprecated) * Analyzed existing /plays/batting and /plays/pitching endpoints * Evaluated 3 approaches (legacy port, modern, hybrid) - STAT_SYSTEM_MATERIALIZED_VIEWS.md: Recommended approach * PostgreSQL materialized views (following major-domo pattern) * Add stat fields to plays table (18 new columns) * 3 views: batting_game_stats, pitching_game_stats, game_stats * PlayStatCalculator service (~150 lines vs 400+ for StatTracker) * 80% less code, single source of truth, always consistent - phase-3.5-polish-stats.md: Complete implementation plan * Task 1: Game Statistics System (materialized views) * Task 2: Authorization Framework (WebSocket security) * Task 3: Uncapped Hit Decision Trees * Task 4: Code Cleanup (remove TODOs, integrate features) * Task 5: Integration Test Infrastructure * Estimated: 16-24 hours (2-3 days) NEXT_SESSION.md Updates: - Phase 3.5 ready to begin (0% → implementation phase) - Complete task breakdown with acceptance criteria - Materialized view approach detailed - Commit strategy for 3 separate commits - Files to review before starting Implementation Status Updates: - Phase 3: 100% Complete (688 tests passing) - Phase 3F: Substitution system fully tested - Phase 3.5: Planning complete, ready for implementation - Updated component status table with Phase 3 completion Key Decisions: - Use materialized views (not separate stat tables) - Add stat fields to plays table - Refresh views after game completion + on-demand - Use legacy field names (pa, ab, run, hit) for compatibility - Skip experimental fields (bphr, xba, etc.) for MVP Benefits of Materialized Views: - 80% less code (~400 lines → ~150 lines) - Single source of truth (plays table) - Always consistent (stats derived, not tracked) - Follows existing major-domo pattern - PostgreSQL optimized (indexed, cached) Next Steps: 1. Implement PlayStatCalculator (map PlayOutcome → stats) 2. Add stat fields to plays table (migration 004) 3. Create materialized views (migration 005) 4. Create BoxScoreService (query views) 5. Refresh logic after game completion 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
16 KiB
Statistics System Analysis - Existing vs Proposed
Date: 2025-11-06 Context: Phase 3.5 planning - game statistics tracking
Existing Schema (major-domo/database)
Current Tables (Peewee/SQLite)
1. Result - Game-level results
class Result(BaseModel):
week = IntegerField()
game = IntegerField()
awayteam = ForeignKeyField(Team)
hometeam = ForeignKeyField(Team)
awayscore = IntegerField()
homescore = IntegerField()
season = IntegerField()
scorecard_url = CharField(null=True)
Purpose: Tracks final game scores and results Scope: High-level game outcomes only (no inning-by-inning breakdown)
2. BattingStat - Per-game batting statistics
class BattingStat(BaseModel):
player = ForeignKeyField(Player)
team = ForeignKeyField(Team)
pos = CharField() # Position
# Standard batting stats
pa = IntegerField() # Plate appearances
ab = IntegerField() # At bats
run = IntegerField() # Runs scored
hit = IntegerField() # Hits
rbi = IntegerField() # RBIs
double = IntegerField()
triple = IntegerField()
hr = IntegerField() # Home runs
bb = IntegerField() # Walks
so = IntegerField() # Strikeouts
hbp = IntegerField() # Hit by pitch
sac = IntegerField() # Sacrifices
ibb = IntegerField() # Intentional walks
gidp = IntegerField() # Ground into double play
# Baserunning stats
sb = IntegerField() # Stolen bases
cs = IntegerField() # Caught stealing
# Batter-pitcher matchup stats (unclear what these are)
bphr = IntegerField()
bpfo = IntegerField()
bp1b = IntegerField()
bplo = IntegerField()
# X-Check related? (commented out in queries)
xba = IntegerField()
xbt = IntegerField()
xch = IntegerField()
xhit = IntegerField()
# Fielding stats (also tracked here)
error = IntegerField()
pb = IntegerField() # Passed balls (catcher)
sbc = IntegerField() # Stolen base chances (catcher)
csc = IntegerField() # Caught stealing by catcher
roba = IntegerField() # (unknown - commented out)
robs = IntegerField() # (unknown - commented out)
raa = IntegerField() # (unknown - commented out)
rto = IntegerField() # (unknown - commented out)
# Game context
week = IntegerField()
game = IntegerField()
season = IntegerField()
Purpose: Complete batting stats per player per game Key Features:
- Very comprehensive (40+ fields!)
- Includes fielding stats (errors, passed balls, caught stealing)
- Has unknown/experimental fields (xba, xbt, roba, etc.)
- Designed for per-game granularity
3. PitchingStat - Per-game pitching statistics
class PitchingStat(BaseModel):
player = ForeignKeyField(Player)
team = ForeignKeyField(Team)
# Standard pitching stats
ip = FloatField() # Innings pitched (5.1 = 5 1/3 innings)
hit = FloatField() # Hits allowed
run = FloatField() # Runs allowed
erun = FloatField() # Earned runs allowed
so = FloatField() # Strikeouts
bb = FloatField() # Walks
hbp = FloatField() # Hit batters
wp = FloatField() # Wild pitches
balk = FloatField() # Balks
hr = FloatField() # Home runs allowed
# Relief pitcher stats
ir = FloatField() # Inherited runners
irs = FloatField() # Inherited runners scored
# Game results
gs = FloatField() # Games started
win = FloatField() # Wins
loss = FloatField() # Losses
hold = FloatField() # Holds
sv = FloatField() # Saves
bsv = FloatField() # Blown saves
# Game context
week = IntegerField()
game = IntegerField()
season = IntegerField()
Purpose: Complete pitching stats per player per game Key Features:
- Standard pitching metrics
- Relief pitcher tracking (inherited runners)
- Win/loss/save tracking
- Uses FloatField for most stats (fractional values)
New Web App Architecture (FastAPI/PostgreSQL)
Current Database (SQLAlchemy)
We already have these tables:
games- Game metadata (game_id, league_id, teams, status)lineups- Player lineup entries (with substitution support)plays- Individual play records (play_result, dice_rolls, etc.)
Gap Analysis
What we DON'T have:
- ❌ Per-game player statistics aggregation
- ❌ Team-level game statistics
- ❌ Linescore (runs per inning) storage
- ❌ Easy box score retrieval
What we DO have:
- ✅ All raw play data (we can reconstruct stats from plays)
- ✅ Player lineup tracking
- ✅ Game metadata
Options for Phase 3.5
Option 1: Adapt Existing Schema (RECOMMENDED)
Approach: Create SQLAlchemy versions of existing tables with modern improvements
New Tables:
# Option 1A: Direct port with minimal changes
class PlayerGameStats(Base):
"""
Mirrors BattingStat/PitchingStat but combines both.
Uses existing field names for compatibility.
"""
__tablename__ = "player_game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
# Standard batting (from BattingStat)
pa = Column(Integer, default=0)
ab = Column(Integer, default=0)
run = Column(Integer, default=0) # Note: 'run' not 'runs'
hit = Column(Integer, default=0) # Note: 'hit' not 'hits'
rbi = Column(Integer, default=0)
double = Column(Integer, default=0)
triple = Column(Integer, default=0)
hr = Column(Integer, default=0)
bb = Column(Integer, default=0)
so = Column(Integer, default=0)
hbp = Column(Integer, default=0)
sac = Column(Integer, default=0)
sb = Column(Integer, default=0)
cs = Column(Integer, default=0)
# Pitching stats (from PitchingStat)
ip = Column(Float, default=0.0)
# ... (all pitching fields)
# Fielding (if needed)
error = Column(Integer, default=0)
pb = Column(Integer, default=0)
class GameStats(Base):
"""
Mirrors Result table but adds linescore and more detail.
"""
__tablename__ = "game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), unique=True)
# Team totals
home_runs = Column(Integer, default=0)
away_runs = Column(Integer, default=0)
home_hits = Column(Integer, default=0)
away_hits = Column(Integer, default=0)
home_errors = Column(Integer, default=0)
away_errors = Column(Integer, default=0)
# NEW: Linescore (not in legacy schema)
home_linescore = Column(JSON) # [0, 1, 0, 3, ...]
away_linescore = Column(JSON) # [1, 0, 2, 0, ...]
Benefits:
- ✅ Familiar field names for migration
- ✅ Can reuse existing query patterns
- ✅ Easy to submit to legacy REST API
- ✅ Compatible with existing league tooling
Drawbacks:
- ⚠️ Some weird field names ('run' vs 'runs', 'hit' vs 'hits')
- ⚠️ Lots of fields we may not use (bphr, bpfo, xba, etc.)
Option 2: Modern Schema from Scratch
Approach: Design clean schema optimized for web app
New Tables:
class PlayerGameStats(Base):
"""Modern schema with clear naming."""
__tablename__ = "player_game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
# Batting (clearer names)
plate_appearances = Column(Integer, default=0) # Not 'pa'
at_bats = Column(Integer, default=0) # Not 'ab'
runs = Column(Integer, default=0) # Not 'run'
hits = Column(Integer, default=0) # Not 'hit'
rbis = Column(Integer, default=0) # Not 'rbi'
# ... etc
# Pitching
innings_pitched = Column(Float, default=0.0) # Not 'ip'
batters_faced = Column(Integer, default=0) # NEW field!
# ... etc
Benefits:
- ✅ Clean, readable field names
- ✅ Only fields we actually use
- ✅ Modern best practices
Drawbacks:
- ❌ Need field mapping for legacy API submission
- ❌ Different from existing league patterns
- ❌ More work to integrate with existing tooling
Option 3: Hybrid Approach (RECOMMENDED)
Approach: Use existing field names but omit unused experimental fields
New Tables:
class PlayerGameStats(Base):
"""
Hybrid: existing field names for core stats,
skip experimental/unused fields.
"""
__tablename__ = "player_game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
# Batting - use legacy names for core stats
pa = Column(Integer, default=0)
ab = Column(Integer, default=0)
run = Column(Integer, default=0)
hit = Column(Integer, default=0)
rbi = Column(Integer, default=0)
double = Column(Integer, default=0)
triple = Column(Integer, default=0)
hr = Column(Integer, default=0)
bb = Column(Integer, default=0)
so = Column(Integer, default=0)
hbp = Column(Integer, default=0)
sb = Column(Integer, default=0)
cs = Column(Integer, default=0)
# Pitching - use legacy names
ip = Column(Float, default=0.0)
# hits_allowed = Column(Integer, default=0) # 'hit' field reused?
runs_allowed = Column(Integer, default=0)
earned_runs = Column(Integer, default=0)
walks_allowed = Column(Integer, default=0)
strikeouts_pitched = Column(Integer, default=0)
# ... etc
# SKIP: bphr, bpfo, bp1b, bplo (unclear purpose)
# SKIP: xba, xbt, xch, xhit (commented out in queries anyway)
# SKIP: roba, robs, raa, rto (unknown, commented out)
Benefits:
- ✅ Familiar core field names
- ✅ Cleaner (no experimental cruft)
- ✅ Easy legacy API submission
- ✅ Room to add new fields as needed
Drawbacks:
- ⚠️ Still has some odd naming ('run' vs 'runs')
Questions for Discussion
1. Field Naming Convention
- Should we use legacy names (
pa,ab,run,hit) for compatibility? - Or modernize (
plate_appearances,at_bats,runs,hits) for clarity? - Recommendation: Hybrid - legacy names for core stats, modern for new fields
2. Experimental Fields
- What are
bphr,bpfo,bp1b,bplo? (batter-pitcher matchup stats?) - What are
xba,xbt,xch,xhit? (X-Check related?) - Do we need these or can we skip?
- Recommendation: Skip for Phase 3.5 MVP, add later if needed
3. Pitching Stats Field Overlap
- Legacy schema uses
hitfor both batting hits and pitching hits allowed - How do we handle this in a combined table?
- Options:
- A) Separate
batting_hitandpitching_hitcolumns - B) Reuse
hitfield (batting or pitching based on position) - C) Separate BattingGameStats and PitchingGameStats tables
- A) Separate
- Recommendation: Option C - separate tables for clarity
4. Linescore Storage
- Legacy has no linescore (runs by inning)
- We need this for box score display
- JSON array format:
[0, 1, 0, 3, ...]per team? - Recommendation: Add to GameStats table as JSON
5. Integration with Legacy API
- Do we need to submit stats to existing league REST API?
- If yes, what format does it expect?
- Can we provide a mapping layer?
- Recommendation: Yes, create mapping function for API submission
6. Stat Calculation Approach
- Option A: Real-time updates (update stats after each play)
- Option B: Post-game aggregation (calculate from plays table)
- Option C: Hybrid (real-time updates + verification from plays)
- Recommendation: Option A (real-time) for performance, with Option B as backup/verification
Proposed Schema (Final Recommendation)
Recommended Approach: Hybrid with Separate Tables
class GameStats(Base):
"""Game-level statistics and linescore."""
__tablename__ = "game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), unique=True)
created_at = Column(DateTime, default=func.now())
# Team totals
home_runs = Column(Integer, default=0)
away_runs = Column(Integer, default=0)
home_hits = Column(Integer, default=0)
away_hits = Column(Integer, default=0)
home_errors = Column(Integer, default=0)
away_errors = Column(Integer, default=0)
# Linescore (NEW - not in legacy)
home_linescore = Column(JSON) # [0, 1, 0, 3, ...]
away_linescore = Column(JSON) # [1, 0, 2, 0, ...]
class BattingGameStats(Base):
"""Batting statistics per player per game."""
__tablename__ = "batting_game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
# Use legacy field names for compatibility
pa = Column(Integer, default=0)
ab = Column(Integer, default=0)
run = Column(Integer, default=0)
hit = Column(Integer, default=0)
rbi = Column(Integer, default=0)
double = Column(Integer, default=0)
triple = Column(Integer, default=0)
hr = Column(Integer, default=0)
bb = Column(Integer, default=0)
so = Column(Integer, default=0)
hbp = Column(Integer, default=0)
sac = Column(Integer, default=0)
sb = Column(Integer, default=0)
cs = Column(Integer, default=0)
gidp = Column(Integer, default=0)
__table_args__ = (
Index('idx_batting_game_stats_game', 'game_id'),
Index('idx_batting_game_stats_lineup', 'lineup_id'),
)
class PitchingGameStats(Base):
"""Pitching statistics per player per game."""
__tablename__ = "pitching_game_stats"
id = Column(Integer, primary_key=True)
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
# Use legacy field names for compatibility
ip = Column(Float, default=0.0)
hit = Column(Integer, default=0) # Hits allowed
run = Column(Integer, default=0) # Runs allowed
erun = Column(Integer, default=0) # Earned runs (legacy: 'erun')
so = Column(Integer, default=0) # Strikeouts
bb = Column(Integer, default=0) # Walks
hbp = Column(Integer, default=0) # Hit batters
hr = Column(Integer, default=0) # Home runs allowed
wp = Column(Integer, default=0) # Wild pitches
# NEW: Batters faced (not in legacy, but useful)
batters_faced = Column(Integer, default=0)
__table_args__ = (
Index('idx_pitching_game_stats_game', 'game_id'),
Index('idx_pitching_game_stats_lineup', 'lineup_id'),
)
Why Separate Tables?
- Clearer intent (batting vs pitching)
- No field name conflicts (
hitmeans different things) - Easier queries (no need to filter by position)
- Better indexing (separate indexes per table)
- Matches real-world mental model (batters bat, pitchers pitch)
Implementation Notes
-
Dual Position Players (e.g., Shohei Ohtani):
- Two records: one in BattingGameStats, one in PitchingGameStats
- Both reference same lineup_id
- Aggregated separately
-
Legacy API Submission:
- Create mapping function:
web_stats_to_legacy_format() - Convert our stats to legacy BattingStat/PitchingStat format
- Submit to existing REST API endpoints
- Create mapping function:
-
Stat Tracker Service:
- Maintains in-memory cache for current games
- Async writes to database
- Provides
get_box_score()method with formatted output
Next Steps
Please advise on:
- ✅ Approve hybrid approach with separate batting/pitching tables?
- ✅ Use legacy field names for core stats?
- ✅ Skip experimental fields (bphr, xba, etc.) for now?
- ✅ Add linescore to GameStats?
- ✅ Plan for legacy API submission?
Once approved, I'll update phase-3.5-polish-stats.md with the corrected schema.