strat-gameplay-webapp/.claude/implementation/STAT_SYSTEM_ANALYSIS.md
Cal Corum b5677d0c55 CLAUDE: Phase 3.5 Planning - Code Polish & Statistics System
Completed comprehensive planning for Phase 3.5 with focus on production
readiness through materialized views approach for statistics.

Planning Documents Created:
- STAT_SYSTEM_ANALYSIS.md: Analysis of existing major-domo schema
  * Reviewed legacy BattingStat/PitchingStat tables (deprecated)
  * Analyzed existing /plays/batting and /plays/pitching endpoints
  * Evaluated 3 approaches (legacy port, modern, hybrid)

- STAT_SYSTEM_MATERIALIZED_VIEWS.md: Recommended approach
  * PostgreSQL materialized views (following major-domo pattern)
  * Add stat fields to plays table (18 new columns)
  * 3 views: batting_game_stats, pitching_game_stats, game_stats
  * PlayStatCalculator service (~150 lines vs 400+ for StatTracker)
  * 80% less code, single source of truth, always consistent

- phase-3.5-polish-stats.md: Complete implementation plan
  * Task 1: Game Statistics System (materialized views)
  * Task 2: Authorization Framework (WebSocket security)
  * Task 3: Uncapped Hit Decision Trees
  * Task 4: Code Cleanup (remove TODOs, integrate features)
  * Task 5: Integration Test Infrastructure
  * Estimated: 16-24 hours (2-3 days)

NEXT_SESSION.md Updates:
- Phase 3.5 ready to begin (0% → implementation phase)
- Complete task breakdown with acceptance criteria
- Materialized view approach detailed
- Commit strategy for 3 separate commits
- Files to review before starting

Implementation Status Updates:
- Phase 3: 100% Complete (688 tests passing)
- Phase 3F: Substitution system fully tested
- Phase 3.5: Planning complete, ready for implementation
- Updated component status table with Phase 3 completion

Key Decisions:
- Use materialized views (not separate stat tables)
- Add stat fields to plays table
- Refresh views after game completion + on-demand
- Use legacy field names (pa, ab, run, hit) for compatibility
- Skip experimental fields (bphr, xba, etc.) for MVP

Benefits of Materialized Views:
- 80% less code (~400 lines → ~150 lines)
- Single source of truth (plays table)
- Always consistent (stats derived, not tracked)
- Follows existing major-domo pattern
- PostgreSQL optimized (indexed, cached)

Next Steps:
1. Implement PlayStatCalculator (map PlayOutcome → stats)
2. Add stat fields to plays table (migration 004)
3. Create materialized views (migration 005)
4. Create BoxScoreService (query views)
5. Refresh logic after game completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 16:08:23 -06:00


Statistics System Analysis - Existing vs Proposed

Date: 2025-11-06
Context: Phase 3.5 planning - game statistics tracking


Existing Schema (major-domo/database)

Current Tables (Peewee/SQLite)

1. Result - Game-level results

class Result(BaseModel):
    week = IntegerField()
    game = IntegerField()
    awayteam = ForeignKeyField(Team)
    hometeam = ForeignKeyField(Team)
    awayscore = IntegerField()
    homescore = IntegerField()
    season = IntegerField()
    scorecard_url = CharField(null=True)

Purpose: Tracks final game scores and results
Scope: High-level game outcomes only (no inning-by-inning breakdown)


2. BattingStat - Per-game batting statistics

class BattingStat(BaseModel):
    player = ForeignKeyField(Player)
    team = ForeignKeyField(Team)
    pos = CharField()  # Position

    # Standard batting stats
    pa = IntegerField()      # Plate appearances
    ab = IntegerField()      # At bats
    run = IntegerField()     # Runs scored
    hit = IntegerField()     # Hits
    rbi = IntegerField()     # RBIs
    double = IntegerField()
    triple = IntegerField()
    hr = IntegerField()      # Home runs
    bb = IntegerField()      # Walks
    so = IntegerField()      # Strikeouts
    hbp = IntegerField()     # Hit by pitch
    sac = IntegerField()     # Sacrifices
    ibb = IntegerField()     # Intentional walks
    gidp = IntegerField()    # Ground into double play

    # Baserunning stats
    sb = IntegerField()      # Stolen bases
    cs = IntegerField()      # Caught stealing

    # Batter-pitcher matchup stats (unclear what these are)
    bphr = IntegerField()
    bpfo = IntegerField()
    bp1b = IntegerField()
    bplo = IntegerField()

    # X-Check related? (commented out in queries)
    xba = IntegerField()
    xbt = IntegerField()
    xch = IntegerField()
    xhit = IntegerField()

    # Fielding stats (also tracked here)
    error = IntegerField()
    pb = IntegerField()      # Passed balls (catcher)
    sbc = IntegerField()     # Stolen base chances (catcher)
    csc = IntegerField()     # Caught stealing by catcher
    roba = IntegerField()    # (unknown - commented out)
    robs = IntegerField()    # (unknown - commented out)
    raa = IntegerField()     # (unknown - commented out)
    rto = IntegerField()     # (unknown - commented out)

    # Game context
    week = IntegerField()
    game = IntegerField()
    season = IntegerField()

Purpose: Complete batting stats per player per game

Key Features:

  • Very comprehensive (38 fields in the listing above)
  • Includes fielding stats (errors, passed balls, caught stealing)
  • Has unknown/experimental fields (xba, xbt, roba, etc.)
  • Designed for per-game granularity

3. PitchingStat - Per-game pitching statistics

class PitchingStat(BaseModel):
    player = ForeignKeyField(Player)
    team = ForeignKeyField(Team)

    # Standard pitching stats
    ip = FloatField()        # Innings pitched (5.1 = 5 1/3 innings)
    hit = FloatField()       # Hits allowed
    run = FloatField()       # Runs allowed
    erun = FloatField()      # Earned runs allowed
    so = FloatField()        # Strikeouts
    bb = FloatField()        # Walks
    hbp = FloatField()       # Hit batters
    wp = FloatField()        # Wild pitches
    balk = FloatField()      # Balks
    hr = FloatField()        # Home runs allowed

    # Relief pitcher stats
    ir = FloatField()        # Inherited runners
    irs = FloatField()       # Inherited runners scored

    # Game results
    gs = FloatField()        # Games started
    win = FloatField()       # Wins
    loss = FloatField()      # Losses
    hold = FloatField()      # Holds
    sv = FloatField()        # Saves
    bsv = FloatField()       # Blown saves

    # Game context
    week = IntegerField()
    game = IntegerField()
    season = IntegerField()

Purpose: Complete pitching stats per player per game

Key Features:

  • Standard pitching metrics
  • Relief pitcher tracking (inherited runners)
  • Win/loss/save tracking
  • Uses FloatField for every stat, including whole-number counts such as so and win; only ip carries a fractional part
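
Because ip is stored in baseball notation (5.1 means 5 1/3 innings), summing the raw floats gives wrong totals: 5.2 + 1.2 is 7.1 innings, not 6.4. A minimal sketch of one way to handle this is to convert to outs before adding. The helper names below are hypothetical and are not part of the major-domo codebase.

```python
def ip_to_outs(ip: float) -> int:
    """Convert innings-pitched notation (5.1 = 5 1/3 innings) to outs."""
    whole = int(ip)
    # round() guards against float noise, e.g. (5.1 - 5) * 10 == 0.9999...
    partial = round((ip - whole) * 10)
    if partial not in (0, 1, 2):
        raise ValueError(f"invalid innings-pitched value: {ip}")
    return whole * 3 + partial


def outs_to_ip(outs: int) -> float:
    """Convert a count of outs back to innings-pitched notation."""
    whole, partial = divmod(outs, 3)
    return whole + partial / 10


def sum_ip(values: list[float]) -> float:
    """Add innings-pitched values by going through outs."""
    return outs_to_ip(sum(ip_to_outs(v) for v in values))
```

Storing outs as an integer and formatting to ip only for display would avoid the float handling entirely, at the cost of diverging from the legacy field.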

New Web App Architecture (FastAPI/PostgreSQL)

Current Database (SQLAlchemy)

We already have these tables:

  • games - Game metadata (game_id, league_id, teams, status)
  • lineups - Player lineup entries (with substitution support)
  • plays - Individual play records (play_result, dice_rolls, etc.)

Gap Analysis

What we DON'T have:

  1. Per-game player statistics aggregation
  2. Team-level game statistics
  3. Linescore (runs per inning) storage
  4. Easy box score retrieval

What we DO have:

  • All raw play data (we can reconstruct stats from plays)
  • Player lineup tracking
  • Game metadata
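
To illustrate how stats could be reconstructed from raw plays, here is a rough sketch that folds play records into per-batter lines using the legacy field names. The play keys (batter_lineup_id, outcome, rbi) and the outcome labels are assumptions for illustration; the real plays table and PlayOutcome values may differ, and the sketch treats every play as a plate appearance.

```python
from collections import defaultdict

# Hypothetical outcome labels mapped to the extra legacy counter they bump.
HIT_OUTCOMES = {"single": None, "double": "double", "triple": "triple", "home_run": "hr"}
NO_AB_OUTCOMES = {"walk": "bb", "hbp": "hbp", "sacrifice": "sac"}


def aggregate_batting(plays):
    """Fold a list of play dicts into {lineup_id: stat_line} dicts."""
    lines = defaultdict(lambda: defaultdict(int))
    for play in plays:
        line = lines[play["batter_lineup_id"]]
        outcome = play["outcome"]
        line["pa"] += 1
        line["rbi"] += play.get("rbi", 0)
        if outcome in NO_AB_OUTCOMES:
            line[NO_AB_OUTCOMES[outcome]] += 1  # plate appearance, no at-bat
            continue
        line["ab"] += 1
        if outcome in HIT_OUTCOMES:
            line["hit"] += 1
            extra = HIT_OUTCOMES[outcome]
            if extra:
                line[extra] += 1
        elif outcome == "strikeout":
            line["so"] += 1
    return {lineup_id: dict(line) for lineup_id, line in lines.items()}
```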

Options for Phase 3.5

Option 1: Port Legacy Schema

Approach: Create SQLAlchemy versions of existing tables with modern improvements

New Tables:

# Option 1A: Direct port with minimal changes
class PlayerGameStats(Base):
    """
    Mirrors BattingStat/PitchingStat but combines both.
    Uses existing field names for compatibility.
    """
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Standard batting (from BattingStat)
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)  # Note: 'run' not 'runs'
    hit = Column(Integer, default=0)  # Note: 'hit' not 'hits'
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sac = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)

    # Pitching stats (from PitchingStat)
    ip = Column(Float, default=0.0)
    # ... (all pitching fields)

    # Fielding (if needed)
    error = Column(Integer, default=0)
    pb = Column(Integer, default=0)

class GameStats(Base):
    """
    Mirrors Result table but adds linescore and more detail.
    """
    __tablename__ = "game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), unique=True)

    # Team totals
    home_runs = Column(Integer, default=0)
    away_runs = Column(Integer, default=0)
    home_hits = Column(Integer, default=0)
    away_hits = Column(Integer, default=0)
    home_errors = Column(Integer, default=0)
    away_errors = Column(Integer, default=0)

    # NEW: Linescore (not in legacy schema)
    home_linescore = Column(JSON)  # [0, 1, 0, 3, ...]
    away_linescore = Column(JSON)  # [1, 0, 2, 0, ...]

Benefits:

  • Familiar field names for migration
  • Can reuse existing query patterns
  • Easy to submit to legacy REST API
  • Compatible with existing league tooling

Drawbacks:

  • ⚠️ Some weird field names ('run' vs 'runs', 'hit' vs 'hits')
  • ⚠️ Lots of fields we may not use (bphr, bpfo, xba, etc.)

Option 2: Modern Schema from Scratch

Approach: Design clean schema optimized for web app

New Tables:

class PlayerGameStats(Base):
    """Modern schema with clear naming."""
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Batting (clearer names)
    plate_appearances = Column(Integer, default=0)  # Not 'pa'
    at_bats = Column(Integer, default=0)            # Not 'ab'
    runs = Column(Integer, default=0)               # Not 'run'
    hits = Column(Integer, default=0)               # Not 'hit'
    rbis = Column(Integer, default=0)               # Not 'rbi'
    # ... etc

    # Pitching
    innings_pitched = Column(Float, default=0.0)    # Not 'ip'
    batters_faced = Column(Integer, default=0)      # NEW field!
    # ... etc

Benefits:

  • Clean, readable field names
  • Only fields we actually use
  • Modern best practices

Drawbacks:

  • Need field mapping for legacy API submission
  • Different from existing league patterns
  • More work to integrate with existing tooling
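
The field mapping that Option 2 would require can be sketched as a rename table. Only the five batting fields named in the schema above are covered, and the function name is hypothetical; the payload format the legacy API actually expects is not specified in this document.

```python
# Modern column name -> legacy BattingStat field name.
MODERN_TO_LEGACY = {
    "plate_appearances": "pa",
    "at_bats": "ab",
    "runs": "run",
    "hits": "hit",
    "rbis": "rbi",
}


def to_legacy_batting(modern_row: dict) -> dict:
    """Rename modern keys to legacy ones; reject anything unmapped."""
    unknown = set(modern_row) - set(MODERN_TO_LEGACY)
    if unknown:
        raise KeyError(f"no legacy mapping for: {sorted(unknown)}")
    return {MODERN_TO_LEGACY[key]: value for key, value in modern_row.items()}
```

Failing loudly on unmapped keys is a deliberate choice in this sketch, so that a newly added modern column cannot be silently dropped from a submission.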

Option 3: Hybrid Approach

Approach: Use existing field names but omit unused experimental fields

New Tables:

class PlayerGameStats(Base):
    """
    Hybrid: existing field names for core stats,
    skip experimental/unused fields.
    """
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Batting - use legacy names for core stats
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)
    hit = Column(Integer, default=0)
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)

    # Pitching - legacy 'ip'; descriptive names where legacy names would clash with batting
    ip = Column(Float, default=0.0)
    # hits_allowed = Column(Integer, default=0)  # 'hit' field reused?
    runs_allowed = Column(Integer, default=0)
    earned_runs = Column(Integer, default=0)
    walks_allowed = Column(Integer, default=0)
    strikeouts_pitched = Column(Integer, default=0)
    # ... etc

    # SKIP: bphr, bpfo, bp1b, bplo (unclear purpose)
    # SKIP: xba, xbt, xch, xhit (commented out in queries anyway)
    # SKIP: roba, robs, raa, rto (unknown, commented out)

Benefits:

  • Familiar core field names
  • Cleaner (no experimental cruft)
  • Easy legacy API submission
  • Room to add new fields as needed

Drawbacks:

  • ⚠️ Still has some odd naming ('run' vs 'runs')

Questions for Discussion

1. Field Naming Convention

  • Should we use legacy names (pa, ab, run, hit) for compatibility?
  • Or modernize (plate_appearances, at_bats, runs, hits) for clarity?
  • Recommendation: Hybrid - legacy names for core stats, modern for new fields

2. Experimental Fields

  • What are bphr, bpfo, bp1b, bplo? (batter-pitcher matchup stats?)
  • What are xba, xbt, xch, xhit? (X-Check related?)
  • Do we need these or can we skip?
  • Recommendation: Skip for Phase 3.5 MVP, add later if needed

3. Pitching Stats Field Overlap

  • Legacy schema uses hit for both batting hits and pitching hits allowed (run, hr, bb, so and hbp overlap the same way)
  • How do we handle this in a combined table?
  • Options:
    • A) Separate batting_hit and pitching_hit columns
    • B) Reuse hit field (batting or pitching based on position)
    • C) Separate BattingGameStats and PitchingGameStats tables
  • Recommendation: Option C - separate tables for clarity

4. Linescore Storage

  • Legacy has no linescore (runs by inning)
  • We need this for box score display
  • JSON array format: [0, 1, 0, 3, ...] per team?
  • Recommendation: Add to GameStats table as JSON
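
As a sketch of how the JSON linescore arrays might be built, the function below accumulates runs per inning for each team. The (inning, half, runs) tuple shape is an assumption for illustration; the real plays table may expose this information differently.

```python
def build_linescore(scoring_plays, innings_played=9):
    """Return (away, home) lists of runs per inning.

    scoring_plays: iterable of (inning, half, runs) tuples, where half is
    "top" (away team bats) or "bottom" (home team bats). Hypothetical shape.
    """
    away = [0] * innings_played
    home = [0] * innings_played
    for inning, half, runs in scoring_plays:
        # Extend both lists if the game went to extra innings.
        while inning > len(away):
            away.append(0)
            home.append(0)
        target = away if half == "top" else home
        target[inning - 1] += runs
    return away, home
```

Both results are plain lists of integers, so they can be stored in the JSON columns directly.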

5. Integration with Legacy API

  • Do we need to submit stats to existing league REST API?
  • If yes, what format does it expect?
  • Can we provide a mapping layer?
  • Recommendation: Yes, create mapping function for API submission

6. Stat Calculation Approach

  • Option A: Real-time updates (update stats after each play)
  • Option B: Post-game aggregation (calculate from plays table)
  • Option C: Hybrid (real-time updates + verification from plays)
  • Recommendation: Option A (real-time) for performance, with Option B as backup/verification (in effect, Option C)
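
The verification step could look roughly like the sketch below, which compares stats tracked in real time against stats recomputed from the plays table and reports every difference. The function name and the {lineup_id: {stat: value}} shape are assumptions, not an existing API.

```python
def find_stat_mismatches(tracked: dict, recomputed: dict) -> dict:
    """Compare two {lineup_id: {stat: value}} maps.

    Returns {lineup_id: {stat: (tracked_value, recomputed_value)}} for every
    stat that differs; a missing stat is treated as 0.
    """
    mismatches = {}
    for lineup_id in tracked.keys() | recomputed.keys():
        live = tracked.get(lineup_id, {})
        truth = recomputed.get(lineup_id, {})
        diff = {
            stat: (live.get(stat, 0), truth.get(stat, 0))
            for stat in live.keys() | truth.keys()
            if live.get(stat, 0) != truth.get(stat, 0)
        }
        if diff:
            mismatches[lineup_id] = diff
    return mismatches
```

An empty result means the real-time totals agree with the plays table; anything else identifies exactly which player and stat drifted.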

Proposed Schema (Final Recommendation)

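from sqlalchemy import (
    JSON, Column, DateTime, Float, ForeignKey, Index, Integer, func,
)
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

# Stand-in so the snippet is self-contained; the app would import its shared Base.
Base = declarative_base()


class GameStats(Base):
    """Game-level statistics and linescore."""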
    __tablename__ = "game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), unique=True)
    created_at = Column(DateTime, default=func.now())

    # Team totals
    home_runs = Column(Integer, default=0)
    away_runs = Column(Integer, default=0)
    home_hits = Column(Integer, default=0)
    away_hits = Column(Integer, default=0)
    home_errors = Column(Integer, default=0)
    away_errors = Column(Integer, default=0)

    # Linescore (NEW - not in legacy)
    home_linescore = Column(JSON)  # [0, 1, 0, 3, ...]
    away_linescore = Column(JSON)  # [1, 0, 2, 0, ...]


class BattingGameStats(Base):
    """Batting statistics per player per game."""
    __tablename__ = "batting_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Use legacy field names for compatibility
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)
    hit = Column(Integer, default=0)
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sac = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)
    gidp = Column(Integer, default=0)

    __table_args__ = (
        Index('idx_batting_game_stats_game', 'game_id'),
        Index('idx_batting_game_stats_lineup', 'lineup_id'),
    )


class PitchingGameStats(Base):
    """Pitching statistics per player per game."""
    __tablename__ = "pitching_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Use legacy field names for compatibility
    ip = Column(Float, default=0.0)
    hit = Column(Integer, default=0)        # Hits allowed
    run = Column(Integer, default=0)        # Runs allowed
    erun = Column(Integer, default=0)       # Earned runs (legacy: 'erun')
    so = Column(Integer, default=0)         # Strikeouts
    bb = Column(Integer, default=0)         # Walks
    hbp = Column(Integer, default=0)        # Hit batters
    hr = Column(Integer, default=0)         # Home runs allowed
    wp = Column(Integer, default=0)         # Wild pitches

    # NEW: Batters faced (not in legacy, but useful)
    batters_faced = Column(Integer, default=0)

    __table_args__ = (
        Index('idx_pitching_game_stats_game', 'game_id'),
        Index('idx_pitching_game_stats_lineup', 'lineup_id'),
    )

Why Separate Tables?

  1. Clearer intent (batting vs pitching)
  2. No field name conflicts (hit means different things)
  3. Easier queries (no need to filter by position)
  4. Better indexing (separate indexes per table)
  5. Matches real-world mental model (batters bat, pitchers pitch)

Implementation Notes

  1. Dual Position Players (e.g., Shohei Ohtani):

    • Two records: one in BattingGameStats, one in PitchingGameStats
    • Both reference same lineup_id
    • Aggregated separately

  2. Legacy API Submission:

    • Create mapping function: web_stats_to_legacy_format()
    • Convert our stats to legacy BattingStat/PitchingStat format
    • Submit to existing REST API endpoints

  3. Stat Tracker Service:

    • Maintains in-memory cache for current games
    • Async writes to database
    • Provides get_box_score() method with formatted output
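
A minimal sketch of the Stat Tracker Service described in note 3, assuming persistence is injected as an async callable. The class layout, the save_line hook, and the record() signature are all hypothetical; only get_box_score() is named in the notes above, and its real output format is not specified here.

```python
import asyncio
from collections import defaultdict


class StatTracker:
    """In-memory per-game stat cache with an injected async writer."""

    def __init__(self, save_line):
        # save_line: async callable (game_id, lineup_id, stats_dict) -> None.
        # Hypothetical hook; a real service would wrap a database session.
        self._save_line = save_line
        self._games = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def record(self, game_id, lineup_id, **deltas):
        """Apply stat increments for one play, e.g. record(g, 7, pa=1, hit=1)."""
        line = self._games[game_id][lineup_id]
        for stat, amount in deltas.items():
            line[stat] += amount

    async def flush(self, game_id):
        """Write every cached line for a game through the injected writer."""
        lines = self._games.get(game_id, {})
        await asyncio.gather(
            *(self._save_line(game_id, lid, dict(line)) for lid, line in lines.items())
        )

    def get_box_score(self, game_id):
        """Return {lineup_id: stats} as plain dicts for display."""
        return {lid: dict(line) for lid, line in self._games.get(game_id, {}).items()}
```

Injecting the writer keeps the cache logic testable without a database, which is the main reason for that design choice in this sketch.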

Next Steps

Please advise on:

  1. Approve hybrid approach with separate batting/pitching tables?
  2. Use legacy field names for core stats?
  3. Skip experimental fields (bphr, xba, etc.) for now?
  4. Add linescore to GameStats?
  5. Plan for legacy API submission?

Once approved, I'll update phase-3.5-polish-stats.md with the corrected schema.