strat-gameplay-webapp/.claude/implementation/STAT_SYSTEM_ANALYSIS.md
Cal Corum b5677d0c55 CLAUDE: Phase 3.5 Planning - Code Polish & Statistics System
Completed comprehensive planning for Phase 3.5 with focus on production
readiness through materialized views approach for statistics.

Planning Documents Created:
- STAT_SYSTEM_ANALYSIS.md: Analysis of existing major-domo schema
  * Reviewed legacy BattingStat/PitchingStat tables (deprecated)
  * Analyzed existing /plays/batting and /plays/pitching endpoints
  * Evaluated 3 approaches (legacy port, modern, hybrid)

- STAT_SYSTEM_MATERIALIZED_VIEWS.md: Recommended approach
  * PostgreSQL materialized views (following major-domo pattern)
  * Add stat fields to plays table (18 new columns)
  * 3 views: batting_game_stats, pitching_game_stats, game_stats
  * PlayStatCalculator service (~150 lines vs 400+ for StatTracker)
  * 80% less code, single source of truth, always consistent

- phase-3.5-polish-stats.md: Complete implementation plan
  * Task 1: Game Statistics System (materialized views)
  * Task 2: Authorization Framework (WebSocket security)
  * Task 3: Uncapped Hit Decision Trees
  * Task 4: Code Cleanup (remove TODOs, integrate features)
  * Task 5: Integration Test Infrastructure
  * Estimated: 16-24 hours (2-3 days)

NEXT_SESSION.md Updates:
- Phase 3.5 ready to begin (0% → implementation phase)
- Complete task breakdown with acceptance criteria
- Materialized view approach detailed
- Commit strategy for 3 separate commits
- Files to review before starting

Implementation Status Updates:
- Phase 3: 100% Complete (688 tests passing)
- Phase 3F: Substitution system fully tested
- Phase 3.5: Planning complete, ready for implementation
- Updated component status table with Phase 3 completion

Key Decisions:
- Use materialized views (not separate stat tables)
- Add stat fields to plays table
- Refresh views after game completion + on-demand
- Use legacy field names (pa, ab, run, hit) for compatibility
- Skip experimental fields (bphr, xba, etc.) for MVP

Benefits of Materialized Views:
- 80% less code (~400 lines → ~150 lines)
- Single source of truth (plays table)
- Always consistent (stats derived, not tracked)
- Follows existing major-domo pattern
- PostgreSQL optimized (indexed, cached)

Next Steps:
1. Implement PlayStatCalculator (map PlayOutcome → stats)
2. Add stat fields to plays table (migration 004)
3. Create materialized views (migration 005)
4. Create BoxScoreService (query views)
5. Refresh logic after game completion

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 16:08:23 -06:00


# Statistics System Analysis - Existing vs Proposed

**Date**: 2025-11-06

**Context**: Phase 3.5 planning - game statistics tracking

---
## Existing Schema (major-domo/database)
### Current Tables (Peewee/SQLite)
#### 1. **Result** - Game-level results
```python
from peewee import CharField, ForeignKeyField, IntegerField

# Excerpt from major-domo/database; BaseModel and Team are defined there.
class Result(BaseModel):
    week = IntegerField()
    game = IntegerField()
    awayteam = ForeignKeyField(Team)
    hometeam = ForeignKeyField(Team)
    awayscore = IntegerField()
    homescore = IntegerField()
    season = IntegerField()
    scorecard_url = CharField(null=True)
```

**Purpose**: Tracks final game scores and results

**Scope**: High-level game outcomes only (no inning-by-inning breakdown)

---
#### 2. **BattingStat** - Per-game batting statistics
```python
from peewee import CharField, ForeignKeyField, IntegerField

# Excerpt from major-domo/database; BaseModel, Player and Team are defined there.
class BattingStat(BaseModel):
    player = ForeignKeyField(Player)
    team = ForeignKeyField(Team)
    pos = CharField()  # Position

    # Standard batting stats
    pa = IntegerField()  # Plate appearances
    ab = IntegerField()  # At bats
    run = IntegerField()  # Runs scored
    hit = IntegerField()  # Hits
    rbi = IntegerField()  # RBIs
    double = IntegerField()
    triple = IntegerField()
    hr = IntegerField()  # Home runs
    bb = IntegerField()  # Walks
    so = IntegerField()  # Strikeouts
    hbp = IntegerField()  # Hit by pitch
    sac = IntegerField()  # Sacrifices
    ibb = IntegerField()  # Intentional walks
    gidp = IntegerField()  # Grounded into double play

    # Baserunning stats
    sb = IntegerField()  # Stolen bases
    cs = IntegerField()  # Caught stealing

    # Batter-pitcher matchup stats (unclear what these are)
    bphr = IntegerField()
    bpfo = IntegerField()
    bp1b = IntegerField()
    bplo = IntegerField()

    # X-Check related? (commented out in queries)
    xba = IntegerField()
    xbt = IntegerField()
    xch = IntegerField()
    xhit = IntegerField()

    # Fielding stats (also tracked here)
    error = IntegerField()
    pb = IntegerField()  # Passed balls (catcher)
    sbc = IntegerField()  # Stolen base chances (catcher)
    csc = IntegerField()  # Caught stealing by catcher
    roba = IntegerField()  # (unknown - commented out)
    robs = IntegerField()  # (unknown - commented out)
    raa = IntegerField()  # (unknown - commented out)
    rto = IntegerField()  # (unknown - commented out)

    # Game context
    week = IntegerField()
    game = IntegerField()
    season = IntegerField()
```

**Purpose**: Complete batting stats per player per game

**Key Features**:
- Very comprehensive (38 fields, nearly 40)
- Includes fielding stats (errors, passed balls, catcher stolen-base and caught-stealing counts)
- Has unknown/experimental fields (xba, xbt, roba, etc.)
- Designed for per-game granularity

---
#### 3. **PitchingStat** - Per-game pitching statistics
```python
from peewee import FloatField, ForeignKeyField, IntegerField

# Excerpt from major-domo/database; BaseModel, Player and Team are defined there.
class PitchingStat(BaseModel):
    player = ForeignKeyField(Player)
    team = ForeignKeyField(Team)

    # Standard pitching stats
    ip = FloatField()  # Innings pitched (5.1 = 5 1/3 innings)
    hit = FloatField()  # Hits allowed
    run = FloatField()  # Runs allowed
    erun = FloatField()  # Earned runs allowed
    so = FloatField()  # Strikeouts
    bb = FloatField()  # Walks
    hbp = FloatField()  # Hit batters
    wp = FloatField()  # Wild pitches
    balk = FloatField()  # Balks
    hr = FloatField()  # Home runs allowed

    # Relief pitcher stats
    ir = FloatField()  # Inherited runners
    irs = FloatField()  # Inherited runners scored

    # Game results
    gs = FloatField()  # Games started
    win = FloatField()  # Wins
    loss = FloatField()  # Losses
    hold = FloatField()  # Holds
    sv = FloatField()  # Saves
    bsv = FloatField()  # Blown saves

    # Game context
    week = IntegerField()
    game = IntegerField()
    season = IntegerField()
```

**Purpose**: Complete pitching stats per player per game

**Key Features**:
- Standard pitching metrics
- Relief pitcher tracking (inherited runners)
- Win/loss/save tracking
- Uses FloatField for nearly every stat, including whole-number counts such as hits, walks and decisions; only `ip` actually needs a fractional value
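
Because the digit after the point in a legacy `ip` value counts outs rather than tenths of an inning, `ip` values cannot be added or averaged as ordinary decimals. A minimal sketch, assuming stats are combined by converting to whole outs first; `ip_to_outs` and `outs_to_ip` are hypothetical helpers, not part of major-domo:

```python
def ip_to_outs(ip: float) -> int:
    """Convert legacy innings-pitched notation (5.1 = 5 1/3) to outs."""
    whole = int(ip)
    partial = round((ip - whole) * 10)  # digit after the point = extra outs
    if partial not in (0, 1, 2):
        raise ValueError(f"invalid innings-pitched value: {ip}")
    return whole * 3 + partial


def outs_to_ip(outs: int) -> float:
    """Convert an out count back to legacy notation (16 outs -> 5.1)."""
    return round(outs // 3 + (outs % 3) / 10, 1)


# Adding 5.1 and 2.2 as decimals gives 7.3; the correct total is 8.0 innings.
total_outs = ip_to_outs(5.1) + ip_to_outs(2.2)
print(outs_to_ip(total_outs))  # 8.0
```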
---
## New Web App Architecture (FastAPI/PostgreSQL)
### Current Database (SQLAlchemy)
We already have these tables:
- `games` - Game metadata (game_id, league_id, teams, status)
- `lineups` - Player lineup entries (with substitution support)
- `plays` - Individual play records (play_result, dice_rolls, etc.)
### Gap Analysis

**What we DON'T have:**

1. ❌ Per-game player statistics aggregation
2. ❌ Team-level game statistics
3. ❌ Linescore (runs per inning) storage
4. ❌ Easy box score retrieval

**What we DO have:**

- ✅ All raw play data (we can reconstruct stats from plays)
- ✅ Player lineup tracking
- ✅ Game metadata
---
## Options for Phase 3.5
### Option 1: Adapt Existing Schema

**Approach**: Create SQLAlchemy versions of existing tables with modern improvements

**New Tables**:
```python
from sqlalchemy import JSON, Column, Float, ForeignKey, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for the app's shared declarative base


# Option 1A: Direct port with minimal changes
class PlayerGameStats(Base):
    """
    Mirrors BattingStat/PitchingStat but combines both.
    Uses existing field names for compatibility.
    """
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Standard batting (from BattingStat)
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)  # Note: 'run' not 'runs'
    hit = Column(Integer, default=0)  # Note: 'hit' not 'hits'
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sac = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)

    # Pitching stats (from PitchingStat)
    ip = Column(Float, default=0.0)
    # ... (all pitching fields)

    # Fielding (if needed)
    error = Column(Integer, default=0)
    pb = Column(Integer, default=0)


class GameStats(Base):
    """
    Mirrors Result table but adds linescore and more detail.
    """
    __tablename__ = "game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), unique=True)

    # Team totals
    home_runs = Column(Integer, default=0)
    away_runs = Column(Integer, default=0)
    home_hits = Column(Integer, default=0)
    away_hits = Column(Integer, default=0)
    home_errors = Column(Integer, default=0)
    away_errors = Column(Integer, default=0)

    # NEW: Linescore (not in legacy schema)
    home_linescore = Column(JSON)  # [0, 1, 0, 3, ...]
    away_linescore = Column(JSON)  # [1, 0, 2, 0, ...]
```

**Benefits**:
- ✅ Familiar field names for migration
- ✅ Can reuse existing query patterns
- ✅ Easy to submit to legacy REST API
- ✅ Compatible with existing league tooling

**Drawbacks**:
- ⚠️ Some unconventional field names ('run' vs 'runs', 'hit' vs 'hits')
- ⚠️ Lots of fields we may not use (bphr, bpfo, xba, etc.)

---
### Option 2: Modern Schema from Scratch
**Approach**: Design clean schema optimized for web app
**New Tables**:
```python
from sqlalchemy import Column, Float, ForeignKey, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for the app's shared declarative base


class PlayerGameStats(Base):
    """Modern schema with clear naming."""
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Batting (clearer names)
    plate_appearances = Column(Integer, default=0)  # Not 'pa'
    at_bats = Column(Integer, default=0)  # Not 'ab'
    runs = Column(Integer, default=0)  # Not 'run'
    hits = Column(Integer, default=0)  # Not 'hit'
    rbis = Column(Integer, default=0)  # Not 'rbi'
    # ... etc

    # Pitching
    innings_pitched = Column(Float, default=0.0)  # Not 'ip'
    batters_faced = Column(Integer, default=0)  # NEW field!
    # ... etc
```

**Benefits**:
- ✅ Clean, readable field names
- ✅ Only fields we actually use
- ✅ Modern best practices

**Drawbacks**:
- ❌ Need field mapping for legacy API submission
- ❌ Different from existing league patterns
- ❌ More work to integrate with existing tooling
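
To make the first drawback concrete: with Option 2, every legacy submission needs a rename step, and any field the legacy tables lack is simply lost. A minimal sketch, assuming stats are handled as plain dicts; `MODERN_TO_LEGACY` and `to_legacy_names` are hypothetical names and cover only the fields spelled out in the Option 2 sketch:

```python
# Hypothetical modern -> legacy column mapping, covering only the fields
# named in the Option 2 sketch; the elided fields would follow the same pattern.
MODERN_TO_LEGACY: dict[str, str] = {
    "plate_appearances": "pa",
    "at_bats": "ab",
    "runs": "run",
    "hits": "hit",
    "rbis": "rbi",
    "innings_pitched": "ip",
}


def to_legacy_names(row: dict[str, float]) -> dict[str, float]:
    """Rename modern keys to legacy ones.

    Keys with no legacy column (e.g. batters_faced) are dropped, because
    BattingStat/PitchingStat have nowhere to store them.
    """
    return {
        MODERN_TO_LEGACY[key]: value
        for key, value in row.items()
        if key in MODERN_TO_LEGACY
    }


print(to_legacy_names({"plate_appearances": 4, "hits": 2, "batters_faced": 0}))
# {'pa': 4, 'hit': 2}
```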
---
### Option 3: Hybrid Approach (RECOMMENDED)
**Approach**: Use existing field names but omit unused experimental fields
**New Tables**:
```python
from sqlalchemy import Column, Float, ForeignKey, Integer
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for the app's shared declarative base


class PlayerGameStats(Base):
    """
    Hybrid: existing field names for core stats,
    skip experimental/unused fields.
    """
    __tablename__ = "player_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Batting - use legacy names for core stats
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)
    hit = Column(Integer, default=0)
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)

    # Pitching - 'ip' keeps its legacy name, but 'hit', 'run', 'bb' and 'so'
    # are already taken by the batting columns above, so the pitching
    # equivalents need distinct names in a combined table (see Question 3)
    ip = Column(Float, default=0.0)
    # hits_allowed = Column(Integer, default=0)  # 'hit' field reused?
    runs_allowed = Column(Integer, default=0)
    earned_runs = Column(Integer, default=0)
    walks_allowed = Column(Integer, default=0)
    strikeouts_pitched = Column(Integer, default=0)
    # ... etc

    # SKIP: bphr, bpfo, bp1b, bplo (unclear purpose)
    # SKIP: xba, xbt, xch, xhit (commented out in queries anyway)
    # SKIP: roba, robs, raa, rto (unknown, commented out)
```

**Benefits**:
- ✅ Familiar core field names
- ✅ Cleaner (no experimental cruft)
- ✅ Easy legacy API submission
- ✅ Room to add new fields as needed

**Drawbacks**:
- ⚠️ Still has some odd naming ('run' vs 'runs')
---
## Questions for Discussion
### 1. **Field Naming Convention**
- Should we use legacy names (`pa`, `ab`, `run`, `hit`) for compatibility?
- Or modernize (`plate_appearances`, `at_bats`, `runs`, `hits`) for clarity?
- **Recommendation**: Hybrid - legacy names for core stats, modern for new fields
### 2. **Experimental Fields**
- What are `bphr`, `bpfo`, `bp1b`, `bplo`? (batter-pitcher matchup stats?)
- What are `xba`, `xbt`, `xch`, `xhit`? (X-Check related?)
- Do we need these or can we skip?
- **Recommendation**: Skip for Phase 3.5 MVP, add later if needed
### 3. **Pitching Stats Field Overlap**
- The legacy schema reuses the same names (`hit`, `run`, `bb`, `so`, `hbp`, `hr`) in BattingStat and PitchingStat; for example, `hit` means hits in one table and hits allowed in the other
- How do we handle this in a combined table?
- **Options**:
  - A) Separate `batting_hit` and `pitching_hit` columns
  - B) Reuse `hit` field (batting or pitching based on position)
  - C) Separate BattingGameStats and PitchingGameStats tables
- **Recommendation**: Option C - separate tables for clarity
### 4. **Linescore Storage**
- Legacy has no linescore (runs by inning)
- We need this for box score display
- JSON array format: `[0, 1, 0, 3, ...]` per team?
- **Recommendation**: Add to GameStats table as JSON
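
If the linescore is stored as one JSON array per team, it can be derived from the play log. A minimal sketch, assuming each play exposes its inning, half and runs scored; `PlayRuns` and `build_linescores` are hypothetical names, not existing `plays` columns or services:

```python
from dataclasses import dataclass


@dataclass
class PlayRuns:
    """Hypothetical minimal view of a play: when it happened, runs it scored."""
    inning: int
    half: str  # "top" (away team bats) or "bottom" (home team bats)
    runs_scored: int


def build_linescores(plays: list[PlayRuns]) -> tuple[list[int], list[int]]:
    """Return (away_linescore, home_linescore), one entry per half-inning played."""
    away: list[int] = []
    home: list[int] = []
    for play in plays:
        line = away if play.half == "top" else home
        # Each half-inning played has at least one play, so padding with zeros
        # up to this inning also records scoreless innings.
        while len(line) < play.inning:
            line.append(0)
        line[play.inning - 1] += play.runs_scored
    return away, home


plays = [
    PlayRuns(1, "top", 1),
    PlayRuns(1, "bottom", 0),
    PlayRuns(2, "top", 0),
    PlayRuns(2, "bottom", 2),
    PlayRuns(2, "bottom", 1),
]
away, home = build_linescores(plays)
print(away, home)  # [1, 0] [0, 3]
```

Team totals such as `away_runs` are then `sum(away)`, and when the home team does not bat in the final inning its array is simply one entry shorter, which a box score would typically render as "X".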
### 5. **Integration with Legacy API**
- Do we need to submit stats to existing league REST API?
- If yes, what format does it expect?
- Can we provide a mapping layer?
- **Recommendation**: Yes, create mapping function for API submission
### 6. **Stat Calculation Approach**
- **Option A**: Real-time updates (update stats after each play)
- **Option B**: Post-game aggregation (calculate from plays table)
- **Option C**: Hybrid (real-time updates + verification from plays)
- **Recommendation**: Option A (real-time) for performance, with Option B as backup/verification
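
The recommended combination (live updates, with the plays table used to verify them) amounts to two paths over the same outcome-to-stat table that must agree. A minimal sketch, assuming plays are reduced to (batter lineup_id, outcome) pairs; the outcome labels in `OUTCOME_STATS`, plus `LiveStatTracker`, `aggregate_from_plays` and `verify`, are hypothetical and not the app's actual play outcome types or services:

```python
from collections import Counter, defaultdict

# Hypothetical outcome -> stat increments using legacy field names; the real
# table would be keyed by the app's own play outcome types.
OUTCOME_STATS: dict[str, dict[str, int]] = {
    "single": {"pa": 1, "ab": 1, "hit": 1},
    "double": {"pa": 1, "ab": 1, "hit": 1, "double": 1},
    "homerun": {"pa": 1, "ab": 1, "hit": 1, "hr": 1},
    "walk": {"pa": 1, "bb": 1},
    "strikeout": {"pa": 1, "ab": 1, "so": 1},
    "groundout": {"pa": 1, "ab": 1},
}

Play = tuple[int, str]  # (batter lineup_id, outcome)


class LiveStatTracker:
    """Option A: keep per-batter totals current as each play is recorded."""

    def __init__(self) -> None:
        self.stats: defaultdict[int, Counter] = defaultdict(Counter)

    def record_play(self, play: Play) -> None:
        lineup_id, outcome = play
        self.stats[lineup_id].update(OUTCOME_STATS[outcome])


def aggregate_from_plays(plays: list[Play]) -> dict[int, Counter]:
    """Option B: recompute the same totals from the stored play log."""
    totals: defaultdict[int, Counter] = defaultdict(Counter)
    for lineup_id, outcome in plays:
        totals[lineup_id].update(OUTCOME_STATS[outcome])
    return dict(totals)


def verify(tracker: LiveStatTracker, plays: list[Play]) -> bool:
    """Use Option B as a check on Option A: both paths must agree."""
    return dict(tracker.stats) == aggregate_from_plays(plays)


plays: list[Play] = [(1, "single"), (2, "strikeout"), (1, "homerun")]
tracker = LiveStatTracker()
for play in plays:
    tracker.record_play(play)
print(verify(tracker, plays))  # True
```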
---
## Proposed Schema (Final Recommendation)
### Recommended Approach: Hybrid with Separate Tables
```python
from sqlalchemy import JSON, Column, DateTime, Float, ForeignKey, Index, Integer, func
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # stand-in for the app's shared declarative base


class GameStats(Base):
    """Game-level statistics and linescore."""
    __tablename__ = "game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), unique=True)
    created_at = Column(DateTime, default=func.now())

    # Team totals
    home_runs = Column(Integer, default=0)
    away_runs = Column(Integer, default=0)
    home_hits = Column(Integer, default=0)
    away_hits = Column(Integer, default=0)
    home_errors = Column(Integer, default=0)
    away_errors = Column(Integer, default=0)

    # Linescore (NEW - not in legacy)
    home_linescore = Column(JSON)  # [0, 1, 0, 3, ...]
    away_linescore = Column(JSON)  # [1, 0, 2, 0, ...]


class BattingGameStats(Base):
    """Batting statistics per player per game."""
    __tablename__ = "batting_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Use legacy field names for compatibility
    pa = Column(Integer, default=0)
    ab = Column(Integer, default=0)
    run = Column(Integer, default=0)
    hit = Column(Integer, default=0)
    rbi = Column(Integer, default=0)
    double = Column(Integer, default=0)
    triple = Column(Integer, default=0)
    hr = Column(Integer, default=0)
    bb = Column(Integer, default=0)
    so = Column(Integer, default=0)
    hbp = Column(Integer, default=0)
    sac = Column(Integer, default=0)
    sb = Column(Integer, default=0)
    cs = Column(Integer, default=0)
    gidp = Column(Integer, default=0)

    __table_args__ = (
        Index('idx_batting_game_stats_game', 'game_id'),
        Index('idx_batting_game_stats_lineup', 'lineup_id'),
    )


class PitchingGameStats(Base):
    """Pitching statistics per player per game."""
    __tablename__ = "pitching_game_stats"

    id = Column(Integer, primary_key=True)
    game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
    lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)

    # Use legacy field names for compatibility
    ip = Column(Float, default=0.0)
    hit = Column(Integer, default=0)  # Hits allowed
    run = Column(Integer, default=0)  # Runs allowed
    erun = Column(Integer, default=0)  # Earned runs (legacy: 'erun')
    so = Column(Integer, default=0)  # Strikeouts
    bb = Column(Integer, default=0)  # Walks
    hbp = Column(Integer, default=0)  # Hit batters
    hr = Column(Integer, default=0)  # Home runs allowed
    wp = Column(Integer, default=0)  # Wild pitches

    # NEW: Batters faced (not in legacy, but useful)
    batters_faced = Column(Integer, default=0)

    __table_args__ = (
        Index('idx_pitching_game_stats_game', 'game_id'),
        Index('idx_pitching_game_stats_lineup', 'lineup_id'),
    )
```
**Why Separate Tables?**
1. Clearer intent (batting vs pitching)
2. No field name conflicts (`hit` means different things)
3. Easier queries (no need to filter by position)
4. Better indexing (separate indexes per table)
5. Matches real-world mental model (batters bat, pitchers pitch)
---
## Implementation Notes
1. **Dual Position Players** (e.g., Shohei Ohtani):
   - Two records: one in BattingGameStats, one in PitchingGameStats
   - Both reference the same lineup_id
   - Aggregated separately
2. **Legacy API Submission**:
   - Create mapping function: `web_stats_to_legacy_format()`
   - Convert our stats to legacy BattingStat/PitchingStat format
   - Submit to existing REST API endpoints
3. **Stat Tracker Service**:
   - Maintains in-memory cache for current games
   - Async writes to database
   - Provides `get_box_score()` method with formatted output
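
Because `BattingGameStats` keeps the legacy names for its core columns, `web_stats_to_legacy_format()` can mostly copy values across. A minimal batting-only sketch, assuming the row arrives as a dict keyed by column name, that the caller supplies player, team, position and week/game/season context, and that the legacy API accepts 0 for the columns the hybrid schema skips; the parameter names and the zero-fill are assumptions, not the legacy API's documented contract:

```python
# Core columns BattingGameStats shares with legacy BattingStat (same names).
CORE_BATTING_FIELDS = (
    "pa", "ab", "run", "hit", "rbi", "double", "triple", "hr",
    "bb", "so", "hbp", "sac", "sb", "cs", "gidp",
)

# Legacy BattingStat columns the hybrid schema skips; assumed to be
# acceptable as 0 by the legacy API.
SKIPPED_LEGACY_FIELDS = (
    "ibb", "bphr", "bpfo", "bp1b", "bplo", "xba", "xbt", "xch", "xhit",
    "error", "pb", "sbc", "csc", "roba", "robs", "raa", "rto",
)


def web_stats_to_legacy_format(
    stats: dict[str, int],
    *,
    player_id: int,
    team_id: int,
    pos: str,
    week: int,
    game: int,
    season: int,
) -> dict[str, object]:
    """Build a legacy BattingStat-shaped payload from one batting row."""
    payload: dict[str, object] = {f: stats.get(f, 0) for f in CORE_BATTING_FIELDS}
    payload.update({f: 0 for f in SKIPPED_LEGACY_FIELDS})
    # Context the legacy row requires but the batting_game_stats row does
    # not carry itself; the caller supplies it.
    payload.update(
        player=player_id, team=team_id, pos=pos,
        week=week, game=game, season=season,
    )
    return payload


row = {"pa": 4, "ab": 3, "hit": 2, "bb": 1}
payload = web_stats_to_legacy_format(
    row, player_id=101, team_id=7, pos="SS", week=3, game=2, season=12
)
print(payload["hit"], payload["xba"], len(payload))  # 2 0 38
```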
---
## Next Steps
**Please advise on**:
1. ✅ Approve hybrid approach with separate batting/pitching tables?
2. ✅ Use legacy field names for core stats?
3. ✅ Skip experimental fields (bphr, xba, etc.) for now?
4. ✅ Add linescore to GameStats?
5. ✅ Plan for legacy API submission?
Once approved, I'll update `phase-3.5-polish-stats.md` with the corrected schema.