Completed comprehensive planning for Phase 3.5 with focus on production readiness through materialized views approach for statistics. Planning Documents Created: - STAT_SYSTEM_ANALYSIS.md: Analysis of existing major-domo schema * Reviewed legacy BattingStat/PitchingStat tables (deprecated) * Analyzed existing /plays/batting and /plays/pitching endpoints * Evaluated 3 approaches (legacy port, modern, hybrid) - STAT_SYSTEM_MATERIALIZED_VIEWS.md: Recommended approach * PostgreSQL materialized views (following major-domo pattern) * Add stat fields to plays table (18 new columns) * 3 views: batting_game_stats, pitching_game_stats, game_stats * PlayStatCalculator service (~150 lines vs 400+ for StatTracker) * 80% less code, single source of truth, always consistent - phase-3.5-polish-stats.md: Complete implementation plan * Task 1: Game Statistics System (materialized views) * Task 2: Authorization Framework (WebSocket security) * Task 3: Uncapped Hit Decision Trees * Task 4: Code Cleanup (remove TODOs, integrate features) * Task 5: Integration Test Infrastructure * Estimated: 16-24 hours (2-3 days) NEXT_SESSION.md Updates: - Phase 3.5 ready to begin (0% → implementation phase) - Complete task breakdown with acceptance criteria - Materialized view approach detailed - Commit strategy for 3 separate commits - Files to review before starting Implementation Status Updates: - Phase 3: 100% Complete (688 tests passing) - Phase 3F: Substitution system fully tested - Phase 3.5: Planning complete, ready for implementation - Updated component status table with Phase 3 completion Key Decisions: - Use materialized views (not separate stat tables) - Add stat fields to plays table - Refresh views after game completion + on-demand - Use legacy field names (pa, ab, run, hit) for compatibility - Skip experimental fields (bphr, xba, etc.) for MVP Benefits of Materialized Views: - 80% less code (~400 lines → ~150 lines) - Single source of truth (plays table) - Always consistent (stats derived, not tracked) - Follows existing major-domo pattern - PostgreSQL optimized (indexed, cached) Next Steps: 1. Implement PlayStatCalculator (map PlayOutcome → stats) 2. Add stat fields to plays table (migration 004) 3. Create materialized views (migration 005) 4. Create BoxScoreService (query views) 5. Refresh logic after game completion 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
506 lines
16 KiB
Markdown
506 lines
16 KiB
Markdown
# Statistics System Analysis - Existing vs Proposed
|
|
|
|
**Date**: 2025-11-06
|
|
**Context**: Phase 3.5 planning - game statistics tracking
|
|
|
|
---
|
|
|
|
## Existing Schema (major-domo/database)
|
|
|
|
### Current Tables (Peewee/SQLite)
|
|
|
|
#### 1. **Result** - Game-level results
|
|
```python
|
|
class Result(BaseModel):
|
|
week = IntegerField()
|
|
game = IntegerField()
|
|
awayteam = ForeignKeyField(Team)
|
|
hometeam = ForeignKeyField(Team)
|
|
awayscore = IntegerField()
|
|
homescore = IntegerField()
|
|
season = IntegerField()
|
|
scorecard_url = CharField(null=True)
|
|
```
|
|
|
|
**Purpose**: Tracks final game scores and results
|
|
**Scope**: High-level game outcomes only (no inning-by-inning breakdown)
|
|
|
|
---
|
|
|
|
#### 2. **BattingStat** - Per-game batting statistics
|
|
```python
|
|
class BattingStat(BaseModel):
|
|
player = ForeignKeyField(Player)
|
|
team = ForeignKeyField(Team)
|
|
pos = CharField() # Position
|
|
|
|
# Standard batting stats
|
|
pa = IntegerField() # Plate appearances
|
|
ab = IntegerField() # At bats
|
|
run = IntegerField() # Runs scored
|
|
hit = IntegerField() # Hits
|
|
rbi = IntegerField() # RBIs
|
|
double = IntegerField()
|
|
triple = IntegerField()
|
|
hr = IntegerField() # Home runs
|
|
bb = IntegerField() # Walks
|
|
so = IntegerField() # Strikeouts
|
|
hbp = IntegerField() # Hit by pitch
|
|
sac = IntegerField() # Sacrifices
|
|
ibb = IntegerField() # Intentional walks
|
|
gidp = IntegerField() # Ground into double play
|
|
|
|
# Baserunning stats
|
|
sb = IntegerField() # Stolen bases
|
|
cs = IntegerField() # Caught stealing
|
|
|
|
# Batter-pitcher matchup stats (unclear what these are)
|
|
bphr = IntegerField()
|
|
bpfo = IntegerField()
|
|
bp1b = IntegerField()
|
|
bplo = IntegerField()
|
|
|
|
# X-Check related? (commented out in queries)
|
|
xba = IntegerField()
|
|
xbt = IntegerField()
|
|
xch = IntegerField()
|
|
xhit = IntegerField()
|
|
|
|
# Fielding stats (also tracked here)
|
|
error = IntegerField()
|
|
pb = IntegerField() # Passed balls (catcher)
|
|
sbc = IntegerField() # Stolen base chances (catcher)
|
|
csc = IntegerField() # Caught stealing by catcher
|
|
roba = IntegerField() # (unknown - commented out)
|
|
robs = IntegerField() # (unknown - commented out)
|
|
raa = IntegerField() # (unknown - commented out)
|
|
rto = IntegerField() # (unknown - commented out)
|
|
|
|
# Game context
|
|
week = IntegerField()
|
|
game = IntegerField()
|
|
season = IntegerField()
|
|
```
|
|
|
|
**Purpose**: Complete batting stats per player per game
|
|
**Key Features**:
|
|
- Very comprehensive (40+ fields!)
|
|
- Includes fielding stats (errors, passed balls, caught stealing)
|
|
- Has unknown/experimental fields (xba, xbt, roba, etc.)
|
|
- Designed for per-game granularity
|
|
|
|
---
|
|
|
|
#### 3. **PitchingStat** - Per-game pitching statistics
|
|
```python
|
|
class PitchingStat(BaseModel):
|
|
player = ForeignKeyField(Player)
|
|
team = ForeignKeyField(Team)
|
|
|
|
# Standard pitching stats
|
|
ip = FloatField() # Innings pitched (5.1 = 5 1/3 innings)
|
|
hit = FloatField() # Hits allowed
|
|
run = FloatField() # Runs allowed
|
|
erun = FloatField() # Earned runs allowed
|
|
so = FloatField() # Strikeouts
|
|
bb = FloatField() # Walks
|
|
hbp = FloatField() # Hit batters
|
|
wp = FloatField() # Wild pitches
|
|
balk = FloatField() # Balks
|
|
hr = FloatField() # Home runs allowed
|
|
|
|
# Relief pitcher stats
|
|
ir = FloatField() # Inherited runners
|
|
irs = FloatField() # Inherited runners scored
|
|
|
|
# Game results
|
|
gs = FloatField() # Games started
|
|
win = FloatField() # Wins
|
|
loss = FloatField() # Losses
|
|
hold = FloatField() # Holds
|
|
sv = FloatField() # Saves
|
|
bsv = FloatField() # Blown saves
|
|
|
|
# Game context
|
|
week = IntegerField()
|
|
game = IntegerField()
|
|
season = IntegerField()
|
|
```
|
|
|
|
**Purpose**: Complete pitching stats per player per game
|
|
**Key Features**:
|
|
- Standard pitching metrics
|
|
- Relief pitcher tracking (inherited runners)
|
|
- Win/loss/save tracking
|
|
- Uses FloatField for most stats (fractional values)
|
|
|
|
---
|
|
|
|
## New Web App Architecture (FastAPI/PostgreSQL)
|
|
|
|
### Current Database (SQLAlchemy)
|
|
|
|
We already have these tables:
|
|
- `games` - Game metadata (game_id, league_id, teams, status)
|
|
- `lineups` - Player lineup entries (with substitution support)
|
|
- `plays` - Individual play records (play_result, dice_rolls, etc.)
|
|
|
|
### Gap Analysis
|
|
|
|
**What we DON'T have:**
|
|
1. ❌ Per-game player statistics aggregation
|
|
2. ❌ Team-level game statistics
|
|
3. ❌ Linescore (runs per inning) storage
|
|
4. ❌ Easy box score retrieval
|
|
|
|
**What we DO have:**
|
|
- ✅ All raw play data (we can reconstruct stats from plays)
|
|
- ✅ Player lineup tracking
|
|
- ✅ Game metadata
|
|
|
|
---
|
|
|
|
## Options for Phase 3.5
|
|
|
|
### Option 1: Adapt Existing Schema (RECOMMENDED)
|
|
|
|
**Approach**: Create SQLAlchemy versions of existing tables with modern improvements
|
|
|
|
**New Tables**:
|
|
|
|
```python
|
|
# Option 1A: Direct port with minimal changes
|
|
class PlayerGameStats(Base):
|
|
"""
|
|
Mirrors BattingStat/PitchingStat but combines both.
|
|
Uses existing field names for compatibility.
|
|
"""
|
|
__tablename__ = "player_game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
|
|
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
|
|
|
|
# Standard batting (from BattingStat)
|
|
pa = Column(Integer, default=0)
|
|
ab = Column(Integer, default=0)
|
|
run = Column(Integer, default=0) # Note: 'run' not 'runs'
|
|
hit = Column(Integer, default=0) # Note: 'hit' not 'hits'
|
|
rbi = Column(Integer, default=0)
|
|
double = Column(Integer, default=0)
|
|
triple = Column(Integer, default=0)
|
|
hr = Column(Integer, default=0)
|
|
bb = Column(Integer, default=0)
|
|
so = Column(Integer, default=0)
|
|
hbp = Column(Integer, default=0)
|
|
sac = Column(Integer, default=0)
|
|
sb = Column(Integer, default=0)
|
|
cs = Column(Integer, default=0)
|
|
|
|
# Pitching stats (from PitchingStat)
|
|
ip = Column(Float, default=0.0)
|
|
# ... (all pitching fields)
|
|
|
|
# Fielding (if needed)
|
|
error = Column(Integer, default=0)
|
|
pb = Column(Integer, default=0)
|
|
|
|
class GameStats(Base):
|
|
"""
|
|
Mirrors Result table but adds linescore and more detail.
|
|
"""
|
|
__tablename__ = "game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), unique=True)
|
|
|
|
# Team totals
|
|
home_runs = Column(Integer, default=0)
|
|
away_runs = Column(Integer, default=0)
|
|
home_hits = Column(Integer, default=0)
|
|
away_hits = Column(Integer, default=0)
|
|
home_errors = Column(Integer, default=0)
|
|
away_errors = Column(Integer, default=0)
|
|
|
|
# NEW: Linescore (not in legacy schema)
|
|
home_linescore = Column(JSON) # [0, 1, 0, 3, ...]
|
|
away_linescore = Column(JSON) # [1, 0, 2, 0, ...]
|
|
```
|
|
|
|
**Benefits**:
|
|
- ✅ Familiar field names for migration
|
|
- ✅ Can reuse existing query patterns
|
|
- ✅ Easy to submit to legacy REST API
|
|
- ✅ Compatible with existing league tooling
|
|
|
|
**Drawbacks**:
|
|
- ⚠️ Some weird field names ('run' vs 'runs', 'hit' vs 'hits')
|
|
- ⚠️ Lots of fields we may not use (bphr, bpfo, xba, etc.)
|
|
|
|
---
|
|
|
|
### Option 2: Modern Schema from Scratch
|
|
|
|
**Approach**: Design clean schema optimized for web app
|
|
|
|
**New Tables**:
|
|
|
|
```python
|
|
class PlayerGameStats(Base):
|
|
"""Modern schema with clear naming."""
|
|
__tablename__ = "player_game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
|
|
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
|
|
|
|
# Batting (clearer names)
|
|
plate_appearances = Column(Integer, default=0) # Not 'pa'
|
|
at_bats = Column(Integer, default=0) # Not 'ab'
|
|
runs = Column(Integer, default=0) # Not 'run'
|
|
hits = Column(Integer, default=0) # Not 'hit'
|
|
rbis = Column(Integer, default=0) # Not 'rbi'
|
|
# ... etc
|
|
|
|
# Pitching
|
|
innings_pitched = Column(Float, default=0.0) # Not 'ip'
|
|
batters_faced = Column(Integer, default=0) # NEW field!
|
|
# ... etc
|
|
```
|
|
|
|
**Benefits**:
|
|
- ✅ Clean, readable field names
|
|
- ✅ Only fields we actually use
|
|
- ✅ Modern best practices
|
|
|
|
**Drawbacks**:
|
|
- ❌ Need field mapping for legacy API submission
|
|
- ❌ Different from existing league patterns
|
|
- ❌ More work to integrate with existing tooling
|
|
|
|
---
|
|
|
|
### Option 3: Hybrid Approach (RECOMMENDED)
|
|
|
|
**Approach**: Use existing field names but omit unused experimental fields
|
|
|
|
**New Tables**:
|
|
|
|
```python
|
|
class PlayerGameStats(Base):
|
|
"""
|
|
Hybrid: existing field names for core stats,
|
|
skip experimental/unused fields.
|
|
"""
|
|
__tablename__ = "player_game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
|
|
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
|
|
|
|
# Batting - use legacy names for core stats
|
|
pa = Column(Integer, default=0)
|
|
ab = Column(Integer, default=0)
|
|
run = Column(Integer, default=0)
|
|
hit = Column(Integer, default=0)
|
|
rbi = Column(Integer, default=0)
|
|
double = Column(Integer, default=0)
|
|
triple = Column(Integer, default=0)
|
|
hr = Column(Integer, default=0)
|
|
bb = Column(Integer, default=0)
|
|
so = Column(Integer, default=0)
|
|
hbp = Column(Integer, default=0)
|
|
sb = Column(Integer, default=0)
|
|
cs = Column(Integer, default=0)
|
|
|
|
# Pitching - use legacy names
|
|
ip = Column(Float, default=0.0)
|
|
# hits_allowed = Column(Integer, default=0) # 'hit' field reused?
|
|
runs_allowed = Column(Integer, default=0)
|
|
earned_runs = Column(Integer, default=0)
|
|
walks_allowed = Column(Integer, default=0)
|
|
strikeouts_pitched = Column(Integer, default=0)
|
|
# ... etc
|
|
|
|
# SKIP: bphr, bpfo, bp1b, bplo (unclear purpose)
|
|
# SKIP: xba, xbt, xch, xhit (commented out in queries anyway)
|
|
# SKIP: roba, robs, raa, rto (unknown, commented out)
|
|
```
|
|
|
|
**Benefits**:
|
|
- ✅ Familiar core field names
|
|
- ✅ Cleaner (no experimental cruft)
|
|
- ✅ Easy legacy API submission
|
|
- ✅ Room to add new fields as needed
|
|
|
|
**Drawbacks**:
|
|
- ⚠️ Still has some odd naming ('run' vs 'runs')
|
|
|
|
---
|
|
|
|
## Questions for Discussion
|
|
|
|
### 1. **Field Naming Convention**
|
|
- Should we use legacy names (`pa`, `ab`, `run`, `hit`) for compatibility?
|
|
- Or modernize (`plate_appearances`, `at_bats`, `runs`, `hits`) for clarity?
|
|
- **Recommendation**: Hybrid - legacy names for core stats, modern for new fields
|
|
|
|
### 2. **Experimental Fields**
|
|
- What are `bphr`, `bpfo`, `bp1b`, `bplo`? (batter-pitcher matchup stats?)
|
|
- What are `xba`, `xbt`, `xch`, `xhit`? (X-Check related?)
|
|
- Do we need these or can we skip?
|
|
- **Recommendation**: Skip for Phase 3.5 MVP, add later if needed
|
|
|
|
### 3. **Pitching Stats Field Overlap**
|
|
- Legacy schema uses `hit` for both batting hits and pitching hits allowed
|
|
- How do we handle this in a combined table?
|
|
- **Options**:
|
|
- A) Separate `batting_hit` and `pitching_hit` columns
|
|
- B) Reuse `hit` field (batting or pitching based on position)
|
|
- C) Separate BattingGameStats and PitchingGameStats tables
|
|
- **Recommendation**: Option C - separate tables for clarity
|
|
|
|
### 4. **Linescore Storage**
|
|
- Legacy has no linescore (runs by inning)
|
|
- We need this for box score display
|
|
- JSON array format: `[0, 1, 0, 3, ...]` per team?
|
|
- **Recommendation**: Add to GameStats table as JSON
|
|
|
|
### 5. **Integration with Legacy API**
|
|
- Do we need to submit stats to existing league REST API?
|
|
- If yes, what format does it expect?
|
|
- Can we provide a mapping layer?
|
|
- **Recommendation**: Yes, create mapping function for API submission
|
|
|
|
### 6. **Stat Calculation Approach**
|
|
- **Option A**: Real-time updates (update stats after each play)
|
|
- **Option B**: Post-game aggregation (calculate from plays table)
|
|
- **Option C**: Hybrid (real-time updates + verification from plays)
|
|
- **Recommendation**: Option A (real-time) for performance, with Option B as backup/verification
|
|
|
|
---
|
|
|
|
## Proposed Schema (Final Recommendation)
|
|
|
|
### Recommended Approach: Hybrid with Separate Tables
|
|
|
|
```python
|
|
class GameStats(Base):
|
|
"""Game-level statistics and linescore."""
|
|
__tablename__ = "game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), unique=True)
|
|
created_at = Column(DateTime, default=func.now())
|
|
|
|
# Team totals
|
|
home_runs = Column(Integer, default=0)
|
|
away_runs = Column(Integer, default=0)
|
|
home_hits = Column(Integer, default=0)
|
|
away_hits = Column(Integer, default=0)
|
|
home_errors = Column(Integer, default=0)
|
|
away_errors = Column(Integer, default=0)
|
|
|
|
# Linescore (NEW - not in legacy)
|
|
home_linescore = Column(JSON) # [0, 1, 0, 3, ...]
|
|
away_linescore = Column(JSON) # [1, 0, 2, 0, ...]
|
|
|
|
|
|
class BattingGameStats(Base):
|
|
"""Batting statistics per player per game."""
|
|
__tablename__ = "batting_game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
|
|
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
|
|
|
|
# Use legacy field names for compatibility
|
|
pa = Column(Integer, default=0)
|
|
ab = Column(Integer, default=0)
|
|
run = Column(Integer, default=0)
|
|
hit = Column(Integer, default=0)
|
|
rbi = Column(Integer, default=0)
|
|
double = Column(Integer, default=0)
|
|
triple = Column(Integer, default=0)
|
|
hr = Column(Integer, default=0)
|
|
bb = Column(Integer, default=0)
|
|
so = Column(Integer, default=0)
|
|
hbp = Column(Integer, default=0)
|
|
sac = Column(Integer, default=0)
|
|
sb = Column(Integer, default=0)
|
|
cs = Column(Integer, default=0)
|
|
gidp = Column(Integer, default=0)
|
|
|
|
__table_args__ = (
|
|
Index('idx_batting_game_stats_game', 'game_id'),
|
|
Index('idx_batting_game_stats_lineup', 'lineup_id'),
|
|
)
|
|
|
|
|
|
class PitchingGameStats(Base):
|
|
"""Pitching statistics per player per game."""
|
|
__tablename__ = "pitching_game_stats"
|
|
|
|
id = Column(Integer, primary_key=True)
|
|
game_id = Column(UUID, ForeignKey("games.id"), nullable=False)
|
|
lineup_id = Column(Integer, ForeignKey("lineups.id"), nullable=False)
|
|
|
|
# Use legacy field names for compatibility
|
|
ip = Column(Float, default=0.0)
|
|
hit = Column(Integer, default=0) # Hits allowed
|
|
run = Column(Integer, default=0) # Runs allowed
|
|
erun = Column(Integer, default=0) # Earned runs (legacy: 'erun')
|
|
so = Column(Integer, default=0) # Strikeouts
|
|
bb = Column(Integer, default=0) # Walks
|
|
hbp = Column(Integer, default=0) # Hit batters
|
|
hr = Column(Integer, default=0) # Home runs allowed
|
|
wp = Column(Integer, default=0) # Wild pitches
|
|
|
|
# NEW: Batters faced (not in legacy, but useful)
|
|
batters_faced = Column(Integer, default=0)
|
|
|
|
__table_args__ = (
|
|
Index('idx_pitching_game_stats_game', 'game_id'),
|
|
Index('idx_pitching_game_stats_lineup', 'lineup_id'),
|
|
)
|
|
```
|
|
|
|
**Why Separate Tables?**
|
|
1. Clearer intent (batting vs pitching)
|
|
2. No field name conflicts (`hit` means different things)
|
|
3. Easier queries (no need to filter by position)
|
|
4. Better indexing (separate indexes per table)
|
|
5. Matches real-world mental model (batters bat, pitchers pitch)
|
|
|
|
---
|
|
|
|
## Implementation Notes
|
|
|
|
1. **Dual Position Players** (e.g., Shohei Ohtani):
|
|
- Two records: one in BattingGameStats, one in PitchingGameStats
|
|
- Both reference same lineup_id
|
|
- Aggregated separately
|
|
|
|
2. **Legacy API Submission**:
|
|
- Create mapping function: `web_stats_to_legacy_format()`
|
|
- Convert our stats to legacy BattingStat/PitchingStat format
|
|
- Submit to existing REST API endpoints
|
|
|
|
3. **Stat Tracker Service**:
|
|
- Maintains in-memory cache for current games
|
|
- Async writes to database
|
|
- Provides `get_box_score()` method with formatted output
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
**Please advise on**:
|
|
1. ✅ Approve hybrid approach with separate batting/pitching tables?
|
|
2. ✅ Use legacy field names for core stats?
|
|
3. ✅ Skip experimental fields (bphr, xba, etc.) for now?
|
|
4. ✅ Add linescore to GameStats?
|
|
5. ✅ Plan for legacy API submission?
|
|
|
|
Once approved, I'll update `phase-3.5-polish-stats.md` with the corrected schema.
|