Fixed critical bug where all outfielders were incorrectly assigned as DH
due to defense CSV column mismatch in retrosheet_data.py:
- Lines 889, 926: Changed column check from 'in row' to 'in pos_df.columns'
  to correctly detect bis_runs_total availability
- Line 947: Fixed fallback from non-existent 'tz_runs_outfield' to
  'tz_runs_total', which actually exists in Baseball Reference CSVs
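For reference, the column selection described above can be sketched roughly as follows. This is not the code in retrosheet_data.py: the helper name `pick_runs_column` and the sample CSV text are made up for illustration, and the only detail taken from this commit is the preference for bis_runs_total with tz_runs_total as the fallback. The key point is that availability is decided from the column names, not from the values of a single row.

```python
import csv
import io

def pick_runs_column(columns, candidates=("bis_runs_total", "tz_runs_total")):
    """Return the first candidate present among the column names, else None.

    `columns` is any iterable of column names; with pandas this would be
    `pos_df.columns`. The helper itself is hypothetical.
    """
    available = set(columns)
    for name in candidates:
        if name in available:
            return name
    return None

# Made-up CSV text standing in for a Baseball Reference defense file.
sample_csv = "key_bbref,PO,tz_runs_total\nplayer01,250,4\n"
header = next(csv.reader(io.StringIO(sample_csv)))

print(pick_runs_column(header))  # tz_runs_total
```

Returning None when neither column exists leaves the caller to decide how to handle a file with no defensive runs data, rather than silently defaulting a player to DH.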
Impact:
- Before: 57 DH players, 0 outfield positions
- After: 3 DH players, 62 outfielders (23 RF, 20 CF, 19 LF)
Added scripts/check_positions.sh:
- Validates position distribution after card generation
- Flags anomalous DH counts (>5 or >10%)
- Verifies outfield positions exist in cardpositions table
- Provides quick smoke test for defensive calculations
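The checks performed by the script can be approximated in Python as below. This is a sketch only, not a port of check_positions.sh: the function name, the input being a flat list of position labels, and the reading of ">10%" as a share of all listed positions are assumptions. The sample inputs reuse the before/after counts from this commit and cover DH and outfield entries only.

```python
from collections import Counter

OUTFIELD = ("LF", "CF", "RF")

def check_position_distribution(positions, max_dh_count=5, max_dh_share=0.10):
    """Return a list of human-readable problems; an empty list means OK."""
    counts = Counter(positions)
    total = sum(counts.values())
    problems = []

    # Flag DH when it exceeds either the absolute or the relative threshold.
    dh = counts.get("DH", 0)
    if dh > max_dh_count or (total and dh / total > max_dh_share):
        problems.append(f"anomalous DH count: {dh} of {total}")

    # Every outfield position should appear at least once.
    missing = [pos for pos in OUTFIELD if counts.get(pos, 0) == 0]
    if missing:
        problems.append("missing outfield positions: " + ", ".join(missing))

    return problems

before = ["DH"] * 57
after = ["DH"] * 3 + ["RF"] * 23 + ["CF"] * 20 + ["LF"] * 19

print(check_position_distribution(before))
print(check_position_distribution(after))
```

With the pre-fix counts both checks fire; with the post-fix counts the list is empty.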
Updated CLAUDE.md:
- Added Position Validation section with check_positions.sh usage
- Documented outfield position bug in Common Issues & Solutions
- Included code examples and verification steps
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added one-time utility scripts used to prepare 2005 defense CSV files
for compatibility with retrosheet_data.py.
Scripts:
- rename_defense_columns.py: Renamed initial batch of defense columns
  - RF/9 → range_factor_per_nine
  - RF/G → range_factor_per_game
  - DP → DP_def, E → E_def, Ch → chances, Inn → Inn_def
  - CS% → caught_stealing_perc, PO → pickoffs
  - Name-additional → key_bbref
- rename_additional_defense_columns.py: Second batch of column renames
  - Fld% → fielding_perc
  - Rtot → tz_runs_total, Rtot/yr → tz_runs_total_per_season
  - Rtz → tz_runs_field, Rdp → tz_runs_infield
- undo_po_rename.py: Reverted PO → pickoffs for position players
  - Kept 'pickoffs' for defense_p.csv (pitchers)
  - Changed back to 'PO' for all other positions (c, 1b, 2b, etc.)
- test_retrosheet_integration.py: Integration test for retrosheet_transformer
  - Validates batting and pitching stats loading
  - Tests date range filtering
  - Verifies player counts
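The net effect of the three rename scripts on a CSV header can be summarized in one mapping. The sketch below is illustrative and is not taken from the scripts themselves: `RENAMES`, `rename_header`, and the `is_pitcher_file` flag are hypothetical names, and the sample header is made up. The mapping entries are the ones listed above, with PO handled separately because it is renamed only for defense_p.csv.

```python
# Column renames listed in this commit, excluding PO (handled below).
RENAMES = {
    "RF/9": "range_factor_per_nine",
    "RF/G": "range_factor_per_game",
    "DP": "DP_def",
    "E": "E_def",
    "Ch": "chances",
    "Inn": "Inn_def",
    "CS%": "caught_stealing_perc",
    "Name-additional": "key_bbref",
    "Fld%": "fielding_perc",
    "Rtot": "tz_runs_total",
    "Rtot/yr": "tz_runs_total_per_season",
    "Rtz": "tz_runs_field",
    "Rdp": "tz_runs_infield",
}

def rename_header(header, is_pitcher_file):
    """Return the header with renames applied; unknown columns pass through."""
    mapping = dict(RENAMES)
    if is_pitcher_file:
        # Only defense_p.csv keeps the PO -> pickoffs rename.
        mapping["PO"] = "pickoffs"
    return [mapping.get(col, col) for col in header]

sample_header = ["Name-additional", "PO", "Rtot", "Fld%"]
print(rename_header(sample_header, is_pitcher_file=False))
print(rename_header(sample_header, is_pitcher_file=True))
```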
These scripts have already been run and the defense files are now
correctly formatted. They are kept for historical reference and documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>