27 Commits

Author SHA1 Message Date
Cal Corum
a20348ef7d Fix switch hitter detection for Rollins, Posada, and all switch hitters
Two bugs were preventing switch hitters from being correctly identified:

1. Missing handedness indicator in player names
   - Player names need special characters appended (* for left, # for switch)
   - new_player_payload() now appends '#' for switch hitters

2. Overly strict threshold in get_bat_hand()
   - Required 10+ total PAs to classify as switch hitter
   - Now correctly identifies ANY player who batted from both sides as 'S'
   - Removes arbitrary PA threshold that caused misclassification

Impact: Fixes Jimmy Rollins and Jorge Posada showing as 'R' instead of 'S'
       Applies to all switch hitters in retrosheet-based cardsets
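
A minimal sketch of the two fixes above, under stated assumptions: the real
signatures of get_bat_hand() and new_player_payload() are not shown in this
message, so the parameters pa_as_left and pa_as_right and the helper
with_hand_suffix() are hypothetical and only illustrate the logic.

```python
def get_bat_hand(pa_as_left: int, pa_as_right: int) -> str:
    """Classify batting hand from plate appearances taken from each side.

    Hypothetical signature; the real function may take different inputs.
    """
    if pa_as_left > 0 and pa_as_right > 0:
        return "S"  # batted from both sides: switch hitter, no PA threshold
    if pa_as_left > 0:
        return "L"
    return "R"  # assumption: fall back to right when no left-side PAs exist


# Suffixes described in the message: '*' for left, '#' for switch.
HAND_SUFFIX = {"L": "*", "S": "#", "R": ""}


def with_hand_suffix(name: str, bat_hand: str) -> str:
    """Append the handedness indicator to a player name (hypothetical helper)."""
    return name + HAND_SUFFIX.get(bat_hand, "")
```

With the old 10+ PA threshold removed, a player with only a handful of plate
appearances from one side is still classified as 'S' in this sketch.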

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:04:33 -06:00
Cal Corum
e4347a0162 CLAUDE: Fix deletion running twice - only delete on first post_positions call
The post_positions function was being called twice (batters then pitchers).
Each call deleted ALL cardpositions, so the second call would delete the
batter positions that were just created.

Solution: Added delete_existing parameter (default False). Only the first
call (batters) sets delete_existing=True to clean up old data. The second
call (pitchers) just appends positions without deletion.

Result: Both batter and pitcher positions now persist correctly.
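
The call pattern described above could look roughly like this. It is a sketch
only: the CARDPOSITIONS dict is a hypothetical in-memory stand-in for the
cardpositions table, and the real post_positions() signature and storage
calls are assumed, not taken from the code.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the cardpositions table, keyed by cardset id.
CARDPOSITIONS: Dict[int, List[dict]] = {}


def post_positions(cardset_id: int, positions: List[dict],
                   delete_existing: bool = False) -> int:
    """Store positions for a cardset, optionally clearing old rows first."""
    if delete_existing:
        CARDPOSITIONS[cardset_id] = []  # wipe stale rows once per run
    rows = CARDPOSITIONS.setdefault(cardset_id, [])
    rows.extend(positions)
    return len(rows)


batters = [{"player": "batter-1", "position": "RF"}]
pitchers = [{"player": "pitcher-1", "position": "P"}]

post_positions(1, batters, delete_existing=True)  # first call cleans up old data
total = post_positions(1, pitchers)               # second call only appends
```

Because delete_existing defaults to False, a caller has to opt in to the
cleanup, which keeps the second call from removing the rows the first one
just wrote.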

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 12:00:28 -06:00
Cal Corum
6746b51ca6 CLAUDE: Add missing db_delete import for cardposition cleanup
The deletion logic was failing with "NameError: name 'db_delete' is not
defined" because the function wasn't imported from db_calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:53:39 -06:00
Cal Corum
c23f6d1ada CLAUDE: Fix DH showing on cards for players with defensive positions
Root cause: post_positions() was upserting cardpositions, leaving stale DH
entries from the previous buggy run where outfielders had no defensive
positions.

Solution: Modified post_positions() to DELETE all existing cardpositions for
the cardset before posting new ones. This ensures:
- Stale DH positions are removed when players gain defensive positions
- Cards show only current, accurate positions
- No phantom positions persist across script runs

Example: Ichiro previously had both "RF" and "DH" cardpositions. With this
fix, only "RF" remains after re-running the script.

Updated CLAUDE.md with explanation of the cleanup logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:48:36 -06:00
Cal Corum
5d7a0dd74b CLAUDE: Fix outfield position assignment bug and add validation script
Fixed critical bug where all outfielders were incorrectly assigned as DH
due to defense CSV column mismatch in retrosheet_data.py:

- Lines 889, 926: Changed column check from 'in row' to 'in pos_df.columns'
  to correctly detect bis_runs_total availability
- Line 947: Fixed fallback from non-existent 'tz_runs_outfield' to
  'tz_runs_total' which actually exists in Baseball Reference CSVs

Impact:
- Before: 57 DH players, 0 outfield positions
- After: 3 DH players, 62 outfielders (23 RF, 20 CF, 19 LF)

Added scripts/check_positions.sh:
- Validates position distribution after card generation
- Flags anomalous DH counts (>5 or >10%)
- Verifies outfield positions exist in cardpositions table
- Provides quick smoke test for defensive calculations
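
The checks listed above can be illustrated in Python (the actual script is a
shell script whose contents are not shown here). This sketch assumes the 10%
threshold is measured against all position rows; the function name and
parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def check_positions(positions: Iterable[str], max_dh: int = 5,
                    max_dh_share: float = 0.10) -> List[str]:
    """Return warnings for a suspicious position distribution."""
    counts = Counter(positions)
    total = sum(counts.values())
    warnings = []

    dh = counts.get("DH", 0)
    # Assumption: the percentage is DH rows divided by all position rows.
    if dh > max_dh or (total and dh / total > max_dh_share):
        warnings.append(f"anomalous DH count: {dh} of {total}")

    missing = [pos for pos in ("LF", "CF", "RF") if counts.get(pos, 0) == 0]
    if missing:
        warnings.append("no players at: " + ", ".join(missing))
    return warnings
```

Fed only the DH and outfield counts from the Impact section, the "after"
distribution (3 DH, 23 RF, 20 CF, 19 LF) produces no warnings, while 57 DH
rows with no outfielders trips both checks.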

Updated CLAUDE.md:
- Added Position Validation section with check_positions.sh usage
- Documented outfield position bug in Common Issues & Solutions
- Included code examples and verification steps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:38:36 -06:00
Cal Corum
4e9e8d351d CLAUDE: Add Retrosheet CSV transformer and fix data processing issues
This commit adds support for the new Retrosheet CSV format and resolves
multiple data processing issues in retrosheet_data.py.

New Features:
- Created retrosheet_transformer.py with smart caching system
  - Transforms new Retrosheet CSV format to legacy format
  - Checks file timestamps to avoid redundant transformations
  - Caches normalized data for instant subsequent loads (~5s → <1s)
  - Handles column mapping: gid→game_id, bathand→batter_hand, etc.
  - Derives event_type from multiple boolean columns
  - Converts handedness values R/L → r/l
  - Explicitly sets string dtypes for hit_val, hit_location, batted_ball_type
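
A rough sketch of the timestamp check and column mapping described above,
using plain dict rows instead of DataFrames. Only the two renames named in
this message are included; cache_is_fresh() and transform_row() are
hypothetical names, not the functions in retrosheet_transformer.py.

```python
import os
from typing import Dict

# Column renames named in the message; the full mapping is not shown there.
COLUMN_MAP = {"gid": "game_id", "bathand": "batter_hand"}


def cache_is_fresh(source_path: str, cache_path: str) -> bool:
    """True when the cached file exists and is at least as new as the source."""
    return (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )


def transform_row(row: Dict[str, str]) -> Dict[str, str]:
    """Rename new-format columns and lowercase the handedness value."""
    out = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
    if "batter_hand" in out:
        out["batter_hand"] = out["batter_hand"].lower()  # R/L -> r/l
    return out
```

A loader would call cache_is_fresh() first and only run the transformation
when it returns False.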

Configuration Updates:
- Updated retrosheet_data.py for 2005 season data
  - START_DATE: 19980301 → 20050403 (2005 Opening Day)
  - END_DATE: 19980430 → 20051002 (2005 Regular Season End)
  - SEASON_PCT: 28/162 → 162/162 (full season)
  - MIN_PA_VL/VR: 20/40 → 50/75 (full season minimums)
  - CARDSET_ID: Updated for 2005 cardsets
  - EVENTS_FILENAME: Updated to use retrosheets_events_2005.csv

Bug Fixes:
1. Multi-team player duplicates
   - Players traded during season had duplicate rows (one per team + combined)
   - Added filtering to keep only combined totals (2TM, 3TM, etc.)
   - Prevents duplicate key_bbref values in ratings dataframes

2. Column name conflicts
   - Fixed Tm column conflict when merging periph_stats and defense_p
   - Drop duplicate Tm from defense data before merge

3. Pitcher rating calculations (pitchers/calcs_pitcher.py)
   - Fixed "truth value is ambiguous" error in min() comparisons
   - Explicitly convert pandas values to float before min() operations

4. Dictionary column corruption in ratings
   - Fixed ratings_vL and ratings_vR corruption during DataFrame merges
   - Only merge specific columns (key_bbref, player_id, card_id) instead of full DataFrame
   - Removed unnecessary .set_index() calls from post_batting_cards() and post_pitching_cards()
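
The multi-team filtering from fix 1 above might be sketched as follows, using
plain dict rows rather than DataFrames. It assumes each row carries key_bbref
and Tm fields; keep_combined_rows() is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List

COMBINED_TEAM = re.compile(r"^\d+TM$")  # matches 2TM, 3TM, ...


def keep_combined_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one row per player: the combined line if the player was traded."""
    by_player = defaultdict(list)
    for row in rows:
        by_player[row["key_bbref"]].append(row)

    kept = []
    for player_rows in by_player.values():
        combined = [r for r in player_rows if COMBINED_TEAM.match(r["Tm"])]
        # Traded players: keep only the combined line; others keep their rows.
        kept.extend(combined if combined else player_rows)
    return kept
```

Players who stayed with one team have no combined line, so their single row
passes through unchanged.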

Documentation:
- Updated CLAUDE.md with comprehensive troubleshooting section
- Added Retrosheet transformation documentation
- Documented defense CSV requirements and column naming
- Added configuration checklist for retrosheet_data.py
- Documented common issues: multi-team players, dictionary corruption, string types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 16:11:52 -06:00
Cal Corum
c89e1eb507 Claude introduction & Live Series Update 2025-07-22 09:24:34 -05:00
Cal Corum
3969bf008f December 22 Update 2024-12-22 15:46:52 -06:00
Cal Corum
9844fa4742 Add player update functionality
Save new players and deltas to csv
2024-11-10 14:42:00 -06:00
Cal Corum
d7922a138c Green to go for 98 Live Series 2024-11-02 22:51:24 -05:00
Cal Corum
d69d7e6103 Added exceptions.py, added date_math, error checks for promos 2024-11-02 19:00:39 -05:00
Cal Corum
cdb5820dbc Pitchers are complete 2024-11-01 08:50:29 -05:00
Cal Corum
93b8a230db All pitcher data is built, ready to post data 2024-10-27 23:41:44 -05:00
Cal Corum
e396b50230 Pitching defense done
Pitching cards done
2024-10-27 00:42:51 -05:00
Cal Corum
3388c4e0c5 Pitching peripherals done 2024-10-26 20:18:54 -05:00
Cal Corum
d74ea59d40 Huge progress on pitching_stats 2024-10-25 15:36:47 -05:00
Cal Corum
b3102201c8 Added Devil Rays to club and franchise lists
Fixed bphr fraction bug
Removed player post limit
2024-10-25 12:24:08 -05:00
Cal Corum
b3cce68576 Added ratings post and positions post - to be tested 2024-10-25 07:48:56 -05:00
Cal Corum
5c6d706160 Player post complete
Batting card post complete
2024-10-25 07:06:06 -05:00
Cal Corum
44e8e22bc0 Add defense calcs
Begin work on posting data
2024-10-20 22:57:45 -05:00
Cal Corum
d8e30ec5f9 Batting cards and ratings being calculated; began positions 2024-10-19 23:02:32 -05:00
Cal Corum
6e576e22dc Prep 1998 running & pitching stats 2024-10-19 01:17:39 -05:00
Cal Corum
c7373e7d9d Batter stat generation complete 2024-10-19 01:05:23 -05:00
Cal Corum
d092bdb9ff Batter stats nearing completion 2024-10-18 23:31:39 -05:00
Cal Corum
0de2239100 Updated mround to return float
Counting stats nearly complete for batters
2024-10-18 12:12:40 -05:00
Cal Corum
1109a12434 Added PA and AB to batter_stats 2024-10-17 16:31:17 -05:00
Cal Corum
07faea0bc7 Retrosheet pulling to stat dataframe 2024-10-17 12:06:05 -05:00