Commit Graph

29 Commits

Author SHA1 Message Date
Cal Corum
92256cb29c Update scouting data and card creation scripts
- Regenerate scouting CSVs with latest player ratings
- Update archetype calculator with BP-HR whole number rule
- Refresh retrosheet normalized data
- Minor script updates for Kalin Young card creation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-02 16:25:42 -06:00
Cal Corum
cc5f93eb66 Fix critical asterisk regression in player names
CRITICAL BUG FIX: Removed code that was appending asterisks to left-handed
players' names and hash symbols to switch hitters' names in production.

## Changes

### Core Fix (retrosheet_data.py)
- Removed name_suffix code from new_player_payload() (lines 1103-1108)
- Player names are now stored cleanly, without visual indicators
- Affected 20 left-handed batters in 2005 Live cardset

### New Utility Scripts
- fix_player_names.py: PATCH player names to remove symbols (uses 'name' param)
- check_player_names.py: Verify all players for asterisks/hashes
- regenerate_lefty_cards.py: Update image URLs with cache-busting dates
- upload_lefty_cards_to_s3.py: Fetch fresh cards and upload to S3

### Documentation (CRITICAL - READ BEFORE WORKING WITH CARDS)
- docs/LESSONS_LEARNED_ASTERISK_REGRESSION.md: Comprehensive guide
  * API parameter is 'name' NOT 'p_name'
  * Card generation caching requires timestamp cache-busting
  * S3 keys must not include query parameters
  * Player names only in 'players' table
  * Never append visual indicators to stored data

- CLAUDE.md: Added critical warnings section at top

## Key Learnings
1. API param for player name is 'name', not 'p_name'
2. Cards are cached - use timestamp in ?d= parameter
3. S3 keys != S3 URLs (no query params in keys)
4. Fix data BEFORE generating/uploading cards
5. Visual indicators belong in UI, not database
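
Learnings 2 and 3 can be illustrated with a minimal sketch. The helper names, the example URL, and the date format used for `d` are assumptions for illustration, not the project's actual code; the key derivation also assumes the URL path maps directly to the S3 object key.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url: str, day: date) -> str:
    """Return url with its d= query parameter set to the given date."""
    parts = urlsplit(url)
    # Drop any existing d= value so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "d"]
    query.append(("d", day.isoformat()))
    return urlunsplit(parts._replace(query=urlencode(query)))


def s3_key_from_url(url: str) -> str:
    """Return the object key: the URL path only, never the query string."""
    return urlsplit(url).path.lstrip("/")


# Hypothetical card URL, used only to show the two helpers together.
url = cache_busted_url(
    "https://example.com/cards/cardset-1/player-42/battingcard.png",
    date(2025, 11, 24),
)
key = s3_key_from_url(url)
```

Here `url` carries `?d=2025-11-24` for fetching a fresh card, while `key` stays `cards/cardset-1/player-42/battingcard.png`, so the upload target is unaffected by the cache-busting parameter.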

## Impact
- Fixed 20 player records in production
- Regenerated and uploaded 20 clean cards to S3
- Documented to prevent future regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 14:38:04 -06:00
Cal Corum
a20348ef7d Fix switch hitter detection for Rollins, Posada, and all switch hitters
Two bugs were preventing switch hitters from being correctly identified:

1. Missing handedness indicator in player names
   - Player names need special characters appended (* for left, # for switch)
   - new_player_payload() now appends '#' for switch hitters

2. Overly strict threshold in get_bat_hand()
   - Required 10+ total PAs to classify as switch hitter
   - Now correctly identifies ANY player who batted from both sides as 'S'
   - Removes arbitrary PA threshold that caused misclassification

Impact: Fixes Jimmy Rollins and Jorge Posada showing as 'R' instead of 'S'
       Applies to all switch hitters in retrosheet-based cardsets
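
The relaxed rule in item 2 can be sketched as follows. The function name and its two count arguments are hypothetical; the real get_bat_hand() signature is not shown in this log.

```python
def classify_bat_hand(pa_as_left: int, pa_as_right: int) -> str:
    """Classify a batter as 'L', 'R', or 'S' from plate appearances per side.

    Any plate appearance from both sides yields 'S'; there is no
    minimum-PA threshold.
    """
    if pa_as_left > 0 and pa_as_right > 0:
        return "S"
    # Assumption: a player with no recorded PAs falls back to 'R'.
    return "L" if pa_as_left > 0 else "R"
```

With this rule, a batter with 3 plate appearances from the left side and 600 from the right is still classified as 'S'.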

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-11 16:04:33 -06:00
Cal Corum
e4347a0162 CLAUDE: Fix deletion running twice - only delete on first post_positions call
The post_positions function was being called twice (batters then pitchers).
Each call deleted ALL cardpositions, so the second call would delete the
batter positions that were just created.

Solution: Added delete_existing parameter (default False). Only the first
call (batters) sets delete_existing=True to clean up old data. The second
call (pitchers) just appends positions without deletion.

Result: Both batter and pitcher positions now persist correctly.
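
A minimal sketch of the pattern, using an in-memory dict in place of the real database calls. Only the delete_existing parameter and its default come from this commit; the other arguments and the stand-in data are assumptions.

```python
def post_positions(store, cardset_id, positions, delete_existing=False):
    """Append positions for a cardset, optionally clearing old rows first."""
    if delete_existing:
        # Only the first call of a run should wipe the cardset's positions.
        store[cardset_id] = []
    store.setdefault(cardset_id, []).extend(positions)


# Stale data left over from an earlier run.
store = {7: [("batter_1", "DH")]}

# First call (batters) cleans up; second call (pitchers) only appends.
post_positions(store, 7, [("batter_1", "RF")], delete_existing=True)
post_positions(store, 7, [("pitcher_1", "P")])
```

After both calls the store holds the fresh batter and pitcher positions, and the stale DH row is gone.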

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 12:00:28 -06:00
Cal Corum
6746b51ca6 CLAUDE: Add missing db_delete import for cardposition cleanup
The deletion logic was failing with "name 'db_delete' is not defined" because
the function wasn't imported from db_calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:53:39 -06:00
Cal Corum
c23f6d1ada CLAUDE: Fix DH showing on cards for players with defensive positions
Root cause: post_positions() was upserting cardpositions, leaving stale DH
entries from the previous buggy run where outfielders had no defensive
positions.

Solution: Modified post_positions() to DELETE all existing cardpositions for
the cardset before posting new ones. This ensures:
- Stale DH positions are removed when players gain defensive positions
- Cards show only current, accurate positions
- No phantom positions persist across script runs

Example: Ichiro previously had both "RF" and "DH" cardpositions. With this
fix, only "RF" remains after re-running the script.

Updated CLAUDE.md with explanation of the cleanup logic.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:48:36 -06:00
Cal Corum
5d7a0dd74b CLAUDE: Fix outfield position assignment bug and add validation script
Fixed critical bug where all outfielders were incorrectly assigned as DH
due to defense CSV column mismatch in retrosheet_data.py:

- Lines 889, 926: Changed column check from 'in row' to 'in pos_df.columns'
  to correctly detect bis_runs_total availability
- Line 947: Fixed fallback from non-existent 'tz_runs_outfield' to
  'tz_runs_total' which actually exists in Baseball Reference CSVs

Impact:
- Before: 57 DH players, 0 outfield positions
- After: 3 DH players, 62 outfielders (23 RF, 20 CF, 19 LF)

Added scripts/check_positions.sh:
- Validates position distribution after card generation
- Flags anomalous DH counts (>5 or >10%)
- Verifies outfield positions exist in cardpositions table
- Provides quick smoke test for defensive calculations
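
The DH check can be sketched in Python (the actual script is a shell script and is not shown here). This assumes the thresholds mean more than 5 DH players or more than 10% of all listed players; the function name is hypothetical.

```python
from collections import Counter


def dh_count_is_anomalous(positions, max_dh=5, max_dh_share=0.10):
    """Return True when DH players exceed the count or share threshold."""
    counts = Counter(positions)
    dh = counts["DH"]
    share = dh / len(positions) if positions else 0.0
    return dh > max_dh or share > max_dh_share


# Counts from the Impact section above; other positions are omitted.
before = ["DH"] * 57
after = ["DH"] * 3 + ["RF"] * 23 + ["CF"] * 20 + ["LF"] * 19
```

Under these assumptions the `before` distribution is flagged and the `after` distribution is not.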

Updated CLAUDE.md:
- Added Position Validation section with check_positions.sh usage
- Documented outfield position bug in Common Issues & Solutions
- Included code examples and verification steps

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 11:38:36 -06:00
Cal Corum
4e9e8d351d CLAUDE: Add Retrosheet CSV transformer and fix data processing issues
This commit adds support for the new Retrosheet CSV format and resolves
multiple data processing issues in retrosheet_data.py.

New Features:
- Created retrosheet_transformer.py with smart caching system
  - Transforms new Retrosheet CSV format to legacy format
  - Checks file timestamps to avoid redundant transformations
  - Caches normalized data for instant subsequent loads (~5s → <1s)
  - Handles column mapping: gid→game_id, bathand→batter_hand, etc.
  - Derives event_type from multiple boolean columns
  - Converts handedness values R/L → r/l
  - Explicitly sets string dtypes for hit_val, hit_location, batted_ball_type
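
The timestamp check behind the caching can be sketched as below. This is an assumption about the approach, not the contents of retrosheet_transformer.py; the function name and paths are hypothetical.

```python
import os


def needs_transform(source_path: str, cache_path: str) -> bool:
    """Return True when the cached output is missing or older than the source."""
    if not os.path.exists(cache_path):
        return True
    # Re-transform only if the source was modified after the cache was written.
    return os.path.getmtime(source_path) > os.path.getmtime(cache_path)
```

A caller would run the full transformation only when `needs_transform(...)` is true and otherwise load the cached file directly.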

Configuration Updates:
- Updated retrosheet_data.py for 2005 season data
  - START_DATE: 19980301 → 20050403 (2005 Opening Day)
  - END_DATE: 19980430 → 20051002 (2005 Regular Season End)
  - SEASON_PCT: 28/162 → 162/162 (full season)
  - MIN_PA_VL/VR: 20/40 → 50/75 (full season minimums)
  - CARDSET_ID: Updated for 2005 cardsets
  - EVENTS_FILENAME: Updated to use retrosheets_events_2005.csv

Bug Fixes:
1. Multi-team player duplicates
   - Players traded during season had duplicate rows (one per team + combined)
   - Added filtering to keep only combined totals (2TM, 3TM, etc.)
   - Prevents duplicate key_bbref values in ratings dataframes

2. Column name conflicts
   - Fixed Tm column conflict when merging periph_stats and defense_p
   - Drop duplicate Tm from defense data before merge

3. Pitcher rating calculations (pitchers/calcs_pitcher.py)
   - Fixed "truth value is ambiguous" error in min() comparisons
   - Explicitly convert pandas values to float before min() operations

4. Dictionary column corruption in ratings
   - Fixed ratings_vL and ratings_vR corruption during DataFrame merges
   - Only merge specific columns (key_bbref, player_id, card_id) instead of full DataFrame
   - Removed unnecessary .set_index() calls from post_batting_cards() and post_pitching_cards()
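
Bug fix 1 can be sketched in plain Python. The commit itself operates on pandas DataFrames; the function name, the sample rows, and the player IDs below are illustrative assumptions, while the `Tm` and `key_bbref` column names come from this message.

```python
import re
from collections import defaultdict

# Matches combined-total team codes such as 2TM or 3TM.
COMBINED_TEAM = re.compile(r"^\d+TM$")


def keep_combined_rows(rows):
    """Keep one row per player: the combined total if any, else all rows."""
    by_player = defaultdict(list)
    for row in rows:
        by_player[row["key_bbref"]].append(row)

    kept = []
    for player_rows in by_player.values():
        combined = [r for r in player_rows if COMBINED_TEAM.match(r["Tm"])]
        # Players who never changed teams have no combined row; keep as-is.
        kept.extend(combined if combined else player_rows)
    return kept


rows = [
    {"key_bbref": "playera01", "Tm": "NYY", "PA": 200},
    {"key_bbref": "playera01", "Tm": "BOS", "PA": 150},
    {"key_bbref": "playera01", "Tm": "2TM", "PA": 350},
    {"key_bbref": "playerb01", "Tm": "SEA", "PA": 600},
]
filtered = keep_combined_rows(rows)
```

`filtered` then has a single row per `key_bbref`, which is what prevents the duplicate keys described above.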

Documentation:
- Updated CLAUDE.md with comprehensive troubleshooting section
- Added Retrosheet transformation documentation
- Documented defense CSV requirements and column naming
- Added configuration checklist for retrosheet_data.py
- Documented common issues: multi-team players, dictionary corruption, string types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 16:11:52 -06:00
Cal Corum
c89e1eb507 Claude introduction & Live Series Update 2025-07-22 09:24:34 -05:00
Cal Corum
3969bf008f December 22 Update 2024-12-22 15:46:52 -06:00
Cal Corum
9844fa4742 Add player update functionality
Save new players and deltas to csv
2024-11-10 14:42:00 -06:00
Cal Corum
d7922a138c Green to go for 98 Live Series 2024-11-02 22:51:24 -05:00
Cal Corum
d69d7e6103 Added exceptions.py, added date_math, error checks for promos 2024-11-02 19:00:39 -05:00
Cal Corum
cdb5820dbc Pitchers are complete 2024-11-01 08:50:29 -05:00
Cal Corum
93b8a230db All pitcher data is built, ready to post data 2024-10-27 23:41:44 -05:00
Cal Corum
e396b50230 Pitching defense done
Pitching cards done
2024-10-27 00:42:51 -05:00
Cal Corum
3388c4e0c5 Pitching peripherals done 2024-10-26 20:18:54 -05:00
Cal Corum
d74ea59d40 Huge progress on pitching_stats 2024-10-25 15:36:47 -05:00
Cal Corum
b3102201c8 Added Devil Rays to club and franchise lists
Fixed bphr fraction bug
Removed player post limit
2024-10-25 12:24:08 -05:00
Cal Corum
b3cce68576 Added ratings post and positions post - to be tested 2024-10-25 07:48:56 -05:00
Cal Corum
5c6d706160 Player post complete
Batting card post complete
2024-10-25 07:06:06 -05:00
Cal Corum
44e8e22bc0 Add defense calcs
Begin work on posting data
2024-10-20 22:57:45 -05:00
Cal Corum
d8e30ec5f9 Batting cards and ratings being calculated; began positions 2024-10-19 23:02:32 -05:00
Cal Corum
6e576e22dc Prep 1998 running & pitching stats 2024-10-19 01:17:39 -05:00
Cal Corum
c7373e7d9d Batter stat generation complete 2024-10-19 01:05:23 -05:00
Cal Corum
d092bdb9ff Batter stats nearing completion 2024-10-18 23:31:39 -05:00
Cal Corum
0de2239100 Updated mround to return float
Counting stats nearly complete for batters
2024-10-18 12:12:40 -05:00
Cal Corum
1109a12434 Added PA and AB to batter_stats 2024-10-17 16:31:17 -05:00
Cal Corum
07faea0bc7 Retrosheet pulling to stat dataframe 2024-10-17 12:06:05 -05:00