Paper Dynasty Card Creation - Baseball card generation system
This commit adds support for the new Retrosheet CSV format and resolves multiple data processing issues in retrosheet_data.py.

New Features:
- Created retrosheet_transformer.py with smart caching system
  - Transforms new Retrosheet CSV format to legacy format
  - Checks file timestamps to avoid redundant transformations
  - Caches normalized data for instant subsequent loads (~5s → <1s)
  - Handles column mapping: gid→game_id, bathand→batter_hand, etc.
  - Derives event_type from multiple boolean columns
  - Converts handedness values R/L → r/l
  - Explicitly sets string dtypes for hit_val, hit_location, batted_ball_type

Configuration Updates:
- Updated retrosheet_data.py for 2005 season data
  - START_DATE: 19980301 → 20050403 (2005 Opening Day)
  - END_DATE: 19980430 → 20051002 (2005 Regular Season End)
  - SEASON_PCT: 28/162 → 162/162 (full season)
  - MIN_PA_VL/VR: 20/40 → 50/75 (full season minimums)
  - CARDSET_ID: Updated for 2005 cardsets
  - EVENTS_FILENAME: Updated to use retrosheets_events_2005.csv

Bug Fixes:
1. Multi-team player duplicates
   - Players traded during season had duplicate rows (one per team + combined)
   - Added filtering to keep only combined totals (2TM, 3TM, etc.)
   - Prevents duplicate key_bbref values in ratings dataframes
2. Column name conflicts
   - Fixed Tm column conflict when merging periph_stats and defense_p
   - Drop duplicate Tm from defense data before merge
3. Pitcher rating calculations (pitchers/calcs_pitcher.py)
   - Fixed "truth value is ambiguous" error in min() comparisons
   - Explicitly convert pandas values to float before min() operations
4. Dictionary column corruption in ratings
   - Fixed ratings_vL and ratings_vR corruption during DataFrame merges
   - Only merge specific columns (key_bbref, player_id, card_id) instead of full DataFrame
   - Removed unnecessary .set_index() calls from post_batting_cards() and post_pitching_cards()

Documentation:
- Updated CLAUDE.md with comprehensive troubleshooting section
- Added Retrosheet transformation documentation
- Documented defense CSV requirements and column naming
- Added configuration checklist for retrosheet_data.py
- Documented common issues: multi-team players, dictionary corruption, string types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
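
The caching and column-mapping behavior described in the commit notes can be illustrated with a minimal sketch. This is not the code in retrosheet_transformer.py: the function names, the two-entry `COLUMN_MAP`, and the use of the standard `csv` module instead of pandas are all assumptions made to keep the example self-contained.

```python
import csv
import os

# Only the two renames named in the commit notes; the real mapping has more entries.
COLUMN_MAP = {"gid": "game_id", "bathand": "batter_hand"}


def cache_is_fresh(source_path, cache_path):
    """Return True when the cache exists and is at least as new as the source."""
    if not os.path.exists(cache_path):
        return False
    return os.path.getmtime(cache_path) >= os.path.getmtime(source_path)


def normalize_row(row):
    """Rename mapped columns and lowercase the handedness value (R/L -> r/l)."""
    out = {COLUMN_MAP.get(key, key): value for key, value in row.items()}
    if "batter_hand" in out:
        out["batter_hand"] = out["batter_hand"].lower()
    return out


def load_events(source_path, cache_path):
    """Load normalized rows, rebuilding the cache only when it is stale."""
    if not cache_is_fresh(source_path, cache_path):
        with open(source_path, newline="") as src:
            rows = [normalize_row(r) for r in csv.DictReader(src)]
        if rows:
            # Write the normalized rows so the next load can skip the transform.
            with open(cache_path, "w", newline="") as dst:
                writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
                writer.writeheader()
                writer.writerows(rows)
        return rows
    with open(cache_path, newline="") as cached:
        return list(csv.DictReader(cached))
```

Comparing modification times means the transformation reruns only after the source CSV is replaced or edited, which is the behavior the "~5s → <1s" note describes.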
| File or directory |
|---|
| .claude/plans |
| batters |
| card-output |
| data-input |
| defenders |
| html work |
| logs |
| pitchers |
| scouting |
| scripts |
| tests |
| .gitignore |
| automated_data_fetcher.py |
| batter-deltas.csv |
| batting_stats.csv |
| check_cards.py |
| CLAUDE.md |
| creation_helpers.py |
| db_calls_card_creation.py |
| db_calls.py |
| exceptions.py |
| live_series_update.py |
| new-batters.csv |
| new-pitchers.csv |
| pitcher-deltas.csv |
| pitching_stats.csv |
| pkmn.json |
| post_raw_player_csv.py |
| PROMO_CARD_FIX.md |
| pull_pitching_stats.py |
| pybaseball_doodling.py |
| pytest.ini |
| README.txt |
| REFACTORING_COMPLETE.md |
| REFACTORING_SUMMARY.md |
| refresh_cards.py |
| requirements.txt |
| retrosheet_data.py |
| retrosheet_transformer.py |
| retrosheet.db |
| scouting_batters.py |
| scouting_pitchers.py |
| test_data_fetcher_demo.py |

#######
CARD CREATION PROCESS
#######
1) Download stats
   FanGraphs / https://www.fangraphs.com/leaders/splits-leaderboards
   - Batting
     - vL Standard / vlhp-basic.csv
     - vL Batted Balls / vlhp-rate.csv
     - vR Standard / vrhp-basic.csv
     - vR Batted Balls / vrhp-rate.csv
   - Pitching
     - vL Standard / vlhh-basic.csv
     - vL Batted Balls / vlhh-rate.csv
     - vR Standard / vrhh-basic.csv
     - vR Batted Balls / vrhh-rate.csv
   Baseball Reference
   - running.csv
     - https://www.baseball-reference.com/leagues/majors/2023-baserunning-batting.shtml
     - Remove header lines
   - pitching.csv
     - https://www.baseball-reference.com/leagues/majors/2023-standard-pitching.shtml
2) Run Card Updates (Python Configuration)
3) Check Card Validity (Python Configuration)
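
Before running the card updates, it can help to confirm that every download from step 1 is in place. The following is a minimal sketch only: the file names come from the list above, but the `missing_inputs` helper and the idea of keeping all ten files in one folder are assumptions, not part of the repository.

```python
from pathlib import Path

# File names taken from the download list in step 1.
FANGRAPHS_FILES = [
    "vlhp-basic.csv", "vlhp-rate.csv", "vrhp-basic.csv", "vrhp-rate.csv",
    "vlhh-basic.csv", "vlhh-rate.csv", "vrhh-basic.csv", "vrhh-rate.csv",
]
BBREF_FILES = ["running.csv", "pitching.csv"]


def missing_inputs(input_dir):
    """Return the expected file names that are not present in input_dir."""
    folder = Path(input_dir)
    expected = FANGRAPHS_FILES + BBREF_FILES
    return [name for name in expected if not (folder / name).is_file()]
```

Calling `missing_inputs(...)` with the folder that holds the downloads returns an empty list when everything is present, and otherwise names the files that still need to be fetched.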
#######
OLD DATA REQUIREMENTS
#######
- Add any new players to players.csv for import
- Create directory in /data-input in format `XXXX Season Cardset`
- Upload the following csv files:
  - baserunning-data.csv
    - https://www.baseball-reference.com/leagues/majors/2023-baserunning-batting.shtml
    - Remove header lines
  - batter-stats.csv
    - https://www.fangraphs.com/leaders/splits-leaderboards
    - Remove header lines
    - Minimums: 20 PA vL / 40 PA vR for Live; 50 PA vL / 75 PA vR for legacy seasons
  - defense-X.csv (each position)
    - https://www.baseball-reference.com/leagues/majors/2023-specialpos_p-fielding.shtml
    - replace the `p` in `p-fielding` with the position code, e.g. 1b/2b/lf
    - Column Changes (pre-2013)
      - Catchers: add a column between Rgood and RsbC
      - 1b/2b/3b/ss: add 3 columns between Rgood and Rbnt
  - defense-of.csv (don't forget combined OF)
    - https://www.baseball-reference.com/leagues/majors/2023-specialpos_of-fielding.shtml
    - replace the `p` in `p-fielding` with `of`
  - pitcher-data.csv
    - https://www.baseball-reference.com/leagues/majors/2023-standard-pitching.shtml
  - pitcher-stats.csv
    - https://www.fangraphs.com/leaders/splits-leaderboards
    - Remove header lines
    - Minimums: 20 TBF vL / 40 TBF vR for Live; 50 TBF vL / 75 TBF vR for legacy seasons
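
The "replace the `p` in `p-fielding`" step for the defense files can be sketched as a small URL builder. The URL pattern is the one listed above; the helper names are hypothetical, and the position codes you pass in are assumed to match the ones Baseball Reference uses for its pages.

```python
BASE_URL = (
    "https://www.baseball-reference.com/leagues/majors/"
    "{year}-specialpos_{pos}-fielding.shtml"
)


def defense_url(year, position):
    """Build the fielding page URL for one position code such as '1b' or 'of'."""
    return BASE_URL.format(year=year, pos=position.lower())


def defense_downloads(year, positions):
    """Map each defense-X.csv file name to the page it should be exported from."""
    return {
        f"defense-{pos.lower()}.csv": defense_url(year, pos)
        for pos in positions
    }
```

For example, `defense_downloads(2023, ["1b", "2b", "lf", "of"])` pairs `defense-of.csv` with the combined outfield page, so the combined OF file is harder to forget.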
#######
OLD CARD CREATION PROCESS
#######
1) Import new players for sba_id with `1. Import Players`
2) Confirm cardset exists; if not, create now
3) Create cards with `3. Card Creation`
4) Generate csv output with `4. Card Output`
5) Upload output files into Sheets for Component Studio import
6) Upload ratings output files into Sheets for PD Ratings Guide
7) Import cards into Component Studio
8) Export -> Download All from Component Studio
9) Rename image files to <first>.<last>.png
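
Step 9 can be scripted once you know which exported file belongs to which player. The sketch below assumes you supply that mapping yourself; the README does not say how Component Studio names its exports, or how spaces and capitalization in player names should be handled, so both helper functions are hypothetical.

```python
from pathlib import Path


def card_image_name(first, last):
    """Build the <first>.<last>.png file name used in step 9."""
    return f"{first}.{last}.png"


def rename_exports(folder, names_by_file):
    """Rename exported images; names_by_file maps a file name to (first, last)."""
    folder = Path(folder)
    renamed = []
    for old_name, (first, last) in names_by_file.items():
        source = folder / old_name
        target = folder / card_image_name(first, last)
        # Skip missing sources and never overwrite an existing image.
        if source.is_file() and not target.exists():
            source.rename(target)
            renamed.append(target.name)
    return renamed
```

The function returns the new file names it actually created, so anything skipped because of a missing export or a name collision is easy to spot by comparing against the mapping.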