Regenerated scouting CSVs for all cardsets (6211 batters, 7070 pitchers)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --create/-c flag to create new players directly from YAML profiles
- Skip MLBPlayer creation (not needed for custom players)
- Auto-populate required API fields (cost, rarity, mlbclub, etc.)
- Update YAML profile with player_id and card_id after creation
- Add Adm Ball Traits custom player profile
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --last-week-ratio, --last-twoweeks-ratio, --last-month-ratio flags
- Auto-enable 0.2 recency bias for last 2 weeks on Live series after May 30
- Fix main() call to pass empty args list (legacy parameter required)
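
The auto-enable rule above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `resolve_twoweeks_ratio`, its parameters, and the precedence of an explicit flag value are assumptions. Only the 0.2 value, the two-week window, the Live series condition, and the May 30 cutoff come from this commit.

```python
from datetime import date
from typing import Optional

# Value stated in the commit message; the constant name is hypothetical.
AUTO_TWOWEEKS_RATIO = 0.2


def resolve_twoweeks_ratio(
    explicit_ratio: Optional[float],
    is_live_series: bool,
    today: date,
) -> Optional[float]:
    """Return the last-two-weeks ratio to apply, or None for no recency bias."""
    if explicit_ratio is not None:
        # Assumption: a value passed via --last-twoweeks-ratio takes precedence.
        return explicit_ratio
    # Assumption: "after May 30" means strictly later than May 30 of that year.
    cutoff = date(today.year, 5, 30)
    if is_live_series and today > cutoff:
        return AUTO_TWOWEEKS_RATIO
    return None
```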
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Migrated all major card creation workflows to pd-cards CLI:
live-series:
- update: Full FanGraphs/BBRef card generation with CLI options
- status: Show cardset status from database
retrosheet:
- process: Historical Retrosheet data processing
- arms: Generate outfield arm ratings from play-by-play
- validate: Check for position anomalies in cardsets
- defense: Fetch defensive stats from Baseball Reference
scouting:
- batters: Generate batting scouting reports
- pitchers: Generate pitching scouting reports
- all: Generate all reports at once
upload:
- s3: Upload card images to AWS S3
- check: Validate cards without uploading
- refresh: Re-generate and re-upload card images
Updated CLAUDE.md with comprehensive CLI documentation.
Legacy scripts remain available but CLI is now the primary interface.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add pitcher detection via player_type field or ratings schema
- Separate calc_ops and verify_total for batters vs pitchers
- Pitcher template with correct schema (double_cf, xcheck fields, etc.)
- Combined OPS formula: max() for pitchers, min() for batters
- Add --type option to 'pd-cards custom new' command
- Migrate Tony Smehrik to YAML pitcher profile
Pitcher schema differences from batters:
- double_cf instead of double_pull
- flyout_cf_b instead of flyout_a/flyout_bq
- No groundout_c
- xcheck_* fields (29 chances for fielder plays)
- pitching block for starter/relief/closer ratings
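
A rough sketch of the detection order described above (explicit player_type first, ratings schema second). The profile layout, the "pitcher" value, and the function name are assumptions for illustration; only the field names double_cf, flyout_cf_b, and the xcheck_ prefix come from the schema notes.

```python
from typing import Any, Dict

# Keys named in the schema notes that appear on pitcher ratings only.
PITCHER_ONLY_KEYS = ("double_cf", "flyout_cf_b")


def is_pitcher_profile(profile: Dict[str, Any]) -> bool:
    """Guess whether a custom-player profile describes a pitcher."""
    player_type = profile.get("player_type")
    if player_type is not None:
        # Assumption: the field holds the literal string "pitcher".
        return str(player_type).lower() == "pitcher"
    # Fall back to the ratings schema. A "ratings" mapping keyed by split
    # name is an assumed layout, not the confirmed YAML structure.
    ratings = profile.get("ratings") or {}
    for split in ratings.values():
        if any(key in split for key in PITCHER_ONLY_KEYS):
            return True
        if any(str(key).startswith("xcheck_") for key in split):
            return True
    return False
```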
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces new pd-cards CLI tool for all card creation workflows:
- custom: manage fictional character cards via YAML profiles
- live-series: live season card updates (stub)
- retrosheet: historical data processing (stub)
- scouting: scouting report generation (stub)
- upload: S3 card image upload (stub)
Key features:
- Typer-based CLI with auto-generated help and shell completion
- YAML profiles for custom characters (replaces per-character Python scripts)
- Preview, submit, new, and list commands for custom cards
- First character migrated: Kalin Young
Install with: uv pip install -e .
Run with: pd-cards --help
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Regenerated scouting CSVs with May PotM players (13287-13294)
- Reset retrosheet_data.py from May PotM back to Live series config
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Changed card URL generation to fetch from PD API endpoint
  (/v2/players/{id}/battingcard) instead of existing S3 URL
- This ensures database changes (like cardpositions) are reflected
  in regenerated card images
- Added fix_cardpositions.py utility for regenerating batter positions
  without re-running full retrosheet_data.py script
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Increased target OPS from 0.820 to 0.850 with adjusted stat splits:
- vs RHP: .260/.340/.495 (power profile)
- vs LHP: .260/.375/.420 (patient/OBP profile)
- Cost updated from 85 to 188
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Regenerate scouting CSVs with latest player ratings
- Update archetype calculator with BP-HR whole number rule
- Refresh retrosheet normalized data
- Minor script updates for Kalin Young card creation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
CRITICAL BUG FIX: Removed code that was appending asterisks to left-handed
players' names and hash symbols to switch hitters' names in production.
## Changes
### Core Fix (retrosheet_data.py)
- Removed name_suffix code from new_player_payload() (lines 1103-1108)
- Player names are now stored cleanly, without visual indicators
- Affected 20 left-handed batters in 2005 Live cardset
### New Utility Scripts
- fix_player_names.py: PATCH player names to remove symbols (uses 'name' param)
- check_player_names.py: Verify all players for asterisks/hashes
- regenerate_lefty_cards.py: Update image URLs with cache-busting dates
- upload_lefty_cards_to_s3.py: Fetch fresh cards and upload to S3
### Documentation (CRITICAL - READ BEFORE WORKING WITH CARDS)
- docs/LESSONS_LEARNED_ASTERISK_REGRESSION.md: Comprehensive guide
  * API parameter is 'name' NOT 'p_name'
  * Card generation caching requires timestamp cache-busting
  * S3 keys must not include query parameters
  * Player names only in 'players' table
  * Never append visual indicators to stored data
- CLAUDE.md: Added critical warnings section at top
## Key Learnings
1. API param for player name is 'name', not 'p_name'
2. Cards are cached - use timestamp in ?d= parameter
3. S3 keys != S3 URLs (no query params in keys)
4. Fix data BEFORE generating/uploading cards
5. Visual indicators belong in UI, not database
## Impact
- Fixed 20 player records in production
- Regenerated and uploaded 20 clean cards to S3
- Documented to prevent future regressions
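
Learnings 2 and 3 can be illustrated with a short sketch. The helper names, the timestamp format, and the assumption that the bucket lives in the hostname (so the URL path is the key) are all hypothetical; the utility scripts may do this differently.

```python
from datetime import datetime
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_cache_buster(url: str, stamp: datetime) -> str:
    """Return url with its query replaced by a fresh ?d= timestamp."""
    parts = urlsplit(url)
    # Assumed format; the notes only require a value that changes per run.
    query = urlencode({"d": stamp.strftime("%Y%m%d%H%M%S")})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def s3_key_from_url(url: str) -> str:
    """Return the object key: the URL path with no leading slash and no query.

    Assumes a virtual-hosted style URL where the bucket is in the hostname.
    """
    return urlsplit(url).path.lstrip("/")
```

The cache-busted URL is what gets stored on the player record, while the value passed as the upload key is the path alone.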
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Creates migrate_all_cards_to_s3.py to migrate historical card images from
Paper Dynasty API to S3 bucket. Key features:
- Processes all cardsets automatically (12,966 player cards across 29 cardsets)
- Detects and skips URLs already pointing to AWS S3
- Dry-run mode for previewing changes before execution
- Flexible filtering by cardset ID ranges or exclusion lists
- Per-cardset and global statistics tracking
- Updates player records with new S3 URLs after upload
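
The skip check for already-migrated URLs might look something like this. It is a sketch only: the "image" field name and both function names are assumptions, and the real script may match S3 URLs differently.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def is_s3_url(url: str) -> bool:
    """True when the URL's host is an amazonaws.com host."""
    host = urlsplit(url).hostname or ""
    return host == "amazonaws.com" or host.endswith(".amazonaws.com")


def split_for_migration(
    players: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Partition players into (pending, skipped) by their image URL."""
    pending: List[Dict[str, str]] = []
    skipped: List[Dict[str, str]] = []
    for player in players:
        # "image" is a hypothetical field name for the stored card URL.
        if is_s3_url(player.get("image", "")):
            skipped.append(player)
        else:
            pending.append(player)
    return pending, skipped
```

In a dry run, the pending list can simply be printed instead of uploaded.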
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Changed from range(1, 28) to empty list [] to automatically include
all cardsets without future maintenance. This ensures new cardsets
(like cardset 29) are automatically included in scouting reports.
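
The "empty list means all cardsets" convention can be sketched as below. The function and parameter names are hypothetical; the scouting scripts may apply the filter at the API query level instead.

```python
from typing import Iterable, List


def select_cardsets(available_ids: Iterable[int], wanted_ids: List[int]) -> List[int]:
    """Return the cardset IDs to process; an empty wanted_ids means no filter."""
    if not wanted_ids:
        return list(available_ids)
    wanted = set(wanted_ids)
    return [cardset_id for cardset_id in available_ids if cardset_id in wanted]
```

With the old range(1, 28) filter a newly added cardset is dropped; with [] it is included without touching the code.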
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Two bugs were preventing switch hitters from being correctly identified:
1. Missing handedness indicator in player names
   - Player names need special characters appended (* for left, # for switch)
   - new_player_payload() now appends '#' for switch hitters
2. Overly strict threshold in get_bat_hand()
   - Required 10+ total PAs to classify as switch hitter
   - Now correctly identifies ANY player who batted from both sides as 'S'
   - Removes arbitrary PA threshold that caused misclassification
Impact: Fixes Jimmy Rollins and Jorge Posada showing as 'R' instead of 'S'
Applies to all switch hitters in retrosheet-based cardsets
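
The relaxed rule from item 2 amounts to something like the sketch below. The signature is hypothetical (the real get_bat_hand() takes whatever the script passes it), and defaulting to 'R' when no PAs are recorded is an assumption.

```python
def classify_bat_hand(pa_as_left: int, pa_as_right: int) -> str:
    """Classify handedness from plate appearances taken from each side."""
    # Any PA from both sides marks a switch hitter; no minimum PA threshold.
    if pa_as_left > 0 and pa_as_right > 0:
        return "S"
    # Assumption: a player with no recorded PAs defaults to 'R'.
    return "L" if pa_as_left > 0 else "R"
```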
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
User requirement: Only 1 player with -6 arm, no more than 3 with -5 arm
2005 tz_runs_total data analysis:
- 23: Jim Edmonds (1 player)
- 21: Carl Crawford (1 player)
- 19: Coco Crisp, Brady Clark, Andruw Jones (3 players)
- 18: Cliff Floyd
- 17: Jason Michaels, Ichiro Suzuki (2 players)
Updated thresholds:
- > 22: -6 arm (Jim Edmonds only)
- > 19: -5 arm (Carl Crawford only, satisfies 'no more than 3')
- > 16: -4 arm (the three 19s plus 18s and 17s)
- Graduated scale for remaining tiers
Result: Elite arm ratings are now truly exceptional and rare
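
For reference, the three top tiers listed above can be expressed as a small lookup. This sketch covers only the thresholds stated in this message; the lower tiers are not listed here, so the helper (a hypothetical name) returns None for them rather than guessing.

```python
from typing import Optional

# (exclusive lower bound on tz_runs_total, arm rating), highest tier first.
TOP_ARM_TIERS = ((22, -6), (19, -5), (16, -4))


def top_arm_rating(tz_runs_total: float) -> Optional[int]:
    """Return the arm rating for the top three tiers, else None."""
    for lower_bound, rating in TOP_ARM_TIERS:
        if tz_runs_total > lower_bound:
            return rating
    return None  # remaining tiers are not spelled out in this commit
```

Applied to the 2005 values above, 23 maps to -6, 21 to -5, and 19, 18, and 17 all map to -4.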
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The arm_outfield function had thresholds designed for bis_runs_outfield,
but retrosheet_data.py uses tz_runs_total (different scale).
Issue: 20 players had -6 arm (top rating) - should be exceptionally rare
Analysis of tz_runs_total distribution:
- Ranges from -8 to +23 (not -10 to +10)
- Old threshold: > 8 gave 20 players with -6 arm
- New threshold: > 18 gives ~2-3 players with -6 arm
Updated thresholds to properly map tz_runs_total values to arm ratings:
- > 18: -6 (exceptional, top 2-3 players like Andruw Jones)
- > 14: -5 (elite arms, ~5-8 players)
- > 10: -4 (very good)
- Graduated scale down to +2 for very poor arms
Result: -6 arms now truly exceptional, proper distribution across ratings
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The post_positions function was being called twice (batters then pitchers).
Each call deleted ALL cardpositions, so the second call would delete the
batter positions that were just created.
Solution: Added delete_existing parameter (default False). Only the first
call (batters) sets delete_existing=True to clean up old data. The second
call (pitchers) just appends positions without deletion.
Result: Both batter and pitcher positions now persist correctly.
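
The call pattern can be sketched with an in-memory stand-in for the cardpositions table. The function below is illustrative only: its name and signature are hypothetical, and the real post_positions() talks to the API rather than a dict.

```python
from typing import Dict, List


def post_positions_sketch(
    store: Dict[int, List[str]],
    cardset_id: int,
    positions: List[str],
    delete_existing: bool = False,
) -> None:
    """Append positions for a cardset, optionally clearing old rows first."""
    if delete_existing:
        store[cardset_id] = []
    store.setdefault(cardset_id, []).extend(positions)


# First call (batters) clears stale rows; second call (pitchers) only appends.
store: Dict[int, List[str]] = {7: ["stale DH"]}
post_positions_sketch(store, 7, ["RF"], delete_existing=True)
post_positions_sketch(store, 7, ["P"])
```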
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The deletion logic was failing with "name 'db_delete' is not defined" because
the function wasn't imported from db_calls.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Root cause: post_positions() was upserting cardpositions, leaving stale DH
entries from the previous buggy run where outfielders had no defensive
positions.
Solution: Modified post_positions() to DELETE all existing cardpositions for
the cardset before posting new ones. This ensures:
- Stale DH positions are removed when players gain defensive positions
- Cards show only current, accurate positions
- No phantom positions persist across script runs
Example: Ichiro previously had both "RF" and "DH" cardpositions. With this
fix, only "RF" remains after re-running the script.
Updated CLAUDE.md with explanation of the cleanup logic.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed critical bug where all outfielders were incorrectly assigned as DH
due to defense CSV column mismatch in retrosheet_data.py:
- Lines 889, 926: Changed column check from 'in row' to 'in pos_df.columns'
to correctly detect bis_runs_total availability
- Line 947: Fixed fallback from non-existent 'tz_runs_outfield' to
'tz_runs_total' which actually exists in Baseball Reference CSVs
Impact:
- Before: 57 DH players, 0 outfield positions
- After: 3 DH players, 62 outfielders (23 RF, 20 CF, 19 LF)
Added scripts/check_positions.sh:
- Validates position distribution after card generation
- Flags anomalous DH counts (>5 or >10%)
- Verifies outfield positions exist in cardpositions table
- Provides quick smoke test for defensive calculations
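
The DH check is a shell script, but its threshold logic can be sketched in Python for clarity. The function name and the choice of denominator (all players with at least one position) are assumptions; only the >5 and >10% limits come from the list above.

```python
def dh_count_is_anomalous(dh_count: int, total_players: int) -> bool:
    """Flag a cardset whose DH count exceeds 5 or 10% of its players."""
    share = dh_count / total_players if total_players else 0.0
    return dh_count > 5 or share > 0.10
```

A run with 57 DH players trips the first limit no matter how large the cardset is, while a run with 3 DH players passes as long as the cardset has at least 30 players.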
Updated CLAUDE.md:
- Added Position Validation section with check_positions.sh usage
- Documented outfield position bug in Common Issues & Solutions
- Included code examples and verification steps
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add check_cards_and_upload.py: Fetches card images from API and uploads to AWS S3
  - Uses persistent aiohttp session for efficient connection reuse
  - Supports cache-busting query parameters (?d=date) for Discord compatibility
  - S3 URL structure: cards/cardset-{id:03d}/player-{id}/{type}card.png
  - Configurable upload and player URL update flags
- Add analyze_cardset_rarity.py: Analyzes players by franchise and rarity
  - Groups batters, pitchers, and combined totals
  - Displays counts for all rarity tiers by franchise
  - Provides comprehensive breakdown of cardset composition
- Add rank_pitching_staffs.py: Ranks teams 1-30 by pitching staff quality
  - Point system based on rarity tiers (HoF=5, MVP=4, AS=3, etc.)
  - Shows detailed rosters for top 5 and bottom 5 teams
  - Useful for balance analysis and cardset evaluation
- Update CLAUDE.md with new scripts documentation
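
The S3 path structure noted for check_cards_and_upload.py maps directly onto an f-string. The helper name is hypothetical, and "pitching" as a card type is an assumption ("batting" matches the battingcard endpoint used elsewhere in this log).

```python
def card_s3_key(cardset_id: int, player_id: int, card_type: str) -> str:
    """Build the S3 key for a card image; cardset ID is zero-padded to 3 digits."""
    return f"cards/cardset-{cardset_id:03d}/player-{player_id}/{card_type}card.png"
```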
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Added one-time utility scripts used to prepare 2005 defense CSV files
for compatibility with retrosheet_data.py.
Scripts:
- rename_defense_columns.py: Renamed initial batch of defense columns
  - RF/9 → range_factor_per_nine
  - RF/G → range_factor_per_game
  - DP → DP_def, E → E_def, Ch → chances, Inn → Inn_def
  - CS% → caught_stealing_perc, PO → pickoffs
  - Name-additional → key_bbref
- rename_additional_defense_columns.py: Second batch of column renames
  - Fld% → fielding_perc
  - Rtot → tz_runs_total, Rtot/yr → tz_runs_total_per_season
  - Rtz → tz_runs_field, Rdp → tz_runs_infield
- undo_po_rename.py: Reverted PO → pickoffs for position players
  - Kept 'pickoffs' for defense_p.csv (pitchers)
  - Changed back to 'PO' for all other positions (c, 1b, 2b, etc.)
- test_retrosheet_integration.py: Integration test for retrosheet_transformer
  - Validates batting and pitching stats loading
  - Tests date range filtering
  - Verifies player counts
These scripts have already been executed and the defense files are
properly formatted. Kept for historical reference and documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds support for the new Retrosheet CSV format and resolves
multiple data processing issues in retrosheet_data.py.
New Features:
- Created retrosheet_transformer.py with smart caching system
  - Transforms new Retrosheet CSV format to legacy format
  - Checks file timestamps to avoid redundant transformations
  - Caches normalized data for instant subsequent loads (~5s → <1s)
  - Handles column mapping: gid→game_id, bathand→batter_hand, etc.
  - Derives event_type from multiple boolean columns
  - Converts handedness values R/L → r/l
  - Explicitly sets string dtypes for hit_val, hit_location, batted_ball_type
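
The timestamp check behind the caching could be as simple as comparing modification times. This is a sketch under that assumption; the function name and the exact freshness rule in retrosheet_transformer.py are not confirmed by this message.

```python
import os


def cache_is_fresh(source_path: str, cache_path: str) -> bool:
    """True when the cached file exists and is at least as new as the source."""
    if not os.path.exists(cache_path):
        return False
    return os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
```

When this returns False, the transformer would redo the conversion and rewrite the cache; otherwise it can load the normalized file directly.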
Configuration Updates:
- Updated retrosheet_data.py for 2005 season data
  - START_DATE: 19980301 → 20050403 (2005 Opening Day)
  - END_DATE: 19980430 → 20051002 (2005 Regular Season End)
  - SEASON_PCT: 28/162 → 162/162 (full season)
  - MIN_PA_VL/VR: 20/40 → 50/75 (full season minimums)
  - CARDSET_ID: Updated for 2005 cardsets
  - EVENTS_FILENAME: Updated to use retrosheets_events_2005.csv
Bug Fixes:
1. Multi-team player duplicates
   - Players traded during season had duplicate rows (one per team + combined)
   - Added filtering to keep only combined totals (2TM, 3TM, etc.)
   - Prevents duplicate key_bbref values in ratings dataframes
2. Column name conflicts
   - Fixed Tm column conflict when merging periph_stats and defense_p
   - Drop duplicate Tm from defense data before merge
3. Pitcher rating calculations (pitchers/calcs_pitcher.py)
   - Fixed "truth value is ambiguous" error in min() comparisons
   - Explicitly convert pandas values to float before min() operations
4. Dictionary column corruption in ratings
   - Fixed ratings_vL and ratings_vR corruption during DataFrame merges
   - Only merge specific columns (key_bbref, player_id, card_id) instead of full DataFrame
   - Removed unnecessary .set_index() calls from post_batting_cards() and post_pitching_cards()
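
The multi-team filter from fix 1 can be illustrated on plain dicts. The real code works on pandas DataFrames; this sketch (hypothetical function name, rows as dicts) only shows the rule: when a player has a combined NTM row, keep that row and drop the per-team rows.

```python
import re
from typing import Dict, List

# Matches combined-total team codes such as 2TM or 3TM.
COMBINED_TEAM = re.compile(r"^\d+TM$")


def keep_combined_rows(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop per-team rows for any player who also has a combined-total row."""
    multi_team_players = {
        row["key_bbref"] for row in rows if COMBINED_TEAM.match(row["Tm"])
    }
    return [
        row
        for row in rows
        if row["key_bbref"] not in multi_team_players
        or COMBINED_TEAM.match(row["Tm"])
    ]
```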
Documentation:
- Updated CLAUDE.md with comprehensive troubleshooting section
- Added Retrosheet transformation documentation
- Documented defense CSV requirements and column naming
- Added configuration checklist for retrosheet_data.py
- Documented common issues: multi-team players, dictionary corruption, string types
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds default OPS value constants and type hints to key functions,
improving code documentation and IDE support.
## Changes Made
1. **Add default OPS constants** (creation_helpers.py)
   - DEFAULT_BATTER_OPS: Default OPS by rarity (1-5)
   - DEFAULT_STARTER_OPS: Default OPS-against for starters (99, 1-5)
   - DEFAULT_RELIEVER_OPS: Default OPS-against for relievers (99, 1-5)
   - Comprehensive comments explaining usage
   - Single source of truth for fallback values
2. **Update batters/creation.py**
   - Import DEFAULT_BATTER_OPS
   - Replace 6 hardcoded if-checks with clean loop over constants
   - Add type hints to post_player_updates function
   - Import Dict from typing
3. **Update pitchers/creation.py**
   - Import DEFAULT_STARTER_OPS and DEFAULT_RELIEVER_OPS
   - Replace 12 hardcoded if-checks with clean loops over constants
   - Add type hints to post_player_updates function
   - Import Dict from typing
4. **Add typing import** (creation_helpers.py)
   - Import Dict, List, Tuple, Optional for type hints
   - Enables type hints throughout helper functions
## Impact
### Before
```python
# Scattered hardcoded values (batters)
if 1 not in average_ops:
    average_ops[1] = 1.066
if 2 not in average_ops:
    average_ops[2] = 0.938
# ... 4 more if-checks

# Scattered hardcoded values (pitchers)
if 99 not in sp_average_ops:
    sp_average_ops[99] = 0.388
# ... 5 more if-checks for starters
# ... 6 more if-checks for relievers
```
### After
```python
# Clean, data-driven approach (batters)
for rarity, default_ops in DEFAULT_BATTER_OPS.items():
    if rarity not in average_ops:
        average_ops[rarity] = default_ops

# Clean, data-driven approach (pitchers)
for rarity, default_ops in DEFAULT_STARTER_OPS.items():
    if rarity not in sp_average_ops:
        sp_average_ops[rarity] = default_ops
for rarity, default_ops in DEFAULT_RELIEVER_OPS.items():
    if rarity not in rp_average_ops:
        rp_average_ops[rarity] = default_ops
```
### Benefits
✅ Eliminates 18 if-checks across batters and pitchers
✅ Single source of truth for default OPS values
✅ Easy to modify values (change constant, not scattered code)
✅ Self-documenting with clear constant names and comments
✅ Type hints improve IDE support and catch errors early
✅ Function signatures now document expected types
✅ Consistent with other recent refactorings
## Test Results
✅ 42/42 tests pass
✅ All existing functionality preserved
✅ 100% backward compatible
## Files Modified
- creation_helpers.py: +35 lines (3 constants + typing import)
- batters/creation.py: -4 lines net (cleaner code + type hints)
- pitchers/creation.py: -8 lines net (cleaner code + type hints)
**Net change:** More constants, less scattered magic numbers, better types.
Part of ongoing refactoring to reduce code fragility.