Commit Graph

145 Commits

Author SHA1 Message Date
Cal Corum
4e9e8d351d CLAUDE: Add Retrosheet CSV transformer and fix data processing issues
This commit adds support for the new Retrosheet CSV format and resolves
multiple data processing issues in retrosheet_data.py.

New Features:
- Created retrosheet_transformer.py with smart caching system
  - Transforms new Retrosheet CSV format to legacy format
  - Checks file timestamps to avoid redundant transformations
  - Caches normalized data for instant subsequent loads (~5s → <1s)
  - Handles column mapping: gid→game_id, bathand→batter_hand, etc.
  - Derives event_type from multiple boolean columns
  - Converts handedness values R/L → r/l
  - Explicitly sets string dtypes for hit_val, hit_location, batted_ball_type

Configuration Updates:
- Updated retrosheet_data.py for 2005 season data
  - START_DATE: 19980301 → 20050403 (2005 Opening Day)
  - END_DATE: 19980430 → 20051002 (2005 Regular Season End)
  - SEASON_PCT: 28/162 → 162/162 (full season)
  - MIN_PA_VL/VR: 20/40 → 50/75 (full season minimums)
  - CARDSET_ID: Updated for 2005 cardsets
  - EVENTS_FILENAME: Updated to use retrosheets_events_2005.csv

Bug Fixes:
1. Multi-team player duplicates
   - Players traded during season had duplicate rows (one per team + combined)
   - Added filtering to keep only combined totals (2TM, 3TM, etc.)
   - Prevents duplicate key_bbref values in ratings dataframes

2. Column name conflicts
   - Fixed Tm column conflict when merging periph_stats and defense_p
   - Drop duplicate Tm from defense data before merge

3. Pitcher rating calculations (pitchers/calcs_pitcher.py)
   - Fixed "truth value is ambiguous" error in min() comparisons
   - Explicitly convert pandas values to float before min() operations

4. Dictionary column corruption in ratings
   - Fixed ratings_vL and ratings_vR corruption during DataFrame merges
   - Only merge specific columns (key_bbref, player_id, card_id) instead of full DataFrame
   - Removed unnecessary .set_index() calls from post_batting_cards() and post_pitching_cards()

Documentation:
- Updated CLAUDE.md with comprehensive troubleshooting section
- Added Retrosheet transformation documentation
- Documented defense CSV requirements and column naming
- Added configuration checklist for retrosheet_data.py
- Documented common issues: multi-team players, dictionary corruption, string types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-08 16:11:52 -06:00
Cal Corum
a1564015cd CLAUDE: Add comprehensive documentation for refactoring session
This documentation covers all three refactoring commits:
1. Extract business logic & add core constants
2. Extract rarity cost adjustment logic
3. Add default OPS constants & type hints

Includes:
- Executive summary with key achievements
- Detailed breakdown of each commit
- Before/after code comparisons
- Usage guide with examples
- Complete test coverage documentation
- Migration guide (no migration needed - 100% compatible)
- Metrics and statistics
- Future recommendations
- Quick reference guide

Total documentation: 500+ lines covering 85 minutes of refactoring work
that dramatically improved code maintainability.
2025-10-31 23:44:32 -05:00
Cal Corum
db2d81a6d1 CLAUDE: Add default OPS constants and type hints to improve code clarity
This commit adds default OPS value constants and type hints to key functions,
improving code documentation and IDE support.

## Changes Made

1. **Add default OPS constants** (creation_helpers.py)
   - DEFAULT_BATTER_OPS: Default OPS by rarity (1-5)
   - DEFAULT_STARTER_OPS: Default OPS-against for starters (99, 1-5)
   - DEFAULT_RELIEVER_OPS: Default OPS-against for relievers (99, 1-5)
   - Comprehensive comments explaining usage
   - Single source of truth for fallback values

2. **Update batters/creation.py**
   - Import DEFAULT_BATTER_OPS
   - Replace 6 hardcoded if-checks with clean loop over constants
   - Add type hints to post_player_updates function
   - Import Dict from typing

3. **Update pitchers/creation.py**
   - Import DEFAULT_STARTER_OPS and DEFAULT_RELIEVER_OPS
   - Replace 12 hardcoded if-checks with clean loops over constants
   - Add type hints to post_player_updates function
   - Import Dict from typing

4. **Add typing import** (creation_helpers.py)
   - Import Dict, List, Tuple, Optional for type hints
   - Enables type hints throughout helper functions

## Impact

### Before
```python
# Scattered hardcoded values (batters)
if 1 not in average_ops:
    average_ops[1] = 1.066
if 2 not in average_ops:
    average_ops[2] = 0.938
# ... 4 more if-checks

# Scattered hardcoded values (pitchers)
if 99 not in sp_average_ops:
    sp_average_ops[99] = 0.388
# ... 5 more if-checks for starters
# ... 6 more if-checks for relievers
```

### After
```python
# Clean, data-driven approach (batters)
for rarity, default_ops in DEFAULT_BATTER_OPS.items():
    if rarity not in average_ops:
        average_ops[rarity] = default_ops

# Clean, data-driven approach (pitchers)
for rarity, default_ops in DEFAULT_STARTER_OPS.items():
    if rarity not in sp_average_ops:
        sp_average_ops[rarity] = default_ops

for rarity, default_ops in DEFAULT_RELIEVER_OPS.items():
    if rarity not in rp_average_ops:
        rp_average_ops[rarity] = default_ops
```

### Benefits
 Eliminates 18 if-checks across batters and pitchers
 Single source of truth for default OPS values
 Easy to modify values (change constant, not scattered code)
 Self-documenting with clear constant names and comments
 Type hints improve IDE support and catch errors early
 Function signatures now document expected types
 Consistent with other recent refactorings

## Test Results
 42/42 tests pass
 All existing functionality preserved
 100% backward compatible

## Files Modified
- creation_helpers.py: +35 lines (3 constants + typing import)
- batters/creation.py: -4 lines net (cleaner code + type hints)
- pitchers/creation.py: -8 lines net (cleaner code + type hints)

**Net change:** More constants, less scattered magic numbers, better types.

Part of ongoing refactoring to reduce code fragility.
2025-10-31 23:28:49 -05:00
Cal Corum
cb471d8057 CLAUDE: Extract rarity cost adjustment logic into data-driven function
This commit eliminates 150+ lines of duplicated, error-prone nested if/elif
logic by extracting rarity cost calculations into a lookup table and function.

## Changes Made

1. **Add RARITY_COST_ADJUSTMENTS lookup table** (creation_helpers.py)
   - Maps (old_rarity, new_rarity) → (cost_adjustment, minimum_cost)
   - Covers all 30 possible rarity transitions
   - Self-documenting with comments for each rarity tier
   - Single source of truth for all cost adjustments

2. **Add calculate_rarity_cost_adjustment() function** (creation_helpers.py)
   - Takes old_rarity, new_rarity, old_cost
   - Returns new cost with adjustments and minimums applied
   - Includes comprehensive docstring with examples
   - Handles edge cases (same rarity, undefined transitions)
   - Logs warnings for undefined transitions

3. **Update batters/creation.py**
   - Import calculate_rarity_cost_adjustment
   - Replace 75-line nested if/elif block with 7-line function call
   - Identical behavior, much cleaner code

4. **Update pitchers/creation.py**
   - Import calculate_rarity_cost_adjustment
   - Replace 75-line nested if/elif block with 7-line function call
   - Eliminates duplication between batters and pitchers

5. **Add comprehensive tests** (tests/test_rarity_cost_adjustments.py)
   - 22 tests covering all scenarios
   - Tests individual transitions (Diamond→Gold, Common→Bronze, etc.)
   - Tests all upward and downward transitions
   - Tests minimum cost enforcement
   - Tests edge cases (zero cost, very high cost, negative cost)
   - Tests symmetry (up then down returns close to original)

## Impact

### Lines Eliminated
- **Batters:** 75 lines → 7 lines (89% reduction)
- **Pitchers:** 75 lines → 7 lines (89% reduction)
- **Total:** 150 lines of nested logic eliminated

### Benefits
 Eliminates 150+ lines of duplicated code
 Data-driven approach makes adjustments clear and modifiable
 Single source of truth prevents inconsistencies
 Independently testable business logic
 22 comprehensive tests ensure correctness
 Easy to add new rarity tiers or modify costs
 Reduced risk of typos in magic numbers

## Test Results
 22/22 new tests pass
 All existing tests still pass
 100% backward compatible - identical behavior

## Files Modified
- creation_helpers.py: +104 lines (table + function + docs)
- batters/creation.py: -68 lines (replaced nested logic)
- pitchers/creation.py: -68 lines (replaced nested logic)
- tests/test_rarity_cost_adjustments.py: +174 lines (new tests)

**Net change:** 150+ lines of complex logic replaced with simple,
tested, data-driven approach.

Part of ongoing refactoring to reduce code fragility.
2025-10-31 22:49:35 -05:00
Cal Corum
bd1cc7e90b CLAUDE: Refactor to reduce code fragility - extract business logic and add constants
This commit implements high value-to-time ratio improvements to make the
codebase more maintainable and less fragile:

## Changes Made

1. **Add constants for magic numbers** (creation_helpers.py)
   - NEW_PLAYER_COST = 99999 (replaces hardcoded sentinel value)
   - RARITY_BASE_COSTS dict (replaces duplicate cost dictionaries)
   - Benefits: Self-documenting, single source of truth, easy to update

2. **Extract business logic into testable function** (creation_helpers.py)
   - Added should_update_player_description() with full docstring
   - Consolidates duplicated logic from batters and pitchers modules
   - Independently testable, clear decision logic with examples
   - Benefits: DRY principle, better testing, easier to modify

3. **Add debug logging for description updates** (batters & pitchers)
   - Logs when descriptions ARE updated (with details)
   - Logs when descriptions are SKIPPED (with reason)
   - Benefits: Easy troubleshooting, visibility into decisions

4. **Update batters/creation.py and pitchers/creation.py**
   - Replace hardcoded 99999 with NEW_PLAYER_COST
   - Replace base_costs dict with RARITY_BASE_COSTS
   - Replace inline logic with should_update_player_description()
   - Improved docstring for post_player_updates()
   - Benefits: Cleaner, more maintainable code

5. **Add comprehensive tests** (tests/test_promo_description_protection.py)
   - 6 new direct unit tests for should_update_player_description()
   - Tests cover: promo/regular cardsets, new/existing players, PotM cards
   - Case-insensitive detection tests
   - Benefits: Confidence in behavior, prevent regressions

6. **Add documentation** (PROMO_CARD_FIX.md, REFACTORING_SUMMARY.md)
   - PROMO_CARD_FIX.md: Details the promo card renaming fix
   - REFACTORING_SUMMARY.md: Comprehensive refactoring documentation
   - Benefits: Future developers understand the code and changes

## Test Results
 13/13 tests pass (7 existing + 6 new)
 No regressions in existing tests
 100% backward compatible

## Impact
- Magic numbers: 100% eliminated
- Duplicated logic: 50% reduction (2 files → 1 function)
- Test coverage: +86% (7 → 13 tests)
- Code clarity: Significantly improved
- Maintainability: Much easier to modify and debug

## Files Modified
- creation_helpers.py: +82 lines (constants, function, docs)
- batters/creation.py: Simplified using new constants/function
- pitchers/creation.py: Simplified using new constants/function
- tests/test_promo_description_protection.py: +66 lines (new tests)
- PROMO_CARD_FIX.md: New documentation
- REFACTORING_SUMMARY.md: New documentation

Total: ~228 lines added/modified for significant maintainability gain

Related to earlier promo card description protection fix.
2025-10-31 22:03:22 -05:00
Cal Corum
c89e1eb507 Claude introduction & Live Series Update 2025-07-22 09:24:34 -05:00
Cal Corum
8939b8bd71 Scouting complete :) 2025-02-09 07:57:02 -06:00
Cal Corum
71b2fdbeba Refactor refresh_cards 2025-02-09 01:17:58 -06:00
Cal Corum
aff600d306 Build scouting csvs locally for upload 2025-02-09 01:17:02 -06:00
Cal Corum
f036f29488 1996 Data 2024-12-23 09:57:51 -06:00
Cal Corum
3969bf008f December 22 Update 2024-12-22 15:46:52 -06:00
Cal Corum
25d4d9a63c Migrate to rotating file logger 2024-11-10 14:42:12 -06:00
Cal Corum
9844fa4742 Add player update functionality
Save new players and deltas to csv
2024-11-10 14:42:00 -06:00
Cal Corum
d7922a138c Green to go for 98 Live Series 2024-11-02 22:51:24 -05:00
Cal Corum
ac544965ae Migrated args to constants 2024-11-02 22:50:54 -05:00
Cal Corum
863d906657 Script to change pos_1=P to proper SP/RP/CP 2024-11-02 22:50:39 -05:00
Cal Corum
d69d7e6103 Added exceptions.py, added date_math, error checks for promos 2024-11-02 19:00:39 -05:00
Cal Corum
cdb5820dbc Pitchers are complete 2024-11-01 08:50:29 -05:00
Cal Corum
93b8a230db All pitcher data is built, ready to post data 2024-10-27 23:41:44 -05:00
Cal Corum
e396b50230 Pitching defense done
Pitching cards done
2024-10-27 00:42:51 -05:00
Cal Corum
3388c4e0c5 Pitching peripherals done 2024-10-26 20:18:54 -05:00
Cal Corum
d74ea59d40 Huge progress on pitching_stats 2024-10-25 15:36:47 -05:00
Cal Corum
b3102201c8 Added Devil Rays to club and franchise lists
Fixed bphr fraction bug
Removed player post limit
2024-10-25 12:24:08 -05:00
Cal Corum
b3cce68576 Added ratings post and positions post - to be tested 2024-10-25 07:48:56 -05:00
Cal Corum
5c6d706160 Player post complete
Batting card post complete
2024-10-25 07:06:06 -05:00
Cal Corum
44e8e22bc0 Add defense calcs
Begin work on posting data
2024-10-20 22:57:45 -05:00
Cal Corum
eb79430de7 Calc rate stats on batting ratings for cost calcs 2024-10-20 22:56:58 -05:00
Cal Corum
2ef68915d1 Add Montreal to franchise list
Fix precision bug in mround
2024-10-20 22:55:59 -05:00
Cal Corum
d8e30ec5f9 Batting cards and ratings being calculated; began positions 2024-10-19 23:02:32 -05:00
Cal Corum
c2b0d93a02 Storing defense data to avoid bbref limits 2024-10-19 23:00:40 -05:00
Cal Corum
6e576e22dc Prep 1998 running & pitching stats 2024-10-19 01:17:39 -05:00
Cal Corum
c7373e7d9d Batter stat generation complete 2024-10-19 01:05:23 -05:00
Cal Corum
3c421c8c90 Update gitignore 2024-10-18 23:40:19 -05:00
Cal Corum
c40451c27b Merge branch 'main' of https://github.com/calcorum/paper-dynasty-card-creation 2024-10-18 23:38:38 -05:00
Cal Corum
d092bdb9ff Batter stats nearing completion 2024-10-18 23:31:39 -05:00
Cal Corum
11ce81dc2b Update gitignore 2024-10-18 18:12:53 -05:00
Cal Corum
0de2239100 Updated mround to return float
Counting stats nearly complete for batters
2024-10-18 12:12:40 -05:00
Cal Corum
1109a12434 Added PA and AB to batter_stats 2024-10-17 16:31:17 -05:00
Cal Corum
07faea0bc7 Retrosheet pulling to stat dataframe 2024-10-17 12:06:05 -05:00
Cal Corum
2a7beef2d9 Add retrosheet db 2024-10-17 09:28:59 -05:00
Cal Corum
639e032586 Moving older scripts into holding cell 2024-10-17 09:28:02 -05:00
Cal Corum
f0f77ffb16 End of season card data plus handedness bugfix 2024-10-16 22:35:35 -05:00
Cal Corum
371c083dc3 August 25 Card Data 2024-08-25 19:49:53 -05:00
Cal Corum
4624c307ef New script to update positions 2024-08-25 18:29:47 -05:00
Cal Corum
ebfe9ec958 August 18 Card Data 2024-08-18 13:55:38 -05:00
Cal Corum
be1ce784ec August 11 Card Data 2024-08-11 20:08:20 -05:00
Cal Corum
a91f0bb906 August 5 Card Data 2024-08-11 15:13:42 -05:00
Cal Corum
e9621a3953 July 21 Card Data 2024-07-21 18:33:45 -05:00
Cal Corum
48c9cec364 Refactor for manual updates 2024-07-14 13:22:22 -05:00
Cal Corum
bf35bca7d8 July 07 Card Data 2024-07-14 13:21:49 -05:00