paper-dynasty-card-creation/RARITY_BUG_FIX_SUMMARY.md
2025-11-08 16:57:35 -06:00


Rarity Assignment Bug Fix Summary

Problems Identified

1. Excessive Hall of Fame Cards (23% instead of ~5%)

Root Cause: Pitcher OPS-against thresholds were too lenient for 2025 season data

  • Starters: 34.7% qualified for HOF (threshold: ≤ 0.4)
  • Relievers: 17.4% qualified for HOF (threshold: ≤ 0.325)
  • Example: Joe Boyle (OPS: 0.365) was HOF → should be Gold

2. Players Without Ratings Stuck as Hall of Fame

Root Cause: Multi-part issue

  1. New players were created with a placeholder rarity_id of 99 (Hall of Fame)
  2. Some players had batting/pitching cards but no ratings in the database
  3. The inner join in post_player_updates() dropped players without ratings, so no new rarity was computed for them
  4. Those players therefore kept the HOF placeholder (99) instead of being downgraded

Database Status:

  • Production: All 429 batters and 532 pitchers have ratings (no issue)
  • Dev: ⚠️ 126 of 583 batters missing ratings (21.6%)

Solutions Implemented

Fix 1: Season-Aware Rarity Thresholds

New File: rarity_thresholds.py

  • Dataclass-based threshold configuration
  • Separate thresholds for 2024 vs 2025 seasons
  • Type-safe rarity assignment methods
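Below is a minimal sketch of what a dataclass-based configuration like this could look like. It uses the pitcher values from the 2025 threshold table. The names `PitcherThresholds` and `thresholds_for_season` are hypothetical and do not reproduce the actual contents of rarity_thresholds.py. The sketch returns rarity names rather than IDs, because this summary only gives IDs for HOF (99), Gold (2) and Common (5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitcherThresholds:
    """Upper bounds on OPS-against per rarity tier (lower OPS-against is better)."""
    hof: float
    diamond: float
    gold: float
    silver: float
    bronze: float

    def rarity_for(self, ops_against: float) -> str:
        # Check tiers from best to worst; the first bound met (<=) wins.
        for name, bound in (
            ("Hall of Fame", self.hof),
            ("Diamond", self.diamond),
            ("Gold", self.gold),
            ("Silver", self.silver),
            ("Bronze", self.bronze),
        ):
            if ops_against <= bound:
                return name
        return "Common"  # worse than every bound


# Values from the pitcher threshold table (2024 = old, 2025 = new).
STARTERS_2024 = PitcherThresholds(0.400, 0.475, 0.530, 0.600, 0.675)
STARTERS_2025 = PitcherThresholds(0.300, 0.354, 0.384, 0.441, 0.487)
RELIEVERS_2024 = PitcherThresholds(0.325, 0.400, 0.475, 0.550, 0.625)
RELIEVERS_2025 = PitcherThresholds(0.270, 0.319, 0.370, 0.436, 0.503)


def thresholds_for_season(season: int, is_starter: bool) -> PitcherThresholds:
    """Hypothetical season-aware getter.

    Assumption: 2025 and later use the 2025 values; 2024 and earlier
    keep the original thresholds (the backward-compatible path).
    """
    if season >= 2025:
        return STARTERS_2025 if is_starter else RELIEVERS_2025
    return STARTERS_2024 if is_starter else RELIEVERS_2024
```

With these values, Joe Boyle's 0.365 OPS-against as a starter maps to Hall of Fame under the 2024 bounds and to Gold under the 2025 bounds. This matches the Threshold Tests result.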

2025 Pitcher Thresholds (based on percentile analysis):

| Rarity | Starters, OPS-against ≤ (2024 → 2025) | Relievers, OPS-against ≤ (2024 → 2025) |
|---|---|---|
| HOF | 0.400 → 0.300 | 0.325 → 0.270 |
| Diamond | 0.475 → 0.354 | 0.400 → 0.319 |
| Gold | 0.530 → 0.384 | 0.475 → 0.370 |
| Silver | 0.600 → 0.441 | 0.550 → 0.436 |
| Bronze | 0.675 → 0.487 | 0.625 → 0.503 |
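This summary does not document the exact percentile method behind these values. One way to derive cut-offs like them is to take quantiles of a season's OPS-against values. The sketch below assumes cumulative targets of 5%, 15%, 30%, 50% and 70% for HOF through Bronze. Those targets are inferred from the estimated distribution at the end of this summary, and the function name is hypothetical.

```python
import numpy as np


def percentile_thresholds(ops_against, cumulative_shares=(0.05, 0.15, 0.30, 0.50, 0.70)):
    """Return OPS-against cut-offs for HOF, Diamond, Gold, Silver, Bronze.

    Lower OPS-against is better, so the best tiers sit at the low quantiles:
    roughly 5% of pitchers fall at or below the HOF cut-off, 15% at or
    below the Diamond cut-off, and so on. Assumed targets, not the
    documented method.
    """
    values = np.asarray(ops_against, dtype=float)
    # np.quantile uses linear interpolation by default.
    return [round(float(np.quantile(values, q)), 3) for q in cumulative_shares]
```

Running this separately on starters and on relievers would give one set of bounds per role, as in the table.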

Batter Thresholds: Unchanged (need more data to analyze)


Fix 2: Handle Missing Ratings

Modified Files: batters/creation.py, pitchers/creation.py

Changes in post_player_updates():

Before (inner join, which dropped players without ratings):

```python
total_ratings = pd.merge(
    await pd_battingcards_df(cardset['id']),
    await pd_battingcardratings_df(cardset['id']),
    on='battingcard_id'  # how defaults to 'inner'
)
```

After (left join, which keeps every batting card):

```python
batting_cards = await pd_battingcards_df(cardset['id'])
batting_ratings = await pd_battingcardratings_df(cardset['id'], season)

total_ratings = pd.merge(
    batting_cards,
    batting_ratings,
    on='battingcard_id',
    how='left'  # Keep all batting cards
)

# Assign default rarity (Common/5) for players without ratings
if 'new_rarity_id' not in total_ratings.columns:
    total_ratings['new_rarity_id'] = 5
elif total_ratings['new_rarity_id'].isna().any():
    total_ratings['new_rarity_id'] = total_ratings['new_rarity_id'].fillna(5)
```

Added Season Parameter:

  • pd_battingcardratings_df(cardset_id, season) - now calculates rarity using season-aware thresholds
  • pd_pitchingcardratings_df(cardset_id, season, pitching_cards) - calculates rarity with starter/reliever logic
  • post_player_updates(..., season) - passes season to threshold functions
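The season parameter lets the ratings functions label each row with a rarity. As an illustration only, that step could be vectorized with `pandas.cut`. Its default right-closed bins reproduce the "≤" semantics of the threshold table. The `ops_against` column name and the `add_rarity_column` helper are assumptions, and the sketch returns tier names rather than rarity IDs.

```python
import numpy as np
import pandas as pd

# 2025 starter cut-offs from the threshold table (OPS-against upper bounds).
STARTER_BOUNDS_2025 = [0.300, 0.354, 0.384, 0.441, 0.487]
TIERS = ["Hall of Fame", "Diamond", "Gold", "Silver", "Bronze", "Common"]


def add_rarity_column(ratings: pd.DataFrame, bounds: list[float]) -> pd.DataFrame:
    """Return a copy of `ratings` with a 'rarity' column derived from 'ops_against'.

    pd.cut bins are right-closed by default, so a value exactly equal to a
    bound falls into the better tier, matching the '<=' thresholds.
    """
    edges = [-np.inf, *bounds, np.inf]  # 7 edges -> 6 tiers
    out = ratings.copy()
    out["rarity"] = pd.cut(out["ops_against"], bins=edges, labels=TIERS)
    return out
```

For example, starters with OPS-against values of 0.290, 0.365 and 0.520 would be labeled Hall of Fame, Gold and Common.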

Test Results

Threshold Tests

Joe Boyle (OPS: 0.365):
  2024 Thresholds: HOF (99)
  2025 Thresholds: Gold (2) ✓

Missing Ratings Tests

OLD: 5 players → 1 after merge (lost 4)
NEW: 5 players → 5 after merge (lost 0)
  - Aaron Judge: HOF (99) ✓
  - Others: Common (5) ✓
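The old and new merge behavior can be reproduced in a few lines. The five-card frame is hypothetical data shaped like the test (one card with ratings, four without), not the actual test fixture. Only `battingcard_id` and `new_rarity_id` are column names taken from this summary.

```python
import pandas as pd

# Hypothetical fixture: five batting cards, but ratings exist for only one.
cards = pd.DataFrame({
    "battingcard_id": [1, 2, 3, 4, 5],
    "player": ["Aaron Judge", "Player B", "Player C", "Player D", "Player E"],
})
ratings = pd.DataFrame({"battingcard_id": [1], "new_rarity_id": [99]})

# Old behavior: pd.merge defaults to how="inner", so unrated cards disappear.
inner = pd.merge(cards, ratings, on="battingcard_id")

# New behavior: a left join keeps every card; unmatched rows get NaN,
# which is then replaced with the Common rarity ID (5).
left = pd.merge(cards, ratings, on="battingcard_id", how="left")
left["new_rarity_id"] = left["new_rarity_id"].fillna(5).astype(int)

print(f"OLD: {len(cards)} players -> {len(inner)} after merge")
print(f"NEW: {len(cards)} players -> {len(left)} after merge")
```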

NaN Handling Tests

Case 1: Column with NaN → fillna(5) ✓
Case 2: Column missing → create with 5 ✓
Case 3: No NaN → unchanged ✓
Case 4: get_player_updates → no ValueError ✓
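The two checks in the post_player_updates() snippet could be folded into a single helper that covers Cases 1 to 3. This is a hypothetical refactor: `fill_missing_rarity` is not part of the codebase described here. It also casts the column back to int. When a left join introduces NaN into an integer column, pandas promotes that column to float64, and `fillna` alone keeps the float dtype (99.0, 5.0).

```python
import pandas as pd

DEFAULT_RARITY_ID = 5  # Common


def fill_missing_rarity(df: pd.DataFrame) -> pd.DataFrame:
    """Default players without ratings to Common (NaN handling Cases 1 to 3)."""
    out = df.copy()
    if "new_rarity_id" not in out.columns:
        # Case 2: the merged frame has no rarity column at all.
        out["new_rarity_id"] = DEFAULT_RARITY_ID
    else:
        # Cases 1 and 3: fillna replaces NaN and is a no-op when there is none.
        out["new_rarity_id"] = out["new_rarity_id"].fillna(DEFAULT_RARITY_ID)
    # NaN from the join promotes the column to float64; restore integer IDs.
    out["new_rarity_id"] = out["new_rarity_id"].astype(int)
    return out
```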

Files Modified

  1. rarity_thresholds.py (NEW)

    • Dataclass definitions for thresholds
    • Season-aware getter functions
    • 2024 and 2025 threshold constants
  2. batters/creation.py

    • Import get_batter_thresholds
    • Update pd_battingcardratings_df() to accept season parameter
    • Update post_player_updates() to use LEFT JOIN and handle NaN
    • Update run_batters() to pass season parameter
  3. pitchers/creation.py

    • Import get_pitcher_thresholds
    • Update pd_pitchingcardratings_df() to accept season and calculate rarity
    • Update post_player_updates() to use LEFT JOIN and handle NaN
    • Update run_pitchers() to pass season parameter

Migration Notes

For Next Card Creation Run:

  • Will automatically use 2025 thresholds
  • Will handle players without ratings correctly
  • No manual intervention needed

For Dev Database (Optional):

Since dev has 126 players without ratings stuck as HOF, you can either:

Option 1: Re-run card creation (recommended)

  • Will properly create ratings for the 126 players
  • Will assign correct rarities

Option 2: Quick manual fix

```sql
UPDATE players
SET rarity_id = 5
WHERE cost = 99999
  AND cardset_id = 24;
```

For Historical Cardsets (2024 and earlier):

  • Will use original 2024 thresholds
  • Backward compatible
  • No changes to existing cards

Impact Summary

| Metric | Before | After |
|---|---|---|
| HOF cards | 267 (23%) | ~58 (5%, expected) |
| Joe Boyle | HOF (99) | Gold (2) |
| Players without ratings | HOF (99) | Common (5) |
| Threshold selection | Static | Season-aware |

Estimated Rarity Distribution After Fix:

  • Hall of Fame: ~58 cards (5%)
  • Diamond: ~116 cards (10%)
  • Gold: ~174 cards (15%)
  • Silver: ~232 cards (20%)
  • Bronze: ~232 cards (20%)
  • Common: ~349 cards (30%)
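The listed counts sum to 1,161 cards. That total is also consistent with the 267 HOF cards making up 23% before the fix. The estimates can be reproduced with the sketch below. It assumes a total of 1,161 cards and assumes that rounding remainders go to Common. The target shares are taken from the list above, and `expected_counts` is a hypothetical helper.

```python
# Target share of cards per tier, from the estimated distribution above.
TARGET_SHARES = {
    "Hall of Fame": 0.05,
    "Diamond": 0.10,
    "Gold": 0.15,
    "Silver": 0.20,
    "Bronze": 0.20,
}


def expected_counts(total_cards: int) -> dict[str, int]:
    """Round each tier down and assign the remainder to Common (assumed rule)."""
    counts = {tier: int(total_cards * share) for tier, share in TARGET_SHARES.items()}
    counts["Common"] = total_cards - sum(counts.values())
    return counts


print(expected_counts(1161))
```

Rounding each tier down and giving the leftover 349 cards to Common reproduces every figure in the list.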