Rarity Assignment Bug Fix Summary
Problems Identified
1. Excessive Hall of Fame Cards (23% instead of ~5%)
Root Cause: Pitcher OPS-against thresholds were too lenient for 2025 season data
- Starters: 34.7% qualified for HOF (threshold: ≤ 0.400)
- Relievers: 17.4% qualified for HOF (threshold: ≤ 0.325)
- Example: Joe Boyle (OPS: 0.365) was HOF → should be Gold
2. Players Without Ratings Stuck as Hall of Fame
Root Cause: Multi-part issue
- New players were created with the placeholder rarity_id: 99
- Some players had batting/pitching cards but no ratings in the database
- The inner join in post_player_updates() excluded players without ratings
- They kept HOF (99) instead of being downgraded
Database Status:
- Production: ✅ All 429 batters and 532 pitchers have ratings (no issue)
- Dev: ⚠️ 126 of 583 batters missing ratings (21.6%)
Solutions Implemented
Fix 1: Season-Aware Rarity Thresholds
New File: rarity_thresholds.py
- Dataclass-based threshold configuration
- Separate thresholds for 2024 vs 2025 seasons
- Type-safe rarity assignment methods
2025 Pitcher Thresholds (based on percentile analysis):
| Rarity | Starters (was → now) | Relievers (was → now) |
|---|---|---|
| HOF | ≤ 0.400 → 0.300 | ≤ 0.325 → 0.270 |
| Diamond | ≤ 0.475 → 0.354 | ≤ 0.400 → 0.319 |
| Gold | ≤ 0.530 → 0.384 | ≤ 0.475 → 0.370 |
| Silver | ≤ 0.600 → 0.441 | ≤ 0.550 → 0.436 |
| Bronze | ≤ 0.675 → 0.487 | ≤ 0.625 → 0.503 |
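A minimal sketch of how the dataclass-based, season-aware lookup could be laid out, using the numbers from the table above. This is not the contents of rarity_thresholds.py: the class name, the method name, and the signature of get_pitcher_thresholds are assumptions, as are the rarity IDs for Diamond, Silver, and Bronze (only 99, 2, and 5 appear in this summary). It also assumes the "was" column holds the 2024 values and that seasons after 2025 reuse the 2025 values.

```python
from dataclasses import dataclass

# Rarity IDs: 99 (HOF), 2 (Gold) and 5 (Common) appear in this summary;
# 1, 3 and 4 for Diamond, Silver and Bronze are assumed for illustration.
HOF, DIAMOND, GOLD, SILVER, BRONZE, COMMON = 99, 1, 2, 3, 4, 5


@dataclass(frozen=True)
class PitcherThresholds:
    """Maximum OPS-against allowed for each rarity tier (hypothetical layout)."""
    hof: float
    diamond: float
    gold: float
    silver: float
    bronze: float

    def rarity_for(self, ops_against: float) -> int:
        """Return the best tier whose ceiling the OPS-against value meets."""
        tiers = (
            (self.hof, HOF),
            (self.diamond, DIAMOND),
            (self.gold, GOLD),
            (self.silver, SILVER),
            (self.bronze, BRONZE),
        )
        for ceiling, rarity_id in tiers:
            if ops_against <= ceiling:
                return rarity_id
        return COMMON


# Values copied from the table above ("was" assumed to be 2024, "now" 2025).
STARTER_THRESHOLDS = {
    2024: PitcherThresholds(0.400, 0.475, 0.530, 0.600, 0.675),
    2025: PitcherThresholds(0.300, 0.354, 0.384, 0.441, 0.487),
}
RELIEVER_THRESHOLDS = {
    2024: PitcherThresholds(0.325, 0.400, 0.475, 0.550, 0.625),
    2025: PitcherThresholds(0.270, 0.319, 0.370, 0.436, 0.503),
}


def get_pitcher_thresholds(season: int, is_starter: bool) -> PitcherThresholds:
    """Pick thresholds by season; 2024 and earlier use the 2024 values."""
    table = STARTER_THRESHOLDS if is_starter else RELIEVER_THRESHOLDS
    return table[2025] if season >= 2025 else table[2024]


# Joe Boyle's 0.365 OPS-against, treated as a starter.
print(get_pitcher_thresholds(2024, True).rarity_for(0.365))
print(get_pitcher_thresholds(2025, True).rarity_for(0.365))
```

Keeping the tiers in an ordered tuple means a stricter season only changes the constants, not the assignment logic.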
Batter Thresholds: Unchanged (need more data to analyze)
Fix 2: Handle Missing Ratings
Modified Files: batters/creation.py, pitchers/creation.py
Changes in post_player_updates():
Before (Inner Join - lost players without ratings):
    total_ratings = pd.merge(
        await pd_battingcards_df(cardset['id']),
        await pd_battingcardratings_df(cardset['id']),
        on='battingcard_id'  # Inner join (pandas default) drops cards without ratings
    )
After (LEFT Join - keeps all players):
    batting_cards = await pd_battingcards_df(cardset['id'])
    batting_ratings = await pd_battingcardratings_df(cardset['id'], season)
    total_ratings = pd.merge(
        batting_cards,
        batting_ratings,
        on='battingcard_id',
        how='left'  # Keep all batting cards
    )

    # Assign default rarity (Common/5) for players without ratings
    if 'new_rarity_id' not in total_ratings.columns:
        total_ratings['new_rarity_id'] = 5
    elif total_ratings['new_rarity_id'].isna().any():
        total_ratings['new_rarity_id'] = total_ratings['new_rarity_id'].fillna(5)
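The effect of switching from an inner to a left join can be reproduced on toy data. The sketch below is standalone and does not call the project's async loaders; the two DataFrames and every player name other than Aaron Judge are made up for illustration.

```python
import pandas as pd

# Toy data: five batting cards, only one of which has a ratings row.
batting_cards = pd.DataFrame({
    "battingcard_id": [1, 2, 3, 4, 5],
    "name": ["Aaron Judge", "Player B", "Player C", "Player D", "Player E"],
})
batting_ratings = pd.DataFrame({
    "battingcard_id": [1],
    "new_rarity_id": [99],
})

# Inner join (the pandas default) silently drops cards with no ratings.
inner = pd.merge(batting_cards, batting_ratings, on="battingcard_id")

# Left join keeps every card; unmatched cards get NaN in new_rarity_id.
left = pd.merge(batting_cards, batting_ratings, on="battingcard_id", how="left")

# Default the unrated cards to Common (5) and restore an integer dtype.
left["new_rarity_id"] = left["new_rarity_id"].fillna(5).astype(int)

print(len(inner), len(left))
print(left[["name", "new_rarity_id"]].to_string(index=False))
```

Casting back to int after fillna is a choice made here, not something the summary states: a left join with unmatched rows turns the column into floats, which may matter if the IDs are later compared or posted as integers.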
Added Season Parameter:
- pd_battingcardratings_df(cardset_id, season) - now calculates rarity using season-aware thresholds
- pd_pitchingcardratings_df(cardset_id, season, pitching_cards) - calculates rarity with starter/reliever logic
- post_player_updates(..., season) - passes season to threshold functions
Test Results
✅ Threshold Tests
Joe Boyle (OPS: 0.365):
2024 Thresholds: HOF (99)
2025 Thresholds: Gold (2) ✓
✅ Missing Ratings Tests
OLD: 5 players → 1 after merge (lost 4)
NEW: 5 players → 5 after merge (lost 0)
- Aaron Judge: HOF (99) ✓
- Others: Common (5) ✓
✅ NaN Handling Tests
Case 1: Column with NaN → fillna(5) ✓
Case 2: Column missing → create with 5 ✓
Case 3: No NaN → unchanged ✓
Case 4: get_player_updates → no ValueError ✓
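Cases 1 to 3 above can be expressed as one small helper. This is a sketch only: ensure_rarity is a hypothetical name, not a function in the codebase, and Case 4 is left out because it depends on get_player_updates, which is not shown in this summary.

```python
import pandas as pd

COMMON_RARITY_ID = 5  # Common, per this summary


def ensure_rarity(df: pd.DataFrame, default: int = COMMON_RARITY_ID) -> pd.DataFrame:
    """Return a copy of df whose new_rarity_id column exists and has no NaN."""
    out = df.copy()
    if "new_rarity_id" not in out.columns:
        # Case 2: column missing entirely, so create it with the default.
        out["new_rarity_id"] = default
    else:
        # Cases 1 and 3: fill any NaN; a complete column is left unchanged.
        out["new_rarity_id"] = out["new_rarity_id"].fillna(default)
    out["new_rarity_id"] = out["new_rarity_id"].astype(int)
    return out


with_nan = pd.DataFrame({"player_id": [1, 2], "new_rarity_id": [99, float("nan")]})
no_column = pd.DataFrame({"player_id": [1, 2]})
complete = pd.DataFrame({"player_id": [1, 2], "new_rarity_id": [99, 2]})

for frame in (with_nan, no_column, complete):
    print(ensure_rarity(frame)["new_rarity_id"].tolist())
```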
Files Modified
- rarity_thresholds.py (NEW)
  - Dataclass definitions for thresholds
  - Season-aware getter functions
  - 2024 and 2025 threshold constants
- batters/creation.py
  - Import get_batter_thresholds
  - Update pd_battingcardratings_df() to accept season parameter
  - Update post_player_updates() to use LEFT JOIN and handle NaN
  - Update run_batters() to pass season parameter
- pitchers/creation.py
  - Import get_pitcher_thresholds
  - Update pd_pitchingcardratings_df() to accept season and calculate rarity
  - Update post_player_updates() to use LEFT JOIN and handle NaN
  - Update run_pitchers() to pass season parameter
Migration Notes
For Next Card Creation Run:
- ✅ Will automatically use 2025 thresholds
- ✅ Will handle players without ratings correctly
- ✅ No manual intervention needed
For Dev Database (Optional):
Since dev has 126 players without ratings stuck as HOF, you can either:
Option 1: Re-run card creation (recommended)
- Will properly create ratings for the 126 players
- Will assign correct rarities
Option 2: Quick manual fix
    UPDATE players
    SET rarity_id = 5
    WHERE cost = 99999
      AND cardset_id = 24;
For Historical Cardsets (2024 and earlier):
- ✅ Will use original 2024 thresholds
- ✅ Backward compatible
- ✅ No changes to existing cards
Impact Summary
| Metric | Before | After |
|---|---|---|
| HOF Cards | 267 (23%) | ~58 (5%) expected |
| Joe Boyle | HOF (99) | Gold (2) |
| Players w/o ratings | HOF (99) | Common (5) |
| Threshold accuracy | Static | Season-aware |
Estimated Rarity Distribution After Fix:
- Hall of Fame: ~58 cards (5%)
- Diamond: ~116 cards (10%)
- Gold: ~174 cards (15%)
- Silver: ~232 cards (20%)
- Bronze: ~232 cards (20%)
- Common: ~349 cards (30%)
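As a quick arithmetic check, the estimated counts above can be summed and converted back to percentages. The sketch uses only the figures in this summary and assumes that the 267 HOF cards from the "Before" column are measured against the same total.

```python
# Estimated post-fix counts, copied from the list above.
counts = {
    "Hall of Fame": 58,
    "Diamond": 116,
    "Gold": 174,
    "Silver": 232,
    "Bronze": 232,
    "Common": 349,
}

total = sum(counts.values())

# Share of each tier, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

# Pre-fix HOF share, assuming the same card total.
before_hof_share = round(100 * 267 / total)

print(total)
print(shares)
print(before_hof_share)
```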