# Rarity Assignment Bug Fix Summary

## Problems Identified

### 1. Excessive Hall of Fame Cards (23% instead of ~5%)

**Root Cause**: Pitcher OPS-against thresholds were too lenient for 2025 season data.

- **Starters**: 34.7% qualified for HOF (threshold: ≤ 0.400)
- **Relievers**: 17.4% qualified for HOF (threshold: ≤ 0.325)
- **Example**: Joe Boyle (OPS-against: 0.365) was HOF → should be Gold

### 2. Players Without Ratings Stuck as Hall of Fame

**Root Cause**: A multi-part issue:

1. New players are created with the placeholder `rarity_id: 99` (HOF).
2. Some players had batting/pitching cards but no ratings in the database.
3. The inner join in `post_player_updates()` dropped players without ratings.
4. Because those players were never updated, they kept HOF (99) instead of being downgraded.

**Database Status**:

- **Production**: ✅ All 429 batters and 532 pitchers have ratings (no issue)
- **Dev**: ⚠️ 126 of 583 batters are missing ratings (21.6%)

---

## Solutions Implemented

### Fix 1: Season-Aware Rarity Thresholds

**New File**: `rarity_thresholds.py`

- Dataclass-based threshold configuration
- Separate thresholds for the 2024 and 2025 seasons
- Type-safe rarity assignment methods

**2025 Pitcher Thresholds** (maximum OPS-against per tier, based on percentile analysis):

| Rarity  | Starters (was → now) | Relievers (was → now) |
|---------|----------------------|-----------------------|
| HOF     | ≤ 0.400 → **0.300**  | ≤ 0.325 → **0.270**   |
| Diamond | ≤ 0.475 → **0.354**  | ≤ 0.400 → **0.319**   |
| Gold    | ≤ 0.530 → **0.384**  | ≤ 0.475 → **0.370**   |
| Silver  | ≤ 0.600 → **0.441**  | ≤ 0.550 → **0.436**   |
| Bronze  | ≤ 0.675 → **0.487**  | ≤ 0.625 → **0.503**   |

**Batter Thresholds**: Unchanged (more data is needed before they can be re-analyzed)

---

### Fix 2: Handle Missing Ratings

**Modified Files**: `batters/creation.py`, `pitchers/creation.py`

#### Changes in `post_player_updates()`:

**Before** (inner join: players without ratings were dropped):

```python
total_ratings = pd.merge(
    await pd_battingcards_df(cardset['id']),
    await pd_battingcardratings_df(cardset['id']),
    on='battingcard_id'  # Inner join (pandas default)
)
```

**After** (left join: all players are kept):

```python
batting_cards = await pd_battingcards_df(cardset['id'])
batting_ratings = await pd_battingcardratings_df(cardset['id'], season)

total_ratings = pd.merge(
    batting_cards,
    batting_ratings,
    on='battingcard_id',
    how='left'  # Keep all batting cards; unrated cards get NaN rating columns
)

# Assign default rarity (Common/5) for players without ratings
if 'new_rarity_id' not in total_ratings.columns:
    total_ratings['new_rarity_id'] = 5
elif total_ratings['new_rarity_id'].isna().any():
    total_ratings['new_rarity_id'] = total_ratings['new_rarity_id'].fillna(5)
```

#### Added Season Parameter:

- `pd_battingcardratings_df(cardset_id, season)`: now calculates rarity using season-aware thresholds
- `pd_pitchingcardratings_df(cardset_id, season, pitching_cards)`: calculates rarity with starter/reliever logic
- `post_player_updates(..., season)`: passes the season through to the threshold functions

---

## Test Results

### ✅ Threshold Tests

```
Joe Boyle (OPS: 0.365):
  2024 Thresholds: HOF (99)
  2025 Thresholds: Gold (2) ✓
```

### ✅ Missing Ratings Tests

```
OLD: 5 players → 1 after merge (lost 4)
NEW: 5 players → 5 after merge (lost 0)
  - Aaron Judge: HOF (99) ✓
  - Others: Common (5) ✓
```

### ✅ NaN Handling Tests

```
Case 1: Column with NaN → fillna(5) ✓
Case 2: Column missing → create with 5 ✓
Case 3: No NaN → unchanged ✓
Case 4: get_player_updates → no ValueError ✓
```

---

## Files Modified

1. **rarity_thresholds.py** (NEW)
   - Dataclass definitions for thresholds
   - Season-aware getter functions
   - 2024 and 2025 threshold constants

2. **batters/creation.py**
   - Import `get_batter_thresholds`
   - Update `pd_battingcardratings_df()` to accept a season parameter
   - Update `post_player_updates()` to use a LEFT JOIN and handle NaN
   - Update `run_batters()` to pass the season parameter

3. **pitchers/creation.py**
   - Import `get_pitcher_thresholds`
   - Update `pd_pitchingcardratings_df()` to accept a season and calculate rarity
   - Update `post_player_updates()` to use a LEFT JOIN and handle NaN
   - Update `run_pitchers()` to pass the season parameter

---

## Migration Notes

### For the Next Card Creation Run:

- ✅ Will automatically use the 2025 thresholds
- ✅ Will handle players without ratings correctly
- ✅ No manual intervention needed

### For the Dev Database (Optional):

Since dev has 126 players without ratings stuck as HOF, you can either:

**Option 1: Re-run card creation** (recommended)

- Will properly create ratings for the 126 players
- Will assign correct rarities

**Option 2: Quick manual fix**

```sql
UPDATE players
SET rarity_id = 5
WHERE cost = 99999 AND cardset_id = 24;
```

### For Historical Cardsets (2024 and earlier):

- ✅ Will use the original 2024 thresholds
- ✅ Backward compatible
- ✅ No changes to existing cards

---

## Impact Summary

| Metric | Before | After |
|--------|--------|-------|
| HOF Cards | 267 (23%) | ~58 (5%) expected |
| Joe Boyle | HOF (99) | Gold (2) |
| Players w/o ratings | HOF (99) | Common (5) |
| Threshold selection | Static | Season-aware |

**Estimated Rarity Distribution After Fix**:

- Hall of Fame: ~58 cards (5%)
- Diamond: ~116 cards (10%)
- Gold: ~174 cards (15%)
- Silver: ~232 cards (20%)
- Bronze: ~232 cards (20%)
- Common: ~349 cards (30%)
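
The season-aware lookup described under Fix 1 can be sketched with a frozen dataclass. This is a minimal illustration under stated assumptions, not the contents of `rarity_thresholds.py`: the class `PitcherThresholds`, the functions `rarity_for_ops()` and `pitcher_thresholds_for()`, the tier-name return values, and the rule that seasons after 2025 reuse the 2025 values are all hypothetical. The numbers are the "was" and "now" columns of the 2025 Pitcher Thresholds table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitcherThresholds:
    """Hypothetical structure: maximum OPS-against allowed for each tier."""
    hof: float
    diamond: float
    gold: float
    silver: float
    bronze: float

    def rarity_for_ops(self, ops_against: float) -> str:
        # Lower OPS-against is better, so check the strictest tier first.
        tiers = (
            ("HOF", self.hof),
            ("Diamond", self.diamond),
            ("Gold", self.gold),
            ("Silver", self.silver),
            ("Bronze", self.bronze),
        )
        for name, limit in tiers:
            if ops_against <= limit:
                return name
        return "Common"


# "was" (2024) and "now" (2025) columns of the threshold table
THRESHOLDS = {
    2024: {
        "starter": PitcherThresholds(0.400, 0.475, 0.530, 0.600, 0.675),
        "reliever": PitcherThresholds(0.325, 0.400, 0.475, 0.550, 0.625),
    },
    2025: {
        "starter": PitcherThresholds(0.300, 0.354, 0.384, 0.441, 0.487),
        "reliever": PitcherThresholds(0.270, 0.319, 0.370, 0.436, 0.503),
    },
}


def pitcher_thresholds_for(season: int, is_starter: bool) -> PitcherThresholds:
    # Assumption: 2024 and earlier use the 2024 values; 2025 and later use 2025.
    key = 2024 if season <= 2024 else 2025
    return THRESHOLDS[key]["starter" if is_starter else "reliever"]


# Joe Boyle's 0.365, treated as a starter (as in the Threshold Tests output)
print(pitcher_thresholds_for(2024, is_starter=True).rarity_for_ops(0.365))  # HOF
print(pitcher_thresholds_for(2025, is_starter=True).rarity_for_ops(0.365))  # Gold
```

Treating Joe Boyle as a starter is an inference: 0.365 is HOF under the 2024 thresholds only for starters (a 2024 reliever at 0.365 would be Diamond).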
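
The 2025 cutoffs are described only as "based on percentile analysis". One plausible way to derive such cutoffs is to take, for each tier, the OPS-against quantile matching the cumulative target share from the Estimated Rarity Distribution (5%, 15%, 30%, 50%, 70%). The sketch below assumes exactly that, plus NumPy's default linear interpolation; `percentile_cutoffs()` and `assign_tier()` are hypothetical names, and the input is synthetic rather than real 2025 data. In practice the cutoffs would be computed separately for starters and relievers.

```python
from collections import Counter

import numpy as np

# Cumulative shares from the estimated distribution: HOF 5%, Diamond +10%,
# Gold +15%, Silver +20%, Bronze +20%; the remaining 30% are Common.
TIER_QUANTILES = [("HOF", 0.05), ("Diamond", 0.15), ("Gold", 0.30),
                  ("Silver", 0.50), ("Bronze", 0.70)]


def percentile_cutoffs(ops_against):
    """Return the max OPS-against per tier (lower OPS-against is better)."""
    qs = [q for _, q in TIER_QUANTILES]
    # np.quantile uses linear interpolation between data points by default
    cutoffs = np.quantile(np.asarray(ops_against, dtype=float), qs)
    return {name: float(c) for (name, _), c in zip(TIER_QUANTILES, cutoffs)}


def assign_tier(ops, cutoffs):
    # Dicts preserve insertion order, so the strictest tier is checked first.
    for name, limit in cutoffs.items():
        if ops <= limit:
            return name
    return "Common"


# Synthetic OPS-against values 0.01, 0.02, ..., 1.00 (not real data)
values = [i / 100 for i in range(1, 101)]
cuts = percentile_cutoffs(values)
counts = Counter(assign_tier(v, cuts) for v in values)
print(counts)  # tier counts: HOF=5, Diamond=10, Gold=15, Silver=20, Bronze=20, Common=30
```

With real data, ties at a cutoff can push a tier slightly past its target share, since `<=` admits every pitcher sitting exactly on the boundary.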
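
The row counts reported under Missing Ratings Tests can be reproduced with a small pandas sketch. The DataFrames are made up; only Aaron Judge and the 99/5 rarity IDs come from this summary. Two steps go beyond the fix as written and are suggestions only: `indicator=True` counts cards that have no ratings row, and `.astype(int)` undoes the float64 promotion that a left join with missing values causes (otherwise HOF is stored as `99.0`).

```python
import pandas as pd

# Hypothetical data shaped like the Missing Ratings test: five batting cards,
# but only Aaron Judge (battingcard_id 1) has a ratings row.
cards = pd.DataFrame({
    "battingcard_id": [1, 2, 3, 4, 5],
    "name": ["Aaron Judge", "Player B", "Player C", "Player D", "Player E"],
})
ratings = pd.DataFrame({"battingcard_id": [1], "new_rarity_id": [99]})

# Old behavior: pd.merge defaults to how="inner", so unrated cards vanish.
inner = pd.merge(cards, ratings, on="battingcard_id")

# New behavior: how="left" keeps every card; unrated rows get NaN.
# indicator=True adds a "_merge" column marking rows with no ratings match.
left = pd.merge(cards, ratings, on="battingcard_id", how="left", indicator=True)
unrated = int((left["_merge"] == "left_only").sum())

# NaN forces the column to float64; fillna(5) assigns Common and the
# optional astype(int) restores integer rarity IDs.
left["new_rarity_id"] = left["new_rarity_id"].fillna(5).astype(int)

print(len(inner), len(left), unrated)  # 1 5 4
print(left[["name", "new_rarity_id"]].to_string(index=False))
```

Running the same `indicator=True` count against the dev cardset would be one way to confirm the 126 unrated batters before choosing between the two migration options.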
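
The three DataFrame cases listed under NaN Handling Tests can be exercised with a small helper. `fill_missing_rarity()` is hypothetical and not part of `batters/creation.py` or `pitchers/creation.py`. It also shows that the `isna().any()` check in Fix 2 is optional: `fillna` only touches NaN cells, so an unconditional call returns the same values.

```python
import numpy as np
import pandas as pd

COMMON_RARITY_ID = 5  # Common, the default used in Fix 2


def fill_missing_rarity(df: pd.DataFrame, default: int = COMMON_RARITY_ID) -> pd.DataFrame:
    """Hypothetical helper equivalent to the if/elif block in Fix 2."""
    out = df.copy()  # leave the caller's DataFrame untouched
    if "new_rarity_id" not in out.columns:
        # Case 2: column missing, so every row gets the default
        out["new_rarity_id"] = default
    else:
        # Cases 1 and 3: fillna replaces only NaN cells, so a column
        # without NaN comes back with identical values
        out["new_rarity_id"] = out["new_rarity_id"].fillna(default)
    return out


case1 = fill_missing_rarity(pd.DataFrame({"new_rarity_id": [99, np.nan]}))
case2 = fill_missing_rarity(pd.DataFrame({"battingcard_id": [1, 2]}))
case3 = fill_missing_rarity(pd.DataFrame({"new_rarity_id": [99, 2]}))
print(case1["new_rarity_id"].tolist())  # [99.0, 5.0]
print(case2["new_rarity_id"].tolist())  # [5, 5]
print(case3["new_rarity_id"].tolist())  # [99, 2]
```

Case 1 keeps the float dtype introduced by the NaN; casting with `.astype(int)` afterwards is a separate, optional step.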