paper-dynasty-card-creation/RARITY_BUG_FIX_SUMMARY.md
2025-11-08 16:57:35 -06:00


# Rarity Assignment Bug Fix Summary
## Problems Identified
### 1. Excessive Hall of Fame Cards (23% instead of ~5%)
**Root Cause**: Pitcher OPS-against thresholds were too lenient for 2025 season data
- **Starters**: 34.7% qualified for HOF (threshold: ≤ 0.400)
- **Relievers**: 17.4% qualified for HOF (threshold: ≤ 0.325)
- **Example**: Joe Boyle (OPS: 0.365) was HOF → should be Gold
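The percentages above come down to counting how many pitchers fall at or below a cutoff. The following is a minimal sketch of that check, not the project's actual analysis code: the function names are hypothetical and the OPS-against values are made up for illustration.

```python
def share_at_or_below(values, threshold):
    """Fraction of values that are at or below the threshold."""
    return sum(v <= threshold for v in values) / len(values)


def cutoff_for_share(values, target_share):
    """Largest observed value such that at most target_share of values qualify.

    Falls back to the smallest value when target_share is below 1/len(values).
    """
    ordered = sorted(values)
    k = max(1, int(target_share * len(ordered)))
    return ordered[k - 1]


# Made-up OPS-against values for ten starters (NOT real season data).
starters = [0.280, 0.310, 0.350, 0.365, 0.390, 0.420, 0.450, 0.480, 0.520, 0.610]

old_share = share_at_or_below(starters, 0.400)  # old starter HOF cutoff
new_share = share_at_or_below(starters, 0.300)  # new starter HOF cutoff
top_fifth_cutoff = cutoff_for_share(starters, 0.20)
```

With real season data, comparing `old_share` against the intended share for a tier shows whether a cutoff is too lenient, and `cutoff_for_share` gives a starting point for a replacement.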
### 2. Players Without Ratings Stuck as Hall of Fame
**Root Cause**: Multi-part issue
1. New players were created with the placeholder `rarity_id: 99` (Hall of Fame)
2. Some players had batting/pitching cards but no ratings in the database
3. The inner join in `post_player_updates()` excluded players without ratings
4. Those players therefore kept the placeholder HOF rarity (99) instead of being downgraded
**Database Status**:
- **Production**: ✅ All 429 batters and 532 pitchers have ratings (no issue)
- **Dev**: ⚠️ 126 of 583 batters missing ratings (21.6%)
---
## Solutions Implemented
### Fix 1: Season-Aware Rarity Thresholds
**New File**: `rarity_thresholds.py`
- Dataclass-based threshold configuration
- Separate thresholds for 2024 vs 2025 seasons
- Type-safe rarity assignment methods
**2025 Pitcher Thresholds** (based on percentile analysis):
| Rarity | Starters (was → now) | Relievers (was → now) |
|--------|---------------------|----------------------|
| HOF | ≤ 0.400 → **0.300** | ≤ 0.325 → **0.270** |
| Diamond| ≤ 0.475 → **0.354** | ≤ 0.400 → **0.319** |
| Gold | ≤ 0.530 → **0.384** | ≤ 0.475 → **0.370** |
| Silver | ≤ 0.600 → **0.441** | ≤ 0.550 → **0.436** |
| Bronze | ≤ 0.675 → **0.487** | ≤ 0.625 → **0.503** |
**Batter Thresholds**: Unchanged (need more data to analyze)
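A minimal sketch of how the pitcher thresholds in the table above could be expressed as dataclasses. It is not the contents of `rarity_thresholds.py`: `OpsThresholds`, `pitcher_rarity`, and the lookup dictionaries are hypothetical names. The rarity ids 99 (HOF), 2 (Gold), and 5 (Common) appear in this summary; the ids 1, 3, and 4 for Diamond, Silver, and Bronze are assumptions.

```python
from dataclasses import dataclass

# 99, 2 and 5 come from this summary; 1, 3 and 4 are ASSUMED ids.
HOF, DIAMOND, GOLD, SILVER, BRONZE, COMMON = 99, 1, 2, 3, 4, 5


@dataclass(frozen=True)
class OpsThresholds:
    """Inclusive upper bounds on OPS-against for each rarity tier."""
    hof: float
    diamond: float
    gold: float
    silver: float
    bronze: float

    def rarity_for(self, ops_against: float) -> int:
        tiers = (
            (self.hof, HOF), (self.diamond, DIAMOND), (self.gold, GOLD),
            (self.silver, SILVER), (self.bronze, BRONZE),
        )
        # Tiers are checked from strictest to most lenient bound.
        for bound, rarity_id in tiers:
            if ops_against <= bound:
                return rarity_id
        return COMMON


# Values copied from the "was" and "now" columns of the table.
STARTER_THRESHOLDS = {
    2024: OpsThresholds(0.400, 0.475, 0.530, 0.600, 0.675),
    2025: OpsThresholds(0.300, 0.354, 0.384, 0.441, 0.487),
}
RELIEVER_THRESHOLDS = {
    2024: OpsThresholds(0.325, 0.400, 0.475, 0.550, 0.625),
    2025: OpsThresholds(0.270, 0.319, 0.370, 0.436, 0.503),
}


def pitcher_rarity(ops_against: float, season: int, is_starter: bool) -> int:
    table = STARTER_THRESHOLDS if is_starter else RELIEVER_THRESHOLDS
    # 2024 and earlier use the original values; applying the 2025 values
    # to any later season is an assumption of this sketch.
    thresholds = table[2025] if season >= 2025 else table[2024]
    return thresholds.rarity_for(ops_against)
```

For a starter with an OPS-against of 0.365, `pitcher_rarity(0.365, 2024, True)` returns 99 and `pitcher_rarity(0.365, 2025, True)` returns 2, matching the Joe Boyle example.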
---
### Fix 2: Handle Missing Ratings
**Modified Files**: `batters/creation.py`, `pitchers/creation.py`
#### Changes in `post_player_updates()`:
**Before** (inner join, drops players without ratings):
```python
total_ratings = pd.merge(
    await pd_battingcards_df(cardset['id']),
    await pd_battingcardratings_df(cardset['id']),
    on='battingcard_id'  # `how` defaults to 'inner': cards without ratings are dropped
)
```
**After** (left join, keeps all players):
```python
batting_cards = await pd_battingcards_df(cardset['id'])
batting_ratings = await pd_battingcardratings_df(cardset['id'], season)
total_ratings = pd.merge(
    batting_cards,
    batting_ratings,
    on='battingcard_id',
    how='left'  # Keep all batting cards, even those without ratings
)
# Assign default rarity (Common/5) for players without ratings
if 'new_rarity_id' not in total_ratings.columns:
    total_ratings['new_rarity_id'] = 5
elif total_ratings['new_rarity_id'].isna().any():
    total_ratings['new_rarity_id'] = total_ratings['new_rarity_id'].fillna(5)
```
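The difference between the two joins can be reproduced on toy data. This is a self-contained sketch, assuming only that both frames share a `battingcard_id` column as in the snippets above; the frames and player labels are invented, and the final integer cast is an addition of this sketch.

```python
import pandas as pd

# Toy frames: five cards, only one of which has a ratings row.
cards = pd.DataFrame({
    "battingcard_id": [1, 2, 3, 4, 5],
    "player": ["A", "B", "C", "D", "E"],
})
ratings = pd.DataFrame({"battingcard_id": [1], "new_rarity_id": [99]})

# Default how="inner": cards with no matching ratings row disappear.
inner = pd.merge(cards, ratings, on="battingcard_id")

# how="left": every card survives; missing ratings become NaN.
left = pd.merge(cards, ratings, on="battingcard_id", how="left")

# Fill the gaps with Common (5) and restore an integer dtype.
left["new_rarity_id"] = left["new_rarity_id"].fillna(5).astype(int)

print(len(inner), len(left))           # 1 5
print(left["new_rarity_id"].tolist())  # [99, 5, 5, 5, 5]
```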
#### Added Season Parameter:
- `pd_battingcardratings_df(cardset_id, season)` - now calculates rarity using season-aware thresholds
- `pd_pitchingcardratings_df(cardset_id, season, pitching_cards)` - calculates rarity with starter/reliever logic
- `post_player_updates(..., season)` - passes season to threshold functions
---
## Test Results
### ✅ Threshold Tests
```
Joe Boyle (OPS: 0.365):
  2024 Thresholds: HOF (99)
  2025 Thresholds: Gold (2) ✓
```
### ✅ Missing Ratings Tests
```
OLD: 5 players → 1 after merge (lost 4)
NEW: 5 players → 5 after merge (lost 0)
  - Aaron Judge: HOF (99) ✓
  - Others: Common (5) ✓
```
### ✅ NaN Handling Tests
```
Case 1: Column with NaN → fillna(5) ✓
Case 2: Column missing → create with 5 ✓
Case 3: No NaN → unchanged ✓
Case 4: get_player_updates → no ValueError ✓
```
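The first three cases can be exercised with a small helper. This is a sketch rather than the project's test code: `ensure_rarity` and `DEFAULT_RARITY_ID` are hypothetical names, and the integer cast at the end is an addition. Python's `int(float('nan'))` raises `ValueError`, so leaving NaN in the column is one assumed way an error like the one in Case 4 could arise downstream.

```python
import pandas as pd

DEFAULT_RARITY_ID = 5  # Common


def ensure_rarity(df: pd.DataFrame, column: str = "new_rarity_id") -> pd.DataFrame:
    """Return a copy of df in which `column` exists and contains no NaN."""
    out = df.copy()
    if column not in out.columns:
        # Case 2: the column is missing entirely.
        out[column] = DEFAULT_RARITY_ID
    else:
        # Cases 1 and 3: fill NaN if present; a no-op otherwise.
        out[column] = out[column].fillna(DEFAULT_RARITY_ID)
    out[column] = out[column].astype(int)
    return out


case_1 = ensure_rarity(pd.DataFrame({"new_rarity_id": [99, float("nan")]}))
case_2 = ensure_rarity(pd.DataFrame({"player_id": [1, 2]}))
case_3 = ensure_rarity(pd.DataFrame({"new_rarity_id": [1, 2]}))
```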
---
## Files Modified
1. **rarity_thresholds.py** (NEW)
   - Dataclass definitions for thresholds
   - Season-aware getter functions
   - 2024 and 2025 threshold constants
2. **batters/creation.py**
   - Import `get_batter_thresholds`
   - Update `pd_battingcardratings_df()` to accept season parameter
   - Update `post_player_updates()` to use LEFT JOIN and handle NaN
   - Update `run_batters()` to pass season parameter
3. **pitchers/creation.py**
   - Import `get_pitcher_thresholds`
   - Update `pd_pitchingcardratings_df()` to accept season and calculate rarity
   - Update `post_player_updates()` to use LEFT JOIN and handle NaN
   - Update `run_pitchers()` to pass season parameter
---
## Migration Notes
### For Next Card Creation Run:
- ✅ Will automatically use 2025 thresholds
- ✅ Will handle players without ratings correctly
- ✅ No manual intervention needed
### For Dev Database (Optional):
Since dev has 126 players without ratings stuck as HOF, you can either:
**Option 1: Re-run card creation** (recommended)
- Will properly create ratings for the 126 players
- Will assign correct rarities
**Option 2: Quick manual fix**
```sql
-- Set players in cardset 24 with cost = 99999 to Common (5)
UPDATE players
SET rarity_id = 5
WHERE cost = 99999
  AND cardset_id = 24;
```
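Before running an update like this against a real database, it can help to preview the affected rows and to guard on the current rarity. The sketch below rehearses that against an in-memory SQLite table. The four-column schema and the sample rows are assumptions, SQLite is used only for the demonstration, and the extra `rarity_id = 99` condition is an addition that is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players ("
    "id INTEGER PRIMARY KEY, rarity_id INTEGER, cost INTEGER, cardset_id INTEGER)"
)
conn.executemany(
    "INSERT INTO players (rarity_id, cost, cardset_id) VALUES (?, ?, ?)",
    [
        (99, 99999, 24),  # placeholder row: should be reset
        (99, 500, 24),    # HOF with a non-placeholder cost: must be kept
        (99, 99999, 23),  # other cardset: must be kept
        (2, 120, 24),     # already rated: must be kept
    ],
)

CONDITION = "rarity_id = 99 AND cost = 99999 AND cardset_id = 24"

# Preview how many rows the update would touch.
preview = conn.execute(f"SELECT COUNT(*) FROM players WHERE {CONDITION}").fetchone()[0]

# Run the update inside a transaction; rowcount reports the rows changed.
with conn:
    updated = conn.execute(
        f"UPDATE players SET rarity_id = 5 WHERE {CONDITION}"
    ).rowcount

rarities = [row[0] for row in conn.execute("SELECT rarity_id FROM players ORDER BY id")]
```

If `preview` does not match the expected number of unrated players (126 on dev, per this summary), the condition is worth rechecking before the update is committed.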
### For Historical Cardsets (2024 and earlier):
- ✅ Will use original 2024 thresholds
- ✅ Backward compatible
- ✅ No changes to existing cards
---
## Impact Summary
| Metric | Before | After |
|--------|--------|-------|
| HOF Cards | 267 (23%) | ~58 (5%) expected |
| Joe Boyle | HOF (99) | Gold (2) |
| Players w/o ratings | HOF (99) | Common (5) |
| Threshold accuracy | Static | Season-aware |
**Estimated Rarity Distribution After Fix**:
- Hall of Fame: ~58 cards (5%)
- Diamond: ~116 cards (10%)
- Gold: ~174 cards (15%)
- Silver: ~232 cards (20%)
- Bronze: ~232 cards (20%)
- Common: ~349 cards (30%)
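The estimated counts are consistent with a pool of about 1,161 cards, a total inferred here from 267 HOF cards being 23%. One way to reproduce the listed numbers, as a sketch under that assumption, is to round each tier's share and give the remainder to Common. The function name and the remainder rule are hypothetical, not taken from the project.

```python
TOTAL_CARDS = 1161  # INFERRED: 267 HOF cards at 23% implies roughly 1,161 cards

TARGET_SHARES = {
    "Hall of Fame": 0.05,
    "Diamond": 0.10,
    "Gold": 0.15,
    "Silver": 0.20,
    "Bronze": 0.20,
    "Common": 0.30,
}


def estimate_counts(total, shares, remainder_tier="Common"):
    """Round each tier's share of total; remainder_tier absorbs rounding error."""
    counts = {
        tier: round(total * share)
        for tier, share in shares.items()
        if tier != remainder_tier
    }
    counts[remainder_tier] = total - sum(counts.values())
    return counts


estimated = estimate_counts(TOTAL_CARDS, TARGET_SHARES)
```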