# Rarity Assignment Bug Fix Summary

## Problems Identified

### 1. Excessive Hall of Fame Cards (23% instead of ~5%)
**Root Cause**: Pitcher OPS-against thresholds were too lenient for 2025 season data
- **Starters**: 34.7% qualified for HOF (threshold: ≤ 0.400)
- **Relievers**: 17.4% qualified for HOF (threshold: ≤ 0.325)
- **Example**: Joe Boyle (OPS: 0.365) was HOF → should be Gold

### 2. Players Without Ratings Stuck as Hall of Fame
**Root Cause**: Multi-part issue
1. New players were created with the placeholder `rarity_id: 99` (the HOF ID)
2. Some players had batting/pitching cards but no ratings in the database
3. The inner join in `post_player_updates()` excluded players without ratings
4. Those players kept the placeholder HOF (99) instead of being downgraded

**Database Status**:
- **Production**: ✅ All 429 batters and 532 pitchers have ratings (no issue)
- **Dev**: ⚠️ 126 of 583 batters missing ratings (21.6%)

---

## Solutions Implemented

### Fix 1: Season-Aware Rarity Thresholds

**New File**: `rarity_thresholds.py`
- Dataclass-based threshold configuration
- Separate thresholds for 2024 vs 2025 seasons
- Type-safe rarity assignment methods

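
A minimal sketch of what a dataclass-based, season-aware threshold module could look like. It is not the actual contents of `rarity_thresholds.py`: the class name, the `rarity_for` method, and the `get_pitcher_thresholds(season, is_starter)` signature are assumptions, as are the rarity IDs for Diamond, Silver, and Bronze (only 99, 2, and 5 appear in this summary). The threshold values are the pitcher numbers reported in this document, and seasons after 2025 are assumed to reuse the 2025 values.

```python
from dataclasses import dataclass

# Rarity IDs: 99 (HOF), 2 (Gold) and 5 (Common) appear in this summary.
# The Diamond, Silver and Bronze IDs are placeholders (assumed).
HOF, DIAMOND, GOLD, SILVER, BRONZE, COMMON = 99, 1, 2, 3, 4, 5


@dataclass(frozen=True)
class PitcherThresholds:
    """Maximum OPS-against for each tier; anything above `bronze` is Common."""
    hof: float
    diamond: float
    gold: float
    silver: float
    bronze: float

    def rarity_for(self, ops_against: float) -> int:
        # Check tiers from best to worst and return the first one that fits.
        tiers = [
            (self.hof, HOF),
            (self.diamond, DIAMOND),
            (self.gold, GOLD),
            (self.silver, SILVER),
            (self.bronze, BRONZE),
        ]
        for limit, rarity_id in tiers:
            if ops_against <= limit:
                return rarity_id
        return COMMON


STARTER_THRESHOLDS = {
    2024: PitcherThresholds(0.400, 0.475, 0.530, 0.600, 0.675),
    2025: PitcherThresholds(0.300, 0.354, 0.384, 0.441, 0.487),
}
RELIEVER_THRESHOLDS = {
    2024: PitcherThresholds(0.325, 0.400, 0.475, 0.550, 0.625),
    2025: PitcherThresholds(0.270, 0.319, 0.370, 0.436, 0.503),
}


def get_pitcher_thresholds(season: int, is_starter: bool) -> PitcherThresholds:
    """Pick the threshold set for a season (hypothetical signature)."""
    table = STARTER_THRESHOLDS if is_starter else RELIEVER_THRESHOLDS
    # 2024 and earlier keep the original values; later seasons use 2025 (assumed).
    return table[2025] if season >= 2025 else table[2024]


# Joe Boyle (OPS-against 0.365), treated here as a starter.
print(get_pitcher_thresholds(2024, True).rarity_for(0.365))  # 99 (HOF)
print(get_pitcher_thresholds(2025, True).rarity_for(0.365))  # 2 (Gold)
```

Keeping each season's numbers in a frozen dataclass means a new season is added by appending one entry per role rather than editing the comparison logic.
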
**2025 Pitcher Thresholds** (maximum OPS-against, based on percentile analysis):

| Rarity  | Starters (was → now) | Relievers (was → now) |
|---------|----------------------|-----------------------|
| HOF     | ≤ 0.400 → **0.300**  | ≤ 0.325 → **0.270**   |
| Diamond | ≤ 0.475 → **0.354**  | ≤ 0.400 → **0.319**   |
| Gold    | ≤ 0.530 → **0.384**  | ≤ 0.475 → **0.370**   |
| Silver  | ≤ 0.600 → **0.441**  | ≤ 0.550 → **0.436**   |
| Bronze  | ≤ 0.675 → **0.487**  | ≤ 0.625 → **0.503**   |

**Batter Thresholds**: Unchanged (need more data to analyze)

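
The summary does not show how the percentile analysis for the pitcher thresholds was run. One plausible way to derive cutoffs like these, sketched under assumptions: take the OPS-against values for a role, then read off the percentiles that match the cumulative share of cards each tier should cover (5%, 15%, 30%, 50%, 70%, taken from the target distribution in this document). The data below is synthetic and only demonstrates the mechanics.

```python
import random
import statistics

# Synthetic OPS-against values, for illustration only (not real player data).
rng = random.Random(0)
ops_against = [rng.gauss(0.45, 0.08) for _ in range(500)]

# Cumulative share of pitchers at or above each tier (lower OPS-against is better).
cumulative_pct = {"HOF": 5, "Diamond": 15, "Gold": 30, "Silver": 50, "Bronze": 70}

# quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
cuts = statistics.quantiles(ops_against, n=100, method="inclusive")
thresholds = {tier: round(cuts[pct - 1], 3) for tier, pct in cumulative_pct.items()}

for tier, limit in thresholds.items():
    print(f"{tier}: OPS-against <= {limit}")
```
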
---

### Fix 2: Handle Missing Ratings

**Modified Files**: `batters/creation.py`, `pitchers/creation.py`

#### Changes in `post_player_updates()`:

**Before** (Inner Join - lost players without ratings):
```python
# Excerpt from inside the async post_player_updates()
total_ratings = pd.merge(
    await pd_battingcards_df(cardset['id']),
    await pd_battingcardratings_df(cardset['id']),
    on='battingcard_id'  # Inner join (pandas default): drops cards with no ratings row
)
```

**After** (LEFT Join - keeps all players):
```python
# Excerpt from inside the async post_player_updates()
batting_cards = await pd_battingcards_df(cardset['id'])
batting_ratings = await pd_battingcardratings_df(cardset['id'], season)

total_ratings = pd.merge(
    batting_cards,
    batting_ratings,
    on='battingcard_id',
    how='left'  # Keep all batting cards
)

# Assign default rarity (Common/5) for players without ratings
if 'new_rarity_id' not in total_ratings.columns:
    # No ratings produced a rarity column at all
    total_ratings['new_rarity_id'] = 5
elif total_ratings['new_rarity_id'].isna().any():
    # Cards with no matching ratings row are NaN after the left join
    total_ratings['new_rarity_id'] = total_ratings['new_rarity_id'].fillna(5)
```

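
The difference between the two joins can be reproduced with a small, self-contained example. The DataFrames below are hypothetical stand-ins for the project's card and ratings tables (five cards, one ratings row); only the column names `battingcard_id` and `new_rarity_id` come from the snippets above.

```python
import pandas as pd

# Hypothetical data: five batting cards, only one of which has a ratings row.
batting_cards = pd.DataFrame({
    "battingcard_id": [1, 2, 3, 4, 5],
    "player": ["A", "B", "C", "D", "E"],
})
batting_ratings = pd.DataFrame({"battingcard_id": [1], "new_rarity_id": [99]})

# Default inner join: cards without ratings disappear.
inner = pd.merge(batting_cards, batting_ratings, on="battingcard_id")

# Left join: every card survives; unmatched cards get NaN for new_rarity_id.
left = pd.merge(batting_cards, batting_ratings, on="battingcard_id", how="left")

# Default the unmatched cards to Common (5) and restore an integer dtype.
left["new_rarity_id"] = left["new_rarity_id"].fillna(5).astype(int)

print(len(inner), len(left))           # 1 5
print(left["new_rarity_id"].tolist())  # [99, 5, 5, 5, 5]
```
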
#### Added Season Parameter:
- `pd_battingcardratings_df(cardset_id, season)` - now calculates rarity using season-aware thresholds
- `pd_pitchingcardratings_df(cardset_id, season, pitching_cards)` - calculates rarity with starter/reliever logic
- `post_player_updates(..., season)` - passes season to threshold functions

---

## Test Results

### ✅ Threshold Tests
```
Joe Boyle (OPS: 0.365):
  2024 Thresholds: HOF (99)
  2025 Thresholds: Gold (2) ✓
```

### ✅ Missing Ratings Tests
```
OLD: 5 players → 1 after merge (lost 4)
NEW: 5 players → 5 after merge (lost 0)
  - Aaron Judge: HOF (99) ✓
  - Others: Common (5) ✓
```

### ✅ NaN Handling Tests
```
Case 1: Column with NaN → fillna(5) ✓
Case 2: Column missing → create with 5 ✓
Case 3: No NaN → unchanged ✓
Case 4: get_player_updates → no ValueError ✓
```

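
The first three NaN-handling cases can be exercised with a small helper. This is a sketch, not the project's test code: `apply_default_rarity` is a hypothetical name, and the final cast to `int` is an added assumption. That cast is one plausible place where leftover NaN values would raise a `ValueError`, which may be what Case 4 guards against.

```python
import pandas as pd

COMMON_RARITY_ID = 5  # Common, per this summary


def apply_default_rarity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every row has an integer new_rarity_id (hypothetical helper)."""
    out = df.copy()
    if "new_rarity_id" not in out.columns:
        # Case 2: the column is missing entirely, so create it.
        out["new_rarity_id"] = COMMON_RARITY_ID
    else:
        # Cases 1 and 3: fill any NaN; rows that already have a value are untouched.
        out["new_rarity_id"] = out["new_rarity_id"].fillna(COMMON_RARITY_ID)
    # Casting NaN to int would fail, so this only runs after the fill.
    out["new_rarity_id"] = out["new_rarity_id"].astype(int)
    return out


with_nan = apply_default_rarity(pd.DataFrame({"new_rarity_id": [99, None, 2]}))
no_column = apply_default_rarity(pd.DataFrame({"player_id": [1, 2]}))
no_nan = apply_default_rarity(pd.DataFrame({"new_rarity_id": [99, 2]}))

print(with_nan["new_rarity_id"].tolist())   # [99, 5, 2]
print(no_column["new_rarity_id"].tolist())  # [5, 5]
print(no_nan["new_rarity_id"].tolist())     # [99, 2]
```
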
---

## Files Modified

1. **rarity_thresholds.py** (NEW)
   - Dataclass definitions for thresholds
   - Season-aware getter functions
   - 2024 and 2025 threshold constants

2. **batters/creation.py**
   - Import `get_batter_thresholds`
   - Update `pd_battingcardratings_df()` to accept season parameter
   - Update `post_player_updates()` to use LEFT JOIN and handle NaN
   - Update `run_batters()` to pass season parameter

3. **pitchers/creation.py**
   - Import `get_pitcher_thresholds`
   - Update `pd_pitchingcardratings_df()` to accept season and calculate rarity
   - Update `post_player_updates()` to use LEFT JOIN and handle NaN
   - Update `run_pitchers()` to pass season parameter

---

## Migration Notes

### For Next Card Creation Run:
- ✅ Will automatically use 2025 thresholds
- ✅ Will handle players without ratings correctly
- ✅ No manual intervention needed

### For Dev Database (Optional):
Since dev has 126 players without ratings stuck as HOF, you can either:

**Option 1: Re-run card creation** (recommended)
- Will properly create ratings for the 126 players
- Will assign correct rarities

**Option 2: Quick manual fix**
```sql
UPDATE players
SET rarity_id = 5
WHERE cost = 99999
  AND cardset_id = 24;
```

### For Historical Cardsets (2024 and earlier):
- ✅ Will use original 2024 thresholds
- ✅ Backward compatible
- ✅ No changes to existing cards

---

## Impact Summary

| Metric              | Before    | After             |
|---------------------|-----------|-------------------|
| HOF Cards           | 267 (23%) | ~58 (5%) expected |
| Joe Boyle           | HOF (99)  | Gold (2)          |
| Players w/o ratings | HOF (99)  | Common (5)        |
| Threshold accuracy  | Static    | Season-aware      |

**Estimated Rarity Distribution After Fix**:
- Hall of Fame: ~58 cards (5%)
- Diamond: ~116 cards (10%)
- Gold: ~174 cards (15%)
- Silver: ~232 cards (20%)
- Bronze: ~232 cards (20%)
- Common: ~349 cards (30%)

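
As a quick consistency check on the estimates above, the counts and percentages can be recomputed from each other. The total card count is not stated in this summary; the sketch infers it by summing the listed estimates, which is an assumption.

```python
# Estimated post-fix counts, copied from the distribution list in this summary.
estimated = {
    "Hall of Fame": 58,
    "Diamond": 116,
    "Gold": 174,
    "Silver": 232,
    "Bronze": 232,
    "Common": 349,
}

# Total card count implied by the estimates (inferred, not stated in the summary).
total = sum(estimated.values())

# Share of each tier, rounded to whole percent.
shares = {tier: round(100 * count / total) for tier, count in estimated.items()}

# Share of HOF cards before the fix (267 cards).
hof_before_pct = round(100 * 267 / total)

print(total)           # 1161
print(shares)
print(hof_before_pct)  # 23
```
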