# Retrosheet Outfield Arm Rating - Implementation Summary

## What I've Created

I've analyzed your Retrosheet play-by-play data and built a system that calculates outfield arm ratings from historical game events. This gives you an alternative to Baseball Reference's `bis_runs_outfield` statistic that works for any season with Retrosheet play-by-play data.

## Files Created

### 1. Proposal Document
**Location:** `docs/of_arm_rating_improvement_proposal.md`

Comprehensive proposal explaining:
- Current limitations of the `bis_runs_outfield` approach
- Available data in Retrosheet events
- Statistical analysis of 2005 season
- Proposed multi-metric composite formula
- Comparison of advantages/disadvantages
- Integration recommendations

### 2. Implementation Module
**Location:** `defenders/retrosheet_arm_calculator.py`

Production-ready Python module with:
- `calculate_of_arms_from_retrosheet()` - Main entry point for batch calculation
- `calculate_player_arm_rating()` - Calculate rating for individual player
- `calculate_position_baselines()` - Position-adjusted normalization
- Detailed logging and documentation

### 3. Validation Script
**Location:** `test_retrosheet_arms.py`

Testing/validation tool that:
- Calculates ratings for all 2005 outfielders
- Shows distribution of ratings
- Identifies top 20 and bottom 10 arms
- Validates against known strong arms (Ichiro, Guerrero, etc.)
- Generates detailed statistical reports

## Key Findings from 2005 Analysis

### Available Metrics from Retrosheet

From the play-by-play data, we can extract:

1. **Total Assists** - Outfielder threw out a runner
2. **Home Throws** - Threw out a runner at home (strongest arm indicator)
3. **Batter Extra-Base Outs** - Threw out the batter trying to stretch a hit (prevents doubles)
4. **Assist Rate** - Assists per ball fielded (opportunity-adjusted)
5. **Throwout Rate** - Success rate when attempting a throw

### 2005 League Statistics

| Position | Avg Assist Rate | Avg Throwout Rate | Total Assists |
|----------|----------------|-------------------|---------------|
| LF | 3.01% | 86.71% | 294 |
| CF | 2.04% | 81.23% | 247 |
| RF | 2.77% | 79.52% | 288 |

**Key Insight:** Assist rates and success rates vary by position, so we use position-adjusted z-scores.

## How the Formula Works

### Composite Score (Simplified Rate-Dominant Formula)
```
raw_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant factor)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)
```

**Philosophy:** Assist rate is the dominant driver. Assists are already outs by definition,
so no separate "throwout rate" is needed. Quality indicators (home throws, batter extra outs)
add a small amount of context about the types of plays made.

An elite assist rate (8%+) contributes 24+ points, while an average rate (3%) contributes about 9 points. Here `assist_rate` is expressed as a fraction (0.08 × 300 = 24).

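
The composite score above can be sketched as a small Python function. This is a minimal illustration, not the code in `defenders/retrosheet_arm_calculator.py`: the function name `raw_arm_score` and its parameters are hypothetical, and it assumes "balls fielded" means putouts plus assists, as in the sample size definition.

```python
def raw_arm_score(assists: int, balls_fielded: int,
                  home_throws: int, batter_extra_outs: int) -> float:
    """Composite raw score using the weights from the formula above.

    Hypothetical helper for illustration; balls_fielded is assumed to be
    putouts + assists at the position.
    """
    if balls_fielded <= 0:
        raise ValueError("balls_fielded must be positive")

    assist_rate = assists / balls_fielded  # fraction, e.g. 0.04 for 4%
    return (
        assist_rate * 300            # primary driver
        + home_throws * 1.0          # quality: outs at home plate
        + batter_extra_outs * 1.0    # quality: batter thrown out stretching
        + assists * 0.1              # minimal volume bonus
    )


# Made-up inputs: 12 assists on 300 balls fielded (a 4% assist rate),
# 3 home throws, 2 batter extra-base outs.
score = raw_arm_score(assists=12, balls_fielded=300,
                      home_throws=3, batter_extra_outs=2)
print(f"raw score: {score:.1f}")
```

With these inputs the rate term is 12, the two quality terms add 5, and the volume bonus adds 1.2, which shows how strongly the rate term dominates.
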
### Position-Adjusted Rating
- Calculate the league average and standard deviation of raw scores separately for LF, CF, and RF
- Convert each player's raw score to a z-score: `(score - avg) / stddev`
- Map the z-score to the -6 to +5 rating scale, treating scores as roughly normally distributed

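
The normalization steps above might look like the following sketch. The names `position_baselines` and `z_score` are illustrative and are not taken from the module. Using the population standard deviation and returning 0.0 when there is no spread are both assumptions.

```python
from statistics import mean, pstdev


def position_baselines(scores_by_pos: dict) -> dict:
    """Return {position: (average, stddev)} of raw scores per position.

    Assumption: population standard deviation (pstdev); the real module
    may use the sample standard deviation instead.
    """
    return {pos: (mean(scores), pstdev(scores))
            for pos, scores in scores_by_pos.items()}


def z_score(score: float, avg: float, stddev: float) -> float:
    """Standardize a raw score against its position baseline."""
    if stddev == 0:
        return 0.0  # assumption: no spread means everyone is average
    return (score - avg) / stddev


# Made-up raw scores, grouped by position.
baselines = position_baselines({"LF": [9.0, 15.0], "RF": [10.0, 14.0, 18.0]})
lf_avg, lf_std = baselines["LF"]
print(z_score(15.0, lf_avg, lf_std))
```

Keeping one baseline per position means a left fielder is only compared with other left fielders, which is the point of the position adjustment.
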
### Rating Scale (Calibrated Distribution)
| Z-Score | Rating | Description | Approx % |
|---------|--------|-------------|----------|
| > 2.5 | -6 | Elite cannon | ~1% |
| 2.0 to 2.5 | -5 | Outstanding | ~2% |
| 1.5 to 2.0 | -4 | Excellent | ~3% |
| 1.0 to 1.5 | -3 | Very Good | ~5% |
| 0.5 to 1.0 | -2 | Above Average | ~15% |
| 0.0 to 0.5 | -1 | Slightly Above | ~30% |
| -0.5 to 0.0 | 0 | Average | ~40% |
| -0.8 to -0.5 | 1 | Slightly Below | ~20% |
| -1.2 to -0.8 | 2 | Below Average | ~10% |
| -1.5 to -1.2 | 3 | Poor | ~5% |
| -1.8 to -1.5 | 4 | Very Poor | ~2% |
| < -1.8 | 5 | Very Weak | ~1% |

**Note:** Thresholds were adjusted after the 300x `assist_rate` weight compressed the z-score spread.

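
One compact way to apply the table above is a sorted threshold list with `bisect`. This is a sketch only: `rating_from_z` is a hypothetical name, and the table does not say which bin an exact boundary value belongs to, so assigning boundaries to the bin above them is an assumption.

```python
from bisect import bisect_right

# Bin edges from the rating table, in ascending order.
Z_THRESHOLDS = [-1.8, -1.5, -1.2, -0.8, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# One rating per bin, from "below -1.8" up to "above 2.5".
RATINGS = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6]


def rating_from_z(z: float) -> int:
    """Map a position-adjusted z-score to the -6 (best) to +5 (worst) scale.

    Assumption: a z-score exactly on a threshold falls into the bin above it.
    """
    return RATINGS[bisect_right(Z_THRESHOLDS, z)]


for z in (3.0, 1.2, 0.25, -0.25, -2.0):
    print(z, rating_from_z(z))
```

Keeping the thresholds in a single list makes later tuning a one-line change rather than an edit to a chain of `if` statements.
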
## Testing the Implementation

### Step 1: Run the validation script
```bash
python test_retrosheet_arms.py
```

This will:
- Calculate ratings for all 2005 outfielders
- Show distribution (how many players at each rating)
- Identify elite vs weak arms
- Validate against known strong arms

### Step 2: Review the output
Look for:
- **Elite arms** (rating ≤ -3): Should be players known for strong arms
- **Distribution**: Should be roughly bell-shaped and centered near 0
- **Position differences**: CF may have more volume, but RF/LF may have stronger arms

### Step 3: Compare to current method
For players with both `bis_runs_outfield` (from Baseball Reference) and Retrosheet data:
- Do the ratings correlate?
- Where do they differ and why?
- Which better matches your domain knowledge?

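
A rough way to answer the correlation question is to compute a Pearson coefficient over the players present in both sources. The sketch below is illustrative: `compare_methods` and the dictionary inputs are hypothetical, and the player IDs and values are made-up placeholders rather than real 2005 data. It assumes higher `bis_runs_outfield` is better while lower arm ratings are better, so good agreement would show up as a strongly negative coefficient.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def compare_methods(bis_runs: dict, retro_ratings: dict):
    """Correlate the two methods over players that appear in both."""
    shared = sorted(bis_runs.keys() & retro_ratings.keys())
    r = pearson([bis_runs[p] for p in shared],
                [retro_ratings[p] for p in shared])
    return shared, r


# Placeholder values only, keyed by made-up player IDs.
bis = {"player_a": 8, "player_b": 2, "player_c": -3, "player_d": 0}
retro = {"player_a": -4, "player_b": -1, "player_c": 2, "player_e": 0}
shared, r = compare_methods(bis, retro)
print(shared, round(r, 3))
```

Players who sit far from the trend are the ones worth reviewing by hand for the "where do they differ and why" question.
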
## Integration Options

### Option 1: Hybrid (Recommended for Development)
Use Baseball Reference when available, Retrosheet as fallback:

```python
# In defenders/calcs_defense.py, around line 71-84
if 'bis_runs_outfield' in pos_df.columns:
    # Current method - use BIS runs
    of_arms.append(int(pos_data[0].at[df_data["key_bbref"], 'bis_runs_outfield']))
else:
    # Fallback - use Retrosheet calculation
    from defenders.retrosheet_arm_calculator import (
        calculate_of_arms_from_retrosheet,
        get_arm_for_player,
    )

    # Calculate once and cache the ratings on the instance
    if not hasattr(self, 'retrosheet_arms'):
        self.retrosheet_arms = calculate_of_arms_from_retrosheet(df_events, season_pct)

    # Get arm rating from Retrosheet
    arm_rating = get_arm_for_player(self.retrosheet_arms, df_data['key_bbref'])
    return arm_rating  # Skip the arm_outfield() call
```

### Option 2: Full Replacement
Always use Retrosheet for consistency:

```python
# In retrosheet_data.py, after loading events
import pandas as pd

from defenders.retrosheet_arm_calculator import (
    calculate_of_arms_from_retrosheet,
    get_arm_for_player,
)

df_events = pd.read_csv(EVENTS_FILENAME)
retrosheet_arm_ratings = calculate_of_arms_from_retrosheet(df_events, SEASON_PCT)

# Then in create_positions() call:
arm_rating = get_arm_for_player(retrosheet_arm_ratings, df_data['key_bbref'])
```

## Sample Size Requirements

- **Minimum:** 50 balls fielded (putouts + assists) per position
- **Full season:** Most regulars will qualify (200+ balls)
- **Partial season:** Adjust with the `season_pct` parameter
- **Platoon players:** May not qualify; they get the default rating of 0 (average)

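
The qualification rule above could be expressed roughly as follows. This sketch is hypothetical: the helper names are invented, and scaling the 50-ball minimum linearly by `season_pct` is an assumption about how the partial-season adjustment works, not a description of the module.

```python
MIN_BALLS_FIELDED = 50   # full-season minimum per position
DEFAULT_RATING = 0       # average, used when a player does not qualify


def min_balls_required(season_pct: float = 1.0) -> int:
    """Minimum balls fielded for a (possibly partial) season.

    Assumption: the threshold scales linearly with season_pct.
    """
    return max(1, round(MIN_BALLS_FIELDED * season_pct))


def rating_or_default(rating: int, putouts: int, assists: int,
                      season_pct: float = 1.0) -> int:
    """Return the calculated rating only if the sample size qualifies."""
    balls_fielded = putouts + assists
    if balls_fielded < min_balls_required(season_pct):
        return DEFAULT_RATING
    return rating


# 42 balls fielded: too few for a full season, enough for half a season.
print(rating_or_default(-3, putouts=40, assists=2, season_pct=1.0))
print(rating_or_default(-3, putouts=40, assists=2, season_pct=0.5))
```
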
## Why This Approach is Better

### Advantages
1. **Historical Coverage** - Works for any season with Retrosheet data (1921+)
2. **Multi-Dimensional** - Considers both the quality and quantity of throws
3. **Position-Adjusted** - Accounts for different expectations by position
4. **Transparent** - Formula is clear and can be tuned
5. **Context-Aware** - Weights high-value plays (home throws) more heavily

### Disadvantages
1. **Processing Overhead** - Must parse large play-by-play files
2. **Sample Size** - Platoon players may not qualify
3. **Indirect** - Measures outcomes, not raw arm strength
4. **One-Time Work per Season** - Baselines must be calculated once for each season

## Next Steps

1. **Run validation script** to see 2005 results
2. **Review elite arms** - Do they match your expectations?
3. **Choose integration approach** (hybrid vs full replacement)
4. **Test on a small cardset** before full deployment
5. **Tune weights if needed** based on validation results

## Questions to Consider

1. Do the elite arms (rating -4 to -6) match players you know had strong arms?
2. Are there players with unexpectedly high/low ratings? Why?
3. How does this compare to the `bis_runs_outfield` method for 2005?
4. Should home throws be weighted even more heavily?
5. Should we adjust thresholds to get more granular ratings?

## Support

The implementation includes extensive logging. Set the logging level to DEBUG to see:
- Individual player calculations
- Raw scores and z-scores
- Position baselines
- Sample size warnings

```python
import logging
logging.getLogger('exceptions').setLevel(logging.DEBUG)
```

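
If no DEBUG output appears after raising the level, the likely cause is that no handler is attached: Python's last-resort handler only emits WARNING and above. The sketch below assumes the project has not configured handlers elsewhere. The function name `enable_arm_debug_logging` is hypothetical, and the logger name `'exceptions'` is taken from the snippet above.

```python
import logging


def enable_arm_debug_logging(logger_name: str = "exceptions") -> logging.Logger:
    """Make DEBUG records from the named logger visible.

    basicConfig() attaches a stream handler to the root logger only if the
    root logger has no handlers yet, so calling this twice is harmless.
    """
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger


log = enable_arm_debug_logging()
log.debug("arm rating debug logging enabled")
```
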
---

**Created:** 2025-11-15
**Status:** Ready for Testing
**Recommendation:** Run `test_retrosheet_arms.py` to validate before integration