
# Retrosheet Outfield Arm Rating - Implementation Summary

## What I've Created

I've analyzed your Retrosheet play-by-play data and created a comprehensive system to calculate outfield arm ratings from historical game events. This gives you an alternative to Baseball Reference's `bis_runs_outfield` statistic that works for all historical seasons.

## Files Created

### 1. Proposal Document
**Location:** `docs/of_arm_rating_improvement_proposal.md`

Comprehensive proposal explaining:
- Current limitations of the `bis_runs_outfield` approach
- Available data in Retrosheet events
- Statistical analysis of 2005 season
- Proposed multi-metric composite formula
- Comparison of advantages/disadvantages
- Integration recommendations

### 2. Implementation Module
**Location:** `defenders/retrosheet_arm_calculator.py`

Production-ready Python module with:
- `calculate_of_arms_from_retrosheet()` - Main entry point for batch calculation
- `calculate_player_arm_rating()` - Calculate rating for individual player
- `calculate_position_baselines()` - Position-adjusted normalization
- Detailed logging and documentation

### 3. Validation Script
**Location:** `test_retrosheet_arms.py`

Testing/validation tool that:
- Calculates ratings for all 2005 outfielders
- Shows distribution of ratings
- Identifies top 20 and bottom 10 arms
- Validates against known strong arms (Ichiro, Guerrero, etc.)
- Generates detailed statistical reports

## Key Findings from 2005 Analysis

### Available Metrics from Retrosheet

From the play-by-play data, we can extract:

1. **Total Assists** - OF threw out a runner
2. **Home Throws** - Threw out runner at home (strongest arm indicator)
3. **Batter Extra-Base Outs** - Threw out batter trying to stretch (prevents doubles)
4. **Assist Rate** - Assists per ball fielded (opportunity-adjusted)
5. **Throwout Rate** - Success when attempting throw

### 2005 League Statistics

| Position | Avg Assist Rate | Avg Throwout Rate | Total Assists |
|----------|----------------|-------------------|---------------|
| LF       | 3.01%          | 86.71%            | 294           |
| CF       | 2.04%          | 81.23%            | 247           |
| RF       | 2.77%          | 79.52%            | 288           |

**Key Insight:** Assist rates and success rates vary by position, so we use position-adjusted z-scores.

## How the Formula Works

### Composite Score (Simplified Rate-Dominant Formula)
```
raw_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant factor)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)
```

**Philosophy:** Assist rate is the dominant driver. Assists are already outs by definition,
so no separate "throwout rate" is needed. Quality indicators (home throws, batter extra outs)
provide minimal context about the types of plays made.

An elite assist rate (8%+) contributes 24+ points from the rate term (0.08 × 300), while an average rate (3%) contributes about 9 points (0.03 × 300).
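
As a worked check of the weights above, here is a minimal sketch of the composite score as a standalone function. The function name `raw_arm_score` and both stat lines are hypothetical illustrations, not output from the 2005 data or the actual code in `retrosheet_arm_calculator.py`.

```python
def raw_arm_score(assist_rate: float, home_throws: int,
                  batter_extra_outs: int, total_assists: int) -> float:
    """Composite raw score using the weights from the formula above."""
    return (
        assist_rate * 300          # dominant term: assists per ball fielded
        + home_throws * 1.0        # runners thrown out at home
        + batter_extra_outs * 1.0  # batters thrown out stretching a hit
        + total_assists * 0.1      # small volume bonus
    )


# Hypothetical stat lines (made up for illustration)
elite = raw_arm_score(assist_rate=0.08, home_throws=4,
                      batter_extra_outs=3, total_assists=16)
average = raw_arm_score(assist_rate=0.03, home_throws=1,
                        batter_extra_outs=1, total_assists=6)

print(f"elite: {elite:.1f}, average: {average:.1f}")
```

In both examples the rate term supplies most of the score (24 of 32.6 points and 9 of 11.6 points), which is what "rate-dominant" means here.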

### Position-Adjusted Rating
- Calculate league average and standard deviation for LF/CF/RF
- Convert player's raw score to z-score: `(score - avg) / stddev`
- Map the z-score to the -6 to +5 rating scale, where more negative means a stronger arm (assumes roughly normally distributed scores)
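
The z-score step can be sketched as follows for a single position. This is an illustration only: the helper name `position_z_scores`, the player IDs, and the scores are hypothetical, and the use of the population standard deviation is an assumption rather than a statement about what `calculate_position_baselines()` does.

```python
from statistics import mean, pstdev


def position_z_scores(raw_scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores for players at ONE position into z-scores."""
    scores = list(raw_scores.values())
    avg = mean(scores)
    stddev = pstdev(scores)  # assumption: population standard deviation
    if stddev == 0:
        # Every player has the same score, so everyone is exactly average
        return {player: 0.0 for player in raw_scores}
    return {player: (score - avg) / stddev
            for player, score in raw_scores.items()}


# Hypothetical right fielders (made-up IDs and raw scores)
rf_scores = {"rf_a": 32.6, "rf_b": 11.6, "rf_c": 9.0, "rf_d": 14.0}
rf_z = position_z_scores(rf_scores)

for player, z in sorted(rf_z.items(), key=lambda item: item[1], reverse=True):
    print(f"{player}: z = {z:+.2f}")
```

Running the same function separately on LF, CF, and RF scores is what makes the result position-adjusted: each player is compared only with others at the same position.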

### Rating Scale (Calibrated Distribution)
| Z-Score      | Rating | Description    | Approx % |
|--------------|--------|----------------|----------|
| > 2.5        | -6     | Elite cannon   | ~1%      |
| 2.0 to 2.5   | -5     | Outstanding    | ~2%      |
| 1.5 to 2.0   | -4     | Excellent      | ~3%      |
| 1.0 to 1.5   | -3     | Very Good      | ~5%      |
| 0.5 to 1.0   | -2     | Above Average  | ~15%     |
| 0.0 to 0.5   | -1     | Slightly Above | ~30%     |
| -0.5 to 0.0  | 0      | Average        | ~40%     |
| -0.8 to -0.5 | 1      | Slightly Below | ~20%     |
| -1.2 to -0.8 | 2      | Below Average  | ~10%     |
| -1.5 to -1.2 | 3      | Poor           | ~5%      |
| -1.8 to -1.5 | 4      | Very Poor      | ~2%      |
| < -1.8       | 5      | Very Weak      | ~1%      |

**Note:** Thresholds were adjusted after the 300x weight on `assist_rate` compressed the z-score spread.
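
The threshold table can be expressed as a small lookup. This is a sketch under one assumption the table does not settle: a z-score that lands exactly on a boundary is given the weaker (higher-numbered) rating. The names `Z_BANDS` and `z_to_arm_rating` are hypothetical.

```python
# (lower bound of the z-score band, rating), strongest arm first
Z_BANDS = [
    (2.5, -6), (2.0, -5), (1.5, -4), (1.0, -3), (0.5, -2), (0.0, -1),
    (-0.5, 0), (-0.8, 1), (-1.2, 2), (-1.5, 3), (-1.8, 4),
]


def z_to_arm_rating(z: float) -> int:
    """Map a position-adjusted z-score to the -6 (best) to +5 (worst) scale."""
    for lower_bound, rating in Z_BANDS:
        # Assumption: strict comparison, so boundary values drop to the next band
        if z > lower_bound:
            return rating
    return 5  # at or below -1.8


for z in (3.0, 1.2, 0.25, -0.25, -1.0, -2.0):
    print(f"z = {z:+.2f} -> rating {z_to_arm_rating(z)}")
```

Keeping the thresholds in one list makes them easy to retune if the weights in the composite score change again.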

## Testing the Implementation

### Step 1: Run the validation script
```bash
python test_retrosheet_arms.py
```

This will:
- Calculate ratings for all 2005 outfielders
- Show distribution (how many players at each rating)
- Identify elite vs weak arms
- Validate against known strong arms

### Step 2: Review the output
Look for:
- **Strong arms** (rating ≤ -3): Should be players known for strong arms
- **Distribution**: Should be a bell curve centered around 0
- **Position differences**: CF may have more volume but RF/LF may have stronger arms

### Step 3: Compare to current method
For players with both `bis_runs_outfield` (from Baseball Reference) and Retrosheet data:
- Do the ratings correlate?
- Where do they differ and why?
- Which seems more accurate to your domain knowledge?
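
One way to approach these questions is to compute a correlation over the players who have both ratings and then list the largest disagreements for manual review. The sketch below assumes both methods have already been converted to the same -6 to +5 scale and stored in dicts keyed by player ID; `compare_ratings` and all of the data are hypothetical.

```python
from math import sqrt


def compare_ratings(current: dict[str, int], retrosheet: dict[str, int]):
    """Return (Pearson r, shared players sorted by largest rating gap)."""
    shared = sorted(set(current) & set(retrosheet))
    if len(shared) < 2:
        raise ValueError("need at least two players rated by both methods")
    xs = [current[p] for p in shared]
    ys = [retrosheet[p] for p in shared]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one method is constant")
    r = cov / sqrt(var_x * var_y)
    gaps = sorted(shared, key=lambda p: abs(current[p] - retrosheet[p]),
                  reverse=True)
    return r, gaps


# Hypothetical ratings; player_e has no current rating and is ignored
current_ratings = {"player_a": -4, "player_b": 0, "player_c": 2, "player_d": -1}
retro_ratings = {"player_a": -5, "player_b": -1, "player_c": 3,
                 "player_d": 2, "player_e": 0}

r, gaps = compare_ratings(current_ratings, retro_ratings)
print(f"Pearson r = {r:.2f}; biggest disagreement: {gaps[0]}")
```

A high correlation with a handful of large gaps suggests the two methods broadly agree, and the players at the front of `gaps` are the ones worth checking against domain knowledge.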

## Integration Options

### Option 1: Hybrid (Recommended for Development)
Use Baseball Reference when available, Retrosheet as fallback:

```python
# In defenders/calcs_defense.py, around line 71-84
if 'bis_runs_outfield' in pos_df.columns:
    # Current method: use BIS runs from Baseball Reference
    of_arms.append(int(pos_data[0].at[df_data['key_bbref'], 'bis_runs_outfield']))
else:
    # Fallback: use the Retrosheet calculation (imported only when needed)
    from defenders.retrosheet_arm_calculator import (
        calculate_of_arms_from_retrosheet,
        get_arm_for_player,
    )

    # Calculate ratings for all outfielders once and cache them on the instance
    if not hasattr(self, 'retrosheet_arms'):
        self.retrosheet_arms = calculate_of_arms_from_retrosheet(df_events, season_pct)

    # Look up this player's rating and return it directly,
    # skipping the arm_outfield() call
    arm_rating = get_arm_for_player(self.retrosheet_arms, df_data['key_bbref'])
    return arm_rating

### Option 2: Full Replacement
Always use Retrosheet for consistency:

```python
# In retrosheet_data.py, after loading events
import pandas as pd

from defenders.retrosheet_arm_calculator import (
    calculate_of_arms_from_retrosheet,
    get_arm_for_player,
)

# Calculate ratings for all outfielders once per run
df_events = pd.read_csv(EVENTS_FILENAME)
retrosheet_arm_ratings = calculate_of_arms_from_retrosheet(df_events, SEASON_PCT)

# Then in create_positions() call:
arm_rating = get_arm_for_player(retrosheet_arm_ratings, df_data['key_bbref'])

## Sample Size Requirements

- **Minimum:** 50 balls fielded (putouts + assists) per position
- **Full season:** Most regulars will qualify (200+ balls)
- **Partial season:** Adjust with `season_pct` parameter
- **Platoon players:** May not qualify; get default rating of 0 (average)
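
The qualification rule can be sketched as below. How `season_pct` adjusts the minimum is not spelled out above, so the linear scaling here is an assumption, and the names `qualifies` and `rating_or_default` are hypothetical rather than functions from the module.

```python
MIN_BALLS_FIELDED = 50  # full-season minimum per position (from the list above)
DEFAULT_RATING = 0      # average arm, used for non-qualifiers


def qualifies(putouts: int, assists: int, season_pct: float = 1.0) -> bool:
    """True if the player fielded enough balls at this position."""
    # Assumption: the minimum scales linearly with the fraction of the season
    return (putouts + assists) >= MIN_BALLS_FIELDED * season_pct


def rating_or_default(putouts: int, assists: int, computed_rating: int,
                      season_pct: float = 1.0) -> int:
    """Use the computed rating only when the sample is large enough."""
    if qualifies(putouts, assists, season_pct):
        return computed_rating
    return DEFAULT_RATING


print(rating_or_default(210, 9, computed_rating=-3))  # regular: keeps -3
print(rating_or_default(20, 1, computed_rating=-3))   # small sample: falls back to 0
print(qualifies(45, 3, season_pct=0.5))               # 48 balls vs. a minimum of 25
```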

## Why This Approach is Better

### Advantages
1. **Historical Coverage** - Works for any season with Retrosheet data (1921+)
2. **Multi-Dimensional** - Considers quality and quantity of throws
3. **Position-Adjusted** - Accounts for different expectations by position
4. **Transparent** - Formula is clear and can be tuned
5. **Context-Aware** - Weights high-value plays (home throws) more heavily

### Disadvantages
1. **Processing Overhead** - Must parse large play-by-play files
2. **Sample Size** - Platoon players may not qualify
3. **Indirect** - Measures outcomes, not raw arm strength
4. **Per-Season Setup** - Baselines must be calculated once for each season

## Next Steps

1. **Run validation script** to see 2005 results
2. **Review elite arms** - Do they match your expectations?
3. **Choose integration approach** (hybrid vs full replacement)
4. **Test on a small cardset** before full deployment
5. **Tune weights if needed** based on validation results

## Questions to Consider

1. Do the elite arms (rating -4 to -6) match players you know had strong arms?
2. Are there players with unexpectedly high/low ratings? Why?
3. How does this compare to the `bis_runs_outfield` method for 2005?
4. Should home throws be weighted even more heavily?
5. Should we adjust thresholds to get more granular ratings?

## Support

The implementation includes extensive logging. Set logging level to DEBUG to see:
- Individual player calculations
- Raw scores and z-scores
- Position baselines
- Sample size warnings

```python
import logging
logging.getLogger('exceptions').setLevel(logging.DEBUG)
```

---

**Created:** 2025-11-15
**Status:** Ready for Testing
**Recommendation:** Run `test_retrosheet_arms.py` to validate before integration