2025-11-23 01:28:33 -06:00


Retrosheet Outfield Arm Rating - Implementation Summary

What I've Created

I've analyzed your Retrosheet play-by-play data and built a system that calculates outfield arm ratings from historical game events. It offers an alternative to Baseball Reference's bis_runs_outfield statistic. That alternative works for any season covered by Retrosheet event data, including seasons where bis_runs_outfield is unavailable.

Files Created

1. Proposal Document

Location: docs/of_arm_rating_improvement_proposal.md

Comprehensive proposal explaining:

  • Current limitations of the bis_runs_outfield approach
  • Available data in Retrosheet events
  • Statistical analysis of the 2005 season
  • Proposed multi-metric composite formula
  • Comparison of advantages/disadvantages
  • Integration recommendations

2. Implementation Module

Location: defenders/retrosheet_arm_calculator.py

Python module with:

  • calculate_of_arms_from_retrosheet() - Main entry point for batch calculation
  • calculate_player_arm_rating() - Calculate rating for individual player
  • calculate_position_baselines() - Position-adjusted normalization
  • Detailed logging and documentation

3. Validation Script

Location: test_retrosheet_arms.py

Testing/validation tool that:

  • Calculates ratings for all 2005 outfielders
  • Shows distribution of ratings
  • Identifies top 20 and bottom 10 arms
  • Validates against known strong arms (Ichiro, Guerrero, etc.)
  • Generates detailed statistical reports

Key Findings from 2005 Analysis

Available Metrics from Retrosheet

From the play-by-play data, we can extract:

  1. Total Assists - Runners the outfielder threw out
  2. Home Throws - Runners thrown out at home (the strongest arm indicator)
  3. Batter Extra-Base Outs - Batters thrown out trying to stretch a hit (for example, a single into a double)
  4. Assist Rate - Assists per ball fielded (opportunity-adjusted)
  5. Throwout Rate - Share of throw attempts that produced an out
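The three counting metrics can be illustrated with a simplified parser. This is a minimal sketch and not the module's actual extraction logic. It assumes raw Retrosheet event strings (for example `S8/L.2XH(82)`) and looks only at out-advances in the runner section after the `.`. It credits each throw to a position (7/8/9) rather than to a player, and it skips any out-advance whose fielder notation contains an error (`E`), because the runner was safe. It ignores assists recorded in the main play itself. The rates (items 4 and 5) need denominators such as balls fielded, which are not computed here. The names `outfield_throw_outs` and `tally_outfield_assists` are hypothetical.

```python
import re
from collections import Counter

# Runner-advance out, e.g. "2XH(82)": runner from 2nd out at home, CF (8) to C (2).
OUT_ADVANCE = re.compile(r"([B123])X([123H])\(([^)]*)\)")
OUTFIELD = {"7": "LF", "8": "CF", "9": "RF"}


def outfield_throw_outs(event: str) -> list[tuple[str, str, str]]:
    """Return (position, runner, destination) for each out started by an outfield throw."""
    if "." not in event:
        return []  # no runner-advance section
    advances = event.split(".", 1)[1]
    found = []
    for runner, dest, fielders in OUT_ADVANCE.findall(advances):
        if "E" in fielders:
            continue  # an error on the play means the runner was safe
        digits = re.match(r"\d+", fielders)
        if digits is None or len(digits.group()) < 2:
            continue  # a lone fielder is an unassisted putout, not an assist
        thrower = digits.group()[0]
        if thrower in OUTFIELD:
            found.append((OUTFIELD[thrower], runner, dest))
    return found


def tally_outfield_assists(events: list[str]) -> dict[str, Counter]:
    """Count total assists, home throws and batter extra-base outs by position."""
    counts = {pos: Counter() for pos in OUTFIELD.values()}
    for event in events:
        for pos, runner, dest in outfield_throw_outs(event):
            counts[pos]["total_assists"] += 1
            if dest == "H":
                counts[pos]["home_throws"] += 1
            if runner == "B":
                counts[pos]["batter_extra_outs"] += 1
    return counts


# Illustrative events only, not taken from the 2005 files.
sample_events = ["S8/L.2XH(82)", "S9/G.BX2(94)", "S9/G.2XH(9E2)", "D7/L.1-H;BX3(75)", "63"]
print(tally_outfield_assists(sample_events))
```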

2005 League Statistics

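| Position | Avg Assist Rate | Avg Throwout Rate | Total Assists |
|----------|-----------------|-------------------|---------------|
| LF       | 3.01%           | 86.71%            | 294           |
| CF       | 2.04%           | 81.23%            | 247           |
| RF       | 2.77%           | 79.52%            | 288           |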

Key Insight: Assist rates and throwout rates vary by position. CF averaged a 2.04% assist rate, compared with 3.01% in LF and 2.77% in RF, so ratings are based on position-adjusted z-scores.

How the Formula Works

Composite Score (Simplified Rate-Dominant Formula)

```python
raw_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant factor)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)
```

Philosophy: Assist rate is the dominant driver. Every assist is already an out, so the score does not need a separate throwout rate. The quality indicators (home throws, batter extra-base outs) add a small amount of context about the kinds of plays made.

Because assist_rate is stored as a fraction, elite rates (8%+) contribute 24+ points (0.08 × 300 = 24). An average rate (3%) contributes about 9 points (0.03 × 300 = 9).
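Here is a worked version of the formula, as a minimal sketch. raw_arm_score is a hypothetical helper that mirrors the weights above. It assumes assist_rate is a fraction (0.03 for 3%), which is what makes the 8% rate come out to 24 points. The two stat lines are illustrative, not real 2005 data.

```python
def raw_arm_score(assist_rate: float, home_throws: int,
                  batter_extra_outs: int, total_assists: int) -> float:
    """Composite arm score using the rate-dominant weights.

    assist_rate is a fraction, so 0.03 (3%) contributes about 9 points.
    """
    return (
        assist_rate * 300          # dominant: opportunity-adjusted assist rate
        + home_throws * 1.0        # quality: runners cut down at home
        + batter_extra_outs * 1.0  # quality: batters thrown out stretching
        + total_assists * 0.1      # small volume bonus
    )


# Elite arm: 8% assist rate, 3 home throws, 2 batter outs, 16 assists
elite = raw_arm_score(0.08, 3, 2, 16)    # 24 + 3 + 2 + 1.6
# Average arm: 3% assist rate, 1 home throw, 1 batter out, 8 assists
average = raw_arm_score(0.03, 1, 1, 8)   # 9 + 1 + 1 + 0.8
print(round(elite, 1), round(average, 1))
```

The volume term stays small on purpose. Sixteen assists add only 1.6 points, so a high-volume fielder with an ordinary rate cannot outscore a genuinely efficient thrower.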

Position-Adjusted Rating

  • Calculate league average and standard deviation for LF/CF/RF
  • Convert player's raw score to z-score: (score - avg) / stddev
  • Map the z-score to the -6 to +5 rating scale, treating scores as roughly normally distributed; higher z-scores (better arms) map to more negative ratings
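The baseline and z-score steps can be sketched with the standard library. The summary does not state whether the module uses the sample or the population standard deviation. This sketch assumes the sample version (statistics.stdev). position_baselines and z_score are hypothetical names. A position with zero spread returns a z-score of 0 rather than dividing by zero.

```python
from statistics import mean, stdev


def position_baselines(scores_by_pos: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """League (mean, sample standard deviation) of raw scores for each position.

    stdev() needs at least two scores per position.
    """
    return {pos: (mean(scores), stdev(scores)) for pos, scores in scores_by_pos.items()}


def z_score(score: float, pos: str, baselines: dict[str, tuple[float, float]]) -> float:
    """Position-adjusted z-score: (score - avg) / stddev."""
    avg, sd = baselines[pos]
    return 0.0 if sd == 0 else (score - avg) / sd


# Illustrative raw scores only.
baselines = position_baselines({"LF": [8.0, 10.0, 12.0], "CF": [5.0, 5.0]})
print(baselines["LF"], z_score(14.0, "LF", baselines))
```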

Rating Scale (Calibrated Distribution)

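| Z-Score      | Rating | Description    | Approx. % |
|--------------|--------|----------------|-----------|
| > 2.5        | -6     | Elite cannon   | ~1%       |
| 2.0 to 2.5   | -5     | Outstanding    | ~2%       |
| 1.5 to 2.0   | -4     | Excellent      | ~3%       |
| 1.0 to 1.5   | -3     | Very Good      | ~5%       |
| 0.5 to 1.0   | -2     | Above Average  | ~15%      |
| 0.0 to 0.5   | -1     | Slightly Above | ~30%      |
| -0.5 to 0.0  | 0      | Average        | ~40%      |
| -0.8 to -0.5 | 1      | Slightly Below | ~20%      |
| -1.2 to -0.8 | 2      | Below Average  | ~10%      |
| -1.5 to -1.2 | 3      | Poor           | ~5%       |
| -1.8 to -1.5 | 4      | Very Poor      | ~2%       |
| < -1.8       | 5      | Very Weak      | ~1%       |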

Note: These thresholds were adjusted because the 300x assist_rate weight compressed the spread of z-scores.
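The calibrated table can be expressed as a threshold lookup. This is a minimal sketch; z_to_rating and RATING_BANDS are hypothetical names. The table does not say which band owns a boundary value. This version assumes a z-score that lands exactly on a threshold belongs to the weaker (higher-numbered) band, so an exactly average player (z = 0.0) rates 0.

```python
# (lower bound, rating) pairs, checked from the strongest band down.
# ASSUMPTION: a z-score exactly on a boundary falls into the weaker band.
RATING_BANDS = [
    (2.5, -6),   # Elite cannon
    (2.0, -5),   # Outstanding
    (1.5, -4),   # Excellent
    (1.0, -3),   # Very Good
    (0.5, -2),   # Above Average
    (0.0, -1),   # Slightly Above
    (-0.5, 0),   # Average
    (-0.8, 1),   # Slightly Below
    (-1.2, 2),   # Below Average
    (-1.5, 3),   # Poor
    (-1.8, 4),   # Very Poor
]


def z_to_rating(z: float) -> int:
    """Map a position-adjusted z-score to the -6 (best) to +5 (worst) arm scale."""
    for lower, rating in RATING_BANDS:
        if z > lower:
            return rating
    return 5  # Very Weak: z at or below -1.8


print([z_to_rating(z) for z in (3.0, 0.7, 0.0, -1.0, -2.0)])
```

Keeping the bands in a list means that re-tuning the thresholds (as the note above describes) is a data change rather than a logic change.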

Testing the Implementation

Step 1: Run the validation script

```shell
python test_retrosheet_arms.py
```

This will:

  • Calculate ratings for all 2005 outfielders
  • Show distribution (how many players at each rating)
  • Identify elite vs weak arms
  • Validate against known strong arms

Step 2: Review the output

Look for:

  • Elite arms (rating ≤ -3): Should be players known for strong arms
  • Distribution: Should be roughly bell-shaped and centered near 0
  • Position differences: CF may handle more chances, while LF and RF may post stronger arm ratings
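For the distribution check in Step 2, you can tally the ratings and print them as a text histogram. This sketch assumes the ratings are available as a plain dict of key_bbref to integer rating. The actual return type of calculate_of_arms_from_retrosheet() is not documented here, and rating_distribution and print_histogram are hypothetical helpers.

```python
from collections import Counter


def rating_distribution(ratings: dict[str, int]) -> list[tuple[int, int]]:
    """Return (rating, player count) pairs from -6 (best) to +5 (worst)."""
    counts = Counter(ratings.values())
    return [(rating, counts[rating]) for rating in range(-6, 6)]


def print_histogram(ratings: dict[str, int]) -> None:
    """Print one row per rating with a bar of '#' characters."""
    for rating, n in rating_distribution(ratings):
        print(f"{rating:+d} {'#' * n} ({n})")


# Illustrative input only, not real 2005 results.
sample = {"playerA": -3, "playerB": 0, "playerC": 0, "playerD": -1, "playerE": 2}
print_histogram(sample)
```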

Step 3: Compare to current method

For players with both bis_runs_outfield (from Baseball Reference) and Retrosheet data:

  • Do the ratings correlate?
  • Where do they differ and why?
  • Which seems more accurate to your domain knowledge?

Integration Options

Option 1: Hybrid

Use Baseball Reference when available, with Retrosheet as the fallback:

```python
# In defenders/calcs_defense.py, around lines 71-84
if 'bis_runs_outfield' in pos_df.columns:
    # Current method: use BIS runs (converted to a rating later by arm_outfield())
    of_arms.append(int(pos_data[0].at[df_data['key_bbref'], 'bis_runs_outfield']))
else:
    # Fallback: use the Retrosheet calculation.
    # Imported lazily so the module only loads when BIS data is missing.
    from defenders.retrosheet_arm_calculator import (
        calculate_of_arms_from_retrosheet,
        get_arm_for_player,
    )

    # Compute ratings once and cache them on the instance
    # (df_events and season_pct must be in scope here)
    if not hasattr(self, 'retrosheet_arms'):
        self.retrosheet_arms = calculate_of_arms_from_retrosheet(df_events, season_pct)

    # Retrosheet already yields a final rating, so return it directly
    # and skip the arm_outfield() call
    arm_rating = get_arm_for_player(self.retrosheet_arms, df_data['key_bbref'])
    return arm_rating
```

Option 2: Full Replacement

Always use Retrosheet for consistency:

```python
# In retrosheet_data.py, after loading events
import pandas as pd

from defenders.retrosheet_arm_calculator import (
    calculate_of_arms_from_retrosheet,
    get_arm_for_player,
)

df_events = pd.read_csv(EVENTS_FILENAME)
retrosheet_arm_ratings = calculate_of_arms_from_retrosheet(df_events, SEASON_PCT)

# Then, inside create_positions(), look up each player's rating:
arm_rating = get_arm_for_player(retrosheet_arm_ratings, df_data['key_bbref'])
```

Sample Size Requirements

  • Minimum: 50 balls fielded (putouts + assists) per position
  • Full season: Most regulars will qualify (200+ balls)
  • Partial season: Adjust with season_pct parameter
  • Platoon players: May not qualify; get default rating of 0 (average)
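The qualification rule can be sketched as follows. The 50-ball minimum and the default rating of 0 come from the list above. Scaling the minimum linearly by season_pct is an assumption about how that parameter is used. qualifies and arm_or_default are hypothetical helpers, not functions known to exist in retrosheet_arm_calculator.py.

```python
MIN_BALLS_FIELDED = 50  # putouts + assists at a position, full season
DEFAULT_RATING = 0      # "average" arm for players who do not qualify


def qualifies(putouts: int, assists: int, season_pct: float = 1.0) -> bool:
    # ASSUMPTION: the minimum shrinks in proportion to the share of the season covered
    return putouts + assists >= MIN_BALLS_FIELDED * season_pct


def arm_or_default(ratings: dict[str, int], key_bbref: str,
                   putouts: int, assists: int, season_pct: float = 1.0) -> int:
    """Return the calculated rating, or the default for small or missing samples."""
    if not qualifies(putouts, assists, season_pct):
        return DEFAULT_RATING
    return ratings.get(key_bbref, DEFAULT_RATING)


print(arm_or_default({"x": -4}, "x", putouts=200, assists=10))  # qualifies
print(arm_or_default({"x": -4}, "x", putouts=30, assists=2))    # too few chances
```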

Why This Approach is Better

Advantages

  1. Historical Coverage - Works for any season with Retrosheet data (1921+)
  2. Multi-Dimensional - Considers quality and quantity of throws
  3. Position-Adjusted - Accounts for different expectations by position
  4. Transparent - Formula is clear and can be tuned
  5. Context-Aware - Weights high-value plays (home throws) more heavily

Disadvantages

  1. Processing Overhead - Must parse large play-by-play files
  2. Sample Size - Platoon players may not qualify
  3. Indirect - Measures outcomes, not raw arm strength
  4. Per-Season Setup - Baselines must be calculated for each season

Next Steps

  1. Run validation script to see 2005 results
  2. Review elite arms - Do they match your expectations?
  3. Choose integration approach (hybrid vs full replacement)
  4. Test on a small cardset before full deployment
  5. Tune weights if needed based on validation results

Questions to Consider

  1. Do the elite arms (rating -4 to -6) match players you know had strong arms?
  2. Are there players with unexpectedly high/low ratings? Why?
  3. How does this compare to the bis_runs_outfield method for 2005?
  4. Should home throws be weighted even more heavily?
  5. Should we adjust thresholds to get more granular ratings?

Support

The implementation includes extensive logging. Set the exceptions logger to DEBUG to see:

  • Individual player calculations
  • Raw scores and z-scores
  • Position baselines
  • Sample size warnings
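
```python
import logging

# basicConfig() attaches a console handler only if none exists yet; with no
# handler at all, Python's last-resort handler shows WARNING and above only.
logging.basicConfig()
logging.getLogger('exceptions').setLevel(logging.DEBUG)
```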

Created: 2025-11-15
Status: Ready for Testing
Recommendation: Run test_retrosheet_arms.py to validate before integration