Cal Corum 4be418d6f0 Card Creation Automation

2025-11-23 01:28:33 -06:00

Retrosheet Outfield Arm Rating - Implementation Summary

What I've Created

I've analyzed your Retrosheet play-by-play data and created a system that calculates outfield arm ratings from historical game events. This gives you an alternative to Baseball Reference's bis_runs_outfield statistic that works for any season with Retrosheet play-by-play data.

Files Created

1. Proposal Document

Location: docs/of_arm_rating_improvement_proposal.md

Comprehensive proposal explaining:

Current limitations of the bis_runs_outfield approach
Available data in Retrosheet events
Statistical analysis of 2005 season
Proposed multi-metric composite formula
Comparison of advantages/disadvantages
Integration recommendations

2. Implementation Module

Location: defenders/retrosheet_arm_calculator.py

Python module, ready for testing, with:

calculate_of_arms_from_retrosheet() - Main entry point for batch calculation
calculate_player_arm_rating() - Calculate rating for individual player
calculate_position_baselines() - Position-adjusted normalization
Detailed logging and documentation

3. Validation Script

Location: test_retrosheet_arms.py

Testing/validation tool that:

Calculates ratings for all 2005 outfielders
Shows distribution of ratings
Identifies top 20 and bottom 10 arms
Validates against known strong arms (Ichiro, Guerrero, etc.)
Generates detailed statistical reports

Key Findings from 2005 Analysis

Available Metrics from Retrosheet

From the play-by-play data, we can extract:

Total Assists - OF threw out a runner
Home Throws - Threw out runner at home (strongest arm indicator)
Batter Extra-Base Outs - Threw out batter trying to stretch (prevents doubles)
Assist Rate - Assists per balls fielded (opportunity-adjusted)
Throwout Rate - Success when attempting throw
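
The two rate metrics can be sketched as follows. This is a minimal illustration rather than the module's actual code: the function names and the throw_attempts input are hypothetical, and "balls fielded" is assumed to mean putouts + assists, matching the sample-size definition later in this document.

```python
def assist_rate(assists: int, putouts: int) -> float:
    """Assists per ball fielded, where balls fielded = putouts + assists."""
    balls_fielded = putouts + assists
    return assists / balls_fielded if balls_fielded else 0.0


def throwout_rate(assists: int, throw_attempts: int) -> float:
    """Share of throw attempts that retired a runner.

    throw_attempts is a hypothetical count; how attempts are identified
    in the event data is not specified here.
    """
    return assists / throw_attempts if throw_attempts else 0.0


# Made-up counts, for illustration only
print(f"assist rate:   {assist_rate(assists=9, putouts=291):.2%}")
print(f"throwout rate: {throwout_rate(assists=9, throw_attempts=11):.2%}")
```

Both helpers return 0.0 when the denominator is zero so that players with no opportunities do not raise a division error.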

2005 League Statistics

Position	Avg Assist Rate	Avg Throwout Rate	Total Assists
LF	3.01%	86.71%	294
CF	2.04%	81.23%	247
RF	2.77%	79.52%	288

Key Insight: Assist rates and success rates vary by position, so we use position-adjusted z-scores.

How the Formula Works

Composite Score (Simplified Rate-Dominant Formula)

raw_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant factor)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)

Philosophy: Assist rate is the dominant driver. Assists are already outs by definition, so no separate "throwout rate" is needed. Quality indicators (home throws, batter extra outs) provide minimal context about the types of plays made.

An elite assist rate (8%+) contributes 24+ points (0.08 × 300), whereas an average rate (3%) contributes about 9 points (0.03 × 300).
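
To make the weighting concrete, here is a small worked example that wraps the formula above in a function. The function name and the two sample stat lines are made up for illustration; only the weights come from the formula.

```python
def raw_arm_score(assist_rate, home_throws, batter_extra_outs, total_assists):
    """Composite raw score using the weights from the formula above."""
    return (
        (assist_rate * 300)          # dominant term
        + (home_throws * 1.0)        # quality: throws to the plate
        + (batter_extra_outs * 1.0)  # quality: batter thrown out stretching
        + (total_assists * 0.1)      # small volume bonus
    )


# Hypothetical players: an average arm and an elite arm
average = raw_arm_score(0.03, home_throws=2, batter_extra_outs=1, total_assists=9)
elite = raw_arm_score(0.08, home_throws=4, batter_extra_outs=3, total_assists=16)

print(f"average arm: {average:.1f}")  # 9 + 2 + 1 + 0.9
print(f"elite arm:   {elite:.1f}")    # 24 + 4 + 3 + 1.6
```

In both cases the assist-rate term supplies most of the score, which is the intent of the rate-dominant design.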

Position-Adjusted Rating

Calculate league average and standard deviation for LF/CF/RF
Convert player's raw score to z-score: (score - avg) / stddev
Map the z-score onto the -6 to +5 rating scale, where negative ratings indicate stronger arms (assumes roughly normally distributed scores)
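
The first two steps can be sketched with the standard library. This is an illustration under assumptions, not the body of calculate_position_baselines(): the helper names are hypothetical, the raw scores are made up, and the choice of population standard deviation (pstdev) rather than sample standard deviation is a guess.

```python
from statistics import mean, pstdev


def position_baselines(raw_scores_by_pos):
    """Return {position: (average, stddev)} from lists of raw scores."""
    return {
        pos: (mean(scores), pstdev(scores))
        for pos, scores in raw_scores_by_pos.items()
    }


def z_score(raw_score, position, baselines):
    """Standardize a raw score against its own position's baseline."""
    avg, stddev = baselines[position]
    # A zero spread (e.g. a single qualifier) carries no information
    return (raw_score - avg) / stddev if stddev else 0.0


# Made-up raw scores, for illustration only
baselines = position_baselines({
    "LF": [11.0, 13.0, 11.0, 13.0],
    "RF": [10.0, 14.0, 10.0, 14.0],
})
print(z_score(15.0, "RF", baselines))  # (15 - 12) / 2
```

Because each position has its own baseline, the same raw score can produce different z-scores in LF, CF, and RF.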

Rating Scale (Calibrated Distribution)

Z-Score	Rating	Description	Approx %
> 2.5	-6	Elite cannon	~1%
2.0 to 2.5	-5	Outstanding	~2%
1.5 to 2.0	-4	Excellent	~3%
1.0 to 1.5	-3	Very Good	~5%
0.5 to 1.0	-2	Above Average	~15%
0.0 to 0.5	-1	Slightly Above	~30%
-0.5 to 0.0	0	Average	~40%
-0.8 to -0.5	1	Slightly Below	~20%
-1.2 to -0.8	2	Below Average	~10%
-1.5 to -1.2	3	Poor	~5%
-1.8 to -1.5	4	Very Poor	~2%
< -1.8	5	Very Weak	~1%

Note: Thresholds were adjusted after the 300x assist_rate weight compressed the z-score spread.
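
The table translates into a simple threshold lookup. The sketch below is one possible reading, not the module's code: the function name is hypothetical, and because the table does not say which side of an interior boundary is inclusive, it assumes a z-score exactly on a boundary falls into the bin that starts there (except 2.5, which the table places below the top bin).

```python
# (lower bound, rating) pairs taken from the table, highest bin first
LOWER_BOUNDS = [
    (2.0, -5), (1.5, -4), (1.0, -3), (0.5, -2), (0.0, -1),
    (-0.5, 0), (-0.8, 1), (-1.2, 2), (-1.5, 3), (-1.8, 4),
]


def z_to_rating(z: float) -> int:
    """Map a position-adjusted z-score to the -6..+5 arm rating."""
    if z > 2.5:  # the table uses a strict "> 2.5" for the top bin
        return -6
    for lower, rating in LOWER_BOUNDS:
        if z >= lower:  # assumption: lower bounds are inclusive
            return rating
    return 5  # everything below -1.8


for z in (3.0, 1.2, 0.25, -0.25, -1.0, -2.0):
    print(f"z = {z:+.2f} -> rating {z_to_rating(z)}")
```

Keeping the thresholds in one list makes later recalibration a data change rather than a logic change.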

Testing the Implementation

Step 1: Run the validation script

python test_retrosheet_arms.py

This will:

Calculate ratings for all 2005 outfielders
Show distribution (how many players at each rating)
Identify elite vs weak arms
Validate against known strong arms

Step 2: Review the output

Look for:

Strong arms (rating ≤ -3): Should be players known for strong arms
Distribution: Should be a bell curve centered around 0
Position differences: CF may field more balls, but RF/LF may have stronger arms

Step 3: Compare to current method

For players with both bis_runs_outfield (from Baseball Reference) and Retrosheet data:

Do the ratings correlate?
Where do they differ and why?
Which seems more accurate to your domain knowledge?
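
One way to answer the first question is a Pearson correlation over the players who have both values. The sketch below is a hedged illustration: the function names, the dict-based inputs, and the player IDs are all made up, and it assumes a higher bis_runs_outfield means a better arm. Since lower Retrosheet ratings mean stronger arms, agreement between the methods would then show up as a negative correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; needs at least two points with some spread."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def compare_methods(bis_runs, retro_ratings):
    """Correlate the two methods over players present in both dicts."""
    shared = sorted(set(bis_runs) & set(retro_ratings))
    r = pearson([bis_runs[p] for p in shared],
                [retro_ratings[p] for p in shared])
    return shared, r


# Made-up values keyed by hypothetical player IDs
bis = {"player_a": 5, "player_b": 0, "player_c": -5, "player_d": 3}
retro = {"player_a": -4, "player_b": 0, "player_c": 4}

shared, r = compare_methods(bis, retro)
print(f"{len(shared)} shared players, r = {r:.2f}")
```

Players whose two values disagree sharply are the ones worth inspecting by hand for the second and third questions.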

Integration Options

Option 1: Hybrid (Recommended for Development)

Use Baseball Reference when available, Retrosheet as fallback:

# In defenders/calcs_defense.py, around lines 71-84
if 'bis_runs_outfield' in pos_df.columns:
    # Current method - use BIS runs
    of_arms.append(int(pos_data[0].at[df_data["key_bbref"], 'bis_runs_outfield']))
else:
    # Fallback - use Retrosheet calculation
    if not hasattr(self, 'retrosheet_arms'):
        from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet
        self.retrosheet_arms = calculate_of_arms_from_retrosheet(df_events, season_pct)

    # Get arm rating from Retrosheet
    from defenders.retrosheet_arm_calculator import get_arm_for_player
    arm_rating = get_arm_for_player(self.retrosheet_arms, df_data['key_bbref'])
    return arm_rating  # Skip the arm_outfield() call

Option 2: Full Replacement

Always use Retrosheet for consistency:

# In retrosheet_data.py, after loading events
from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet

df_events = pd.read_csv(EVENTS_FILENAME)
retrosheet_arm_ratings = calculate_of_arms_from_retrosheet(df_events, SEASON_PCT)

# Then in create_positions() call:
from defenders.retrosheet_arm_calculator import get_arm_for_player
arm_rating = get_arm_for_player(retrosheet_arm_ratings, df_data['key_bbref'])

Sample Size Requirements

Minimum: 50 balls fielded (putouts + assists) per position
Full season: Most regulars will qualify (200+ balls)
Partial season: Adjust with season_pct parameter
Platoon players: May not qualify; get default rating of 0 (average)
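
A qualification check along these lines might look like the sketch below. The names are hypothetical, and scaling the minimum linearly by season_pct is an assumption; the rules above say only that partial seasons are adjusted with that parameter.

```python
MIN_BALLS_FIELDED = 50  # per position, from the minimum stated above
DEFAULT_RATING = 0      # "average" rating for non-qualifiers


def qualifies(putouts: int, assists: int, season_pct: float = 1.0) -> bool:
    """True if balls fielded (putouts + assists) meets the scaled minimum."""
    # Assumption: the threshold shrinks in proportion to the season played
    return (putouts + assists) >= MIN_BALLS_FIELDED * season_pct


def rating_or_default(calculated_rating: int, putouts: int, assists: int,
                      season_pct: float = 1.0) -> int:
    """Keep the calculated rating only when the sample is large enough."""
    if qualifies(putouts, assists, season_pct):
        return calculated_rating
    return DEFAULT_RATING


print(rating_or_default(-3, putouts=200, assists=8))  # regular: keeps rating
print(rating_or_default(-3, putouts=20, assists=1))   # platoon: falls back
```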

Why This Approach is Better

Advantages

Historical Coverage - Works for any season with Retrosheet data (1921+)
Multi-Dimensional - Considers quality and quantity of throws
Position-Adjusted - Accounts for different expectations by position
Transparent - Formula is clear and can be tuned
Context-Aware - Weights high-value plays (home throws) more heavily

Disadvantages

Processing Overhead - Must parse large play-by-play files
Sample Size - Platoon players may not qualify
Indirect - Measures outcomes, not raw arm strength
Per-Season Setup - Baselines must be calculated once for each season

Next Steps

Run validation script to see 2005 results
Review elite arms - Do they match your expectations?
Choose integration approach (hybrid vs full replacement)
Test on a small cardset before full deployment
Tune weights if needed based on validation results

Questions to Consider

Do the elite arms (rating -4 to -6) match players you know had strong arms?
Are there players with unexpectedly high/low ratings? Why?
How does this compare to the bis_runs_outfield method for 2005?
Should home throws be weighted even more heavily?
Should we adjust thresholds to get more granular ratings?

Support

The implementation includes extensive logging. Set the logging level to DEBUG to see:

Individual player calculations
Raw scores and z-scores
Position baselines
Sample size warnings

import logging
logging.getLogger('exceptions').setLevel(logging.DEBUG)

Created: 2025-11-15
Status: Ready for Testing
Recommendation: Run test_retrosheet_arms.py to validate before integration
