paper-dynasty-card-creation/docs/of_arm_rating_improvement_proposal.md
2025-11-23 01:28:33 -06:00

Outfield Arm Rating Improvement Proposal

Executive Summary

This document proposes an improved method for calculating outfield arm ratings from Retrosheet play-by-play event data. The current method relies solely on Baseball Reference's bis_runs_outfield statistic, which is not available for all historical seasons. The proposed method combines several metrics from play-by-play data into an arm rating that is more nuanced and can be computed for historical seasons.

Current Implementation

Location: defenders/calcs_defense.py:382-406

Current Method:

  • Uses bis_runs_outfield from Baseball Reference defensive stats
  • Takes the maximum value across all OF positions played (LF/CF/RF)
  • Maps to a -6 to +2 scale based on fixed thresholds
  • Thresholds calibrated to 2005 data (top value: 23 for Jim Edmonds)

Limitations:

  1. bis_runs_outfield not available for all historical seasons
  2. Single-metric approach doesn't capture nuance of arm strength
  3. Doesn't differentiate between a "strong arm with poor positioning" and a "weak arm with good positioning"
  4. No adjustment for position-specific expectations (RF arms typically stronger than LF)

Available Retrosheet Data

The retrosheets_events_2005.csv file contains detailed play-by-play data with these arm-relevant fields:

Direct Arm Indicators

  • a7, a8, a9: Assists by LF, CF, RF (when OF throws result in outs)
  • po7, po8, po9: Putouts by LF, CF, RF (catches/fields)
  • brout_b, brout1, brout2, brout3: Which baserunners were thrown out (identifies who got the out)

Context Fields

  • br1_pre/post, br2_pre/post, br3_pre/post: Runner positions before/after play
  • event: Play description (e.g., "S7/G5.BX2(74)" = single to LF, batter out at 2nd by LF→2B)
  • loc: Hit location (e.g., "7LS+" = LF line drive short)
  • hittype: L (line drive), F (fly ball), G (ground ball)
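
As an illustration of how the event field can be read, here is a minimal sketch that extracts "out on the bases" advances whose throw chain starts with an outfielder. The function name and the returned dictionary keys are hypothetical, and the pattern only covers the simple form shown in the example above (such as BX2(74)); real event strings with error or throw annotations would need a fuller parser.

```python
import re

# Matches one out-on-the-bases advance such as "BX2(74)":
# runner (B, 1, 2, 3), "X" for out, destination base (1, 2, 3, H), fielder chain.
OUT_ADVANCE = re.compile(r"([B123])X([123H])\((\d+)\)")


def outfield_throwouts(event):
    """Return advances where the first fielder in the chain is an outfielder (7, 8, 9)."""
    # Advances follow the first "." in the event string.
    _, _, advances = event.partition(".")
    results = []
    for runner, base, chain in OUT_ADVANCE.findall(advances):
        if chain[0] in "789":
            results.append(
                {"runner": runner, "out_at": base, "thrower": int(chain[0]), "chain": chain}
            )
    return results


print(outfield_throwouts("S7/G5.BX2(74)"))
```

For the example event, the batter ("B") is out at second on a throw from the left fielder (7) to the second baseman (4).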

2005 Season Analysis Results

League-Wide Statistics

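| Position | Total Assists | Thrown Out Home | Batter Extra Base Outs | Avg Assist Rate | Avg Throwout Rate |
|----------|---------------|-----------------|------------------------|-----------------|-------------------|
| LF       | 294           | 184             | 102                    | 3.01%           | 86.71%            |
| CF       | 247           | 167             | 63                     | 2.04%           | 81.23%            |
| RF       | 288           | 170             | 93                     | 2.77%           | 79.52%            |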

Top Performers (by Total Assists)

Left Field:

  1. tagus001 (7 assists, 8.64% rate, 3 home throws)
  2. evera001, piera001 (6 assists each, 100% throwout rate)

Center Field:

  1. liebm001 (8 assists, 2.87% rate, 8 home throws)
  2. mathm001 (7 assists, 2.66% rate, 7 home throws)

Right Field:

  1. kenna001, canor001, gathj001 (6 assists each)

Key Insights

  • Average assist rate varies by position (LF: 3.01%, CF: 2.04%, RF: 2.77%)
  • High throwout rates (79-87%) suggest most assists occur on "sure outs"
  • Home plate throws are rare but high-value (strongest arm indicator)
  • Batter extra-base outs (throwing out batters who try to stretch singles into doubles) are common

Proposed Arm Rating Formula

Multi-Metric Composite Score

arm_score = (
    (assist_rate * 300) +                   # PRIMARY: Assist rate (dominant)
    (home_throws * 1.0) +                   # Quality: home plate throws
    (batter_extra_outs * 1.0) +            # Quality: preventing extra bases
    (total_assists * 0.1)                   # Minimal volume bonus
)

Design Philosophy:

  • Assist rate dominates - Only rate stat needed (assists are already outs)
  • Quality bonuses - Home throws and batter extra outs add minimal context
  • No throwout rate - Redundant since assists = outs by definition
  • Volume minimized - Raw assist count provides minor bonus only
  • Example: 8.64% assist rate = 25.9 points vs 2.87% rate = 8.6 points
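
The composite score above can be sketched as a small function. The function name is hypothetical; the weights are the ones in the formula. The worked call uses the 2005 figures listed for tagus001 (8.64% assist rate, 7 assists, 3 home throws); the batter_extra_outs value of 0 is an assumption for illustration only, since that figure is not listed per player.

```python
def arm_score(assist_rate, home_throws, batter_extra_outs, total_assists):
    """Composite arm score using the proposed weights."""
    return (
        assist_rate * 300          # primary term: assist rate
        + home_throws * 1.0        # quality: home plate throws
        + batter_extra_outs * 1.0  # quality: batters thrown out stretching hits
        + total_assists * 0.1      # minimal volume bonus
    )


# Assist-rate term alone, matching the example above
print(round(0.0864 * 300, 1))  # 25.9
print(round(0.0287 * 300, 1))  # 8.6

# Full score for tagus001, with batter_extra_outs=0 assumed for illustration
print(round(arm_score(0.0864, home_throws=3, batter_extra_outs=0, total_assists=7), 2))
```

With these inputs the assist-rate term contributes about 25.9 of the roughly 29.6 total points, which shows how strongly the rate dominates the other terms.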

Why No Throwout Rate:

  • In baseball statistics, an assist is ALREADY an out by definition
  • "Throwout rate" was measuring relay throws vs direct outs
  • This created redundancy with assist_rate
  • Simplified formula focuses on pure assist rate + quality indicators

Position-Adjusted Z-Score

  • Calculate league-average and standard deviation by position (LF/CF/RF)
  • Convert raw score to z-score: (player_score - position_avg) / position_stddev
  • Map z-score to the -6 to +5 rating scale
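
A minimal sketch of the position adjustment, using only the standard library. The function name and the tuple layout are hypothetical, and the choice of population standard deviation (pstdev) is an assumption; the proposal does not say whether population or sample deviation is intended.

```python
from collections import defaultdict
from statistics import mean, pstdev


def position_z_scores(raw_scores):
    """raw_scores: iterable of (player_id, position, raw_score) for qualified players.

    Returns {(player_id, position): z_score}, computed against that position's
    own average and standard deviation.
    """
    raw_scores = list(raw_scores)

    # Group raw scores by position (LF/CF/RF)
    by_pos = defaultdict(list)
    for _, pos, score in raw_scores:
        by_pos[pos].append(score)

    # Per-position baseline: (average, standard deviation)
    baselines = {pos: (mean(vals), pstdev(vals)) for pos, vals in by_pos.items()}

    z_scores = {}
    for player_id, pos, score in raw_scores:
        avg, std = baselines[pos]
        # Guard against a zero spread (e.g. a single qualifier at a position)
        z_scores[(player_id, pos)] = (score - avg) / std if std > 0 else 0.0
    return z_scores
```

Because each position has its own baseline, a right fielder is compared only with other right fielders, which is how the method accounts for position-specific expectations.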

Minimum Sample Size

  • Require 50+ balls fielded (putouts + assists) to qualify
  • Players below threshold get league-average rating (0)
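
The qualification rule can be expressed as a short check. This is a sketch: the names are hypothetical, and scaling the threshold by season_pct follows the proration used in the implementation pseudocode later in this document.

```python
MIN_BALLS_FIELDED = 50   # full-season threshold from the rule above
DEFAULT_RATING = 0       # league-average rating for non-qualifiers


def qualifies(putouts, assists, season_pct=1.0):
    """True if the player has enough balls fielded for a calculated rating."""
    return (putouts + assists) >= MIN_BALLS_FIELDED * season_pct


print(qualifies(48, 2))        # exactly 50 balls fielded
print(qualifies(40, 2))        # 42 balls fielded, falls back to DEFAULT_RATING
print(qualifies(24, 1, 0.5))   # 25 balls fielded at the season's halfway point
```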

Proposed Rating Scale

Distribution calibrated to actual data after formula adjustments:

| Z-Score Range | Arm Rating | Description    | Approx % | Examples (2005)            |
|---------------|------------|----------------|----------|----------------------------|
| > 2.5         | -6         | Elite cannon   | ~1%      | Elite assist rates         |
| 2.0 to 2.5    | -5         | Outstanding    | ~2%      | Outstanding assist rates   |
| 1.5 to 2.0    | -4         | Excellent      | ~3%      | Excellent assist rates     |
| 1.0 to 1.5    | -3         | Very Good      | ~5%      | Very good assist rates     |
| 0.5 to 1.0    | -2         | Above Average  | ~15%     | Above average assist rates |
| 0.0 to 0.5    | -1         | Slightly Above | ~30%     | Slightly above average     |
| -0.5 to 0.0   | 0          | Average        | ~40%     | Average assist rates       |
| -0.8 to -0.5  | 1          | Slightly Below | ~20%     | Slightly below average     |
| -1.2 to -0.8  | 2          | Below Average  | ~10%     | Below average assist rates |
| -1.5 to -1.2  | 3          | Poor           | ~5%      | Poor assist rates          |
| -1.8 to -1.5  | 4          | Very Poor      | ~2%      | Very poor assist rates     |
| < -1.8        | 5          | Very Weak      | ~1%      | 0 assists / very weak arms |

Distribution Notes:

  • Adjusted for formula: 300x weight on assist_rate compressed z-score spread
  • Thresholds calibrated: Based on actual 2005 data distribution
  • Full range used: Ensures -6 to +5 scale is fully utilized
  • Worst arms: Players with 0 assists (z ≈ -1.82) receive +5 rating
  • Average centered: Rating 0 represents middle ~40% of qualified OFs
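
One way to keep the thresholds tunable is a table-driven lookup rather than a chain of comparisons. The sketch below uses the standard library's bisect module; the constant and function names are hypothetical, and treating a value exactly on a boundary as belonging to the lower band is an assumption, since the table does not specify boundary handling.

```python
from bisect import bisect_left

# Band boundaries from the rating table, in ascending order
Z_THRESHOLDS = [-1.8, -1.5, -1.2, -0.8, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# One more rating than thresholds: index i applies when z is strictly above i boundaries
ARM_RATINGS = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6]


def arm_rating_from_z(z):
    """Map a position-adjusted z-score to the -6 to +5 arm rating."""
    # bisect_left counts how many boundaries are strictly below z
    return ARM_RATINGS[bisect_left(Z_THRESHOLDS, z)]


for z in (2.6, 0.25, -0.25, -0.6, -1.82):
    print(z, arm_rating_from_z(z))
```

Recalibrating for another season would then only require editing Z_THRESHOLDS.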

Implementation Pseudocode

def calculate_retrosheet_arm_rating(df_events, player_bbref_id, season_pct=1.0):
    """
    Calculate OF arm rating from Retrosheet play-by-play events

    Args:
        df_events: DataFrame of retrosheet events for the season
        player_bbref_id: Player's baseball-reference ID (key_bbref)
        season_pct: Percentage of season completed (for proration)

    Returns:
        int: Arm rating from -6 to +5
    """

    # Find all positions this player played
    arm_ratings_by_pos = []

    for of_pos, a_col, po_col, fielder_col in [
        ('LF', 'a7', 'po7', 'l7'),
        ('CF', 'a8', 'po8', 'l8'),
        ('RF', 'a9', 'po9', 'l9')
    ]:
        # Get all plays at this position for this player
        player_plays = df_events[df_events[fielder_col] == player_bbref_id]

        if len(player_plays) == 0:
            continue

        # Calculate component metrics
        balls_fielded = player_plays[(player_plays[po_col] > 0) | (player_plays[a_col] > 0)].shape[0]

        if balls_fielded < 50 * season_pct:  # Minimum sample size
            continue

        # Outfield position number (7, 8, or 9), taken from the assist column name
        pos_num = int(a_col[-1])
        assisted = player_plays[a_col] > 0

        total_assists = int(assisted.sum())

        # Runners (from 1st, 2nd, or 3rd) thrown out on an assist by this fielder.
        # Serves as the home_throws term; note that it counts every runner
        # thrown out, not only outs recorded at home plate.
        home_throws = int((
            assisted &
            ((player_plays['brout1'] == pos_num) |
             (player_plays['brout2'] == pos_num) |
             (player_plays['brout3'] == pos_num))
        ).sum())

        # Batters thrown out by this fielder while trying for an extra base
        batter_extra_outs = int((assisted & (player_plays['brout_b'] == pos_num)).sum())

        # Assist rate (guarded in case season_pct reduces the minimum to 0)
        assist_rate = total_assists / balls_fielded if balls_fielded > 0 else 0

        # Composite score, using the weights from the proposed formula.
        # Throwout rate is omitted because an assist is already an out.
        raw_score = (
            (assist_rate * 300) +
            (home_throws * 1.0) +
            (batter_extra_outs * 1.0) +
            (total_assists * 0.1)
        )

        # Get league stats for this position (pre-calculated)
        position_avg = get_position_average(of_pos, season_pct)
        position_std = get_position_stddev(of_pos, season_pct)

        # Calculate z-score
        z_score = (raw_score - position_avg) / position_std if position_std > 0 else 0

        arm_ratings_by_pos.append(z_score)

    if not arm_ratings_by_pos:
        return 0  # Default average rating

    # Use maximum z-score across positions (best arm showing)
    max_z = max(arm_ratings_by_pos)

    # Map to -6 to +5 scale using the thresholds from the Proposed Rating Scale table
    if max_z > 2.5:
        return -6    # Elite cannon
    elif max_z > 2.0:
        return -5    # Outstanding
    elif max_z > 1.5:
        return -4    # Excellent
    elif max_z > 1.0:
        return -3    # Very Good
    elif max_z > 0.5:
        return -2    # Above Average
    elif max_z > 0.0:
        return -1    # Slightly Above
    elif max_z > -0.5:
        return 0     # Average
    elif max_z > -0.8:
        return 1     # Slightly Below
    elif max_z > -1.2:
        return 2     # Below Average
    elif max_z > -1.5:
        return 3     # Poor
    elif max_z > -1.8:
        return 4     # Very Poor
    else:
        return 5     # Very Weak (includes 0-assist players, z ≈ -1.82)

Advantages of Proposed Method

  1. Historical Availability: Retrosheet data available from 1921-present
  2. Multi-Dimensional: Captures different aspects of arm strength
  3. Context-Aware: Accounts for position-specific expectations
  4. Nuanced: Distinguishes between volume and quality of throws
  5. Transparent: Clear formula allows for tuning/debugging

Disadvantages

  1. Data Processing: Requires parsing large play-by-play files
  2. Complexity: More complex than single-stat lookup
  3. Sample Size: Platoon players may not have enough opportunities
  4. Indirect Measurement: Measures outcomes, not true arm strength

Integration with Current System

Option 1: Hybrid Approach

  • Use Baseball Reference bis_runs_outfield when available (2003+)
  • Fall back to Retrosheet calculation for historical seasons
  • Calibrate both scales to produce equivalent ratings
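
The fallback described above could be wired up roughly as follows. This is a sketch under stated assumptions: choose_arm_rating and both callables are hypothetical names, and a missing bis_runs_outfield value is assumed to be represented as None.

```python
def choose_arm_rating(bis_runs_outfield, rating_from_bis, rating_from_retrosheet):
    """Prefer the Baseball Reference stat; fall back to the Retrosheet calculation.

    bis_runs_outfield: number, or None when the stat is unavailable for the season.
    rating_from_bis: callable taking bis_runs_outfield and returning a rating.
    rating_from_retrosheet: zero-argument callable returning a rating.
    """
    if bis_runs_outfield is not None:
        return rating_from_bis(bis_runs_outfield)
    # Only run the heavier play-by-play calculation when it is actually needed
    return rating_from_retrosheet()


# Usage with stand-in callables (placeholders, not real threshold logic)
print(choose_arm_rating(23, lambda runs: -6, lambda: 0))
print(choose_arm_rating(None, lambda runs: -6, lambda: -2))
```

Passing the Retrosheet calculation as a callable defers the play-by-play parsing, so seasons that have the Baseball Reference stat never pay that cost.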

Option 2: Full Replacement

  • Always use Retrosheet calculation for consistency
  • Remove dependency on Baseball Reference defensive stats
  • Requires one-time validation against known strong/weak arms

Next Steps

  1. Validate against known data: Compare 2005 Retrosheet ratings vs Baseball Reference
  2. Tune weights: Adjust formula weights based on correlation with existing ratings
  3. Calculate league baselines: Pre-compute position averages/stddev for all seasons
  4. Performance optimization: Cache calculations, optimize dataframe operations
  5. Integration testing: Run full season card generation with new method
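
For the validation and weight-tuning steps, a simple starting point is the correlation between the two sets of ratings for the same players. The sketch below hand-codes a Pearson correlation with the standard library; the function name is hypothetical and the sample lists are made-up placeholders, not real 2005 ratings.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of ratings."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined when one side never varies
    return cov / sqrt(var_x * var_y)


# Placeholder ratings for the same five players under each method
retrosheet_ratings = [-4, -2, 0, 1, 3]
bbref_ratings = [-5, -1, 0, 0, 2]
print(round(pearson(retrosheet_ratings, bbref_ratings), 3))
```

A high positive value would suggest the two methods rank players similarly; a low value would point to weights that need tuning.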

Sample Players for Validation

Test the formula against these known arm strengths from 2005:

Strong Arms (Should get -4 to -6):

  • Jim Edmonds (CF) - Gold Glove, known cannon
  • Ichiro Suzuki (RF) - Multiple Gold Glove winner
  • Carl Crawford (LF) - Defensive specialist

Average Arms (Should get -1 to 0):

  • Most regular outfielders

Weak Arms (Should get +1 to +5):

  • DHs playing OF occasionally
  • Aging veterans with diminished tools

Created: 2025-11-15
Author: Claude (Jarvis)
Status: Proposal / Awaiting Review