# Outfield Arm Rating Improvement Proposal

## Executive Summary

This document proposes an improved method for calculating outfield arm ratings using Retrosheet play-by-play event data. The current method relies solely on Baseball Reference's `bis_runs_outfield` statistic, which is not available for historical seasons. The proposed method combines several metrics from detailed play-by-play data to produce a more nuanced arm rating that can also be computed for historical seasons.

## Current Implementation

**Location:** `defenders/calcs_defense.py:382-406`

**Current Method:**
- Uses `bis_runs_outfield` from Baseball Reference defensive stats
- Takes the **maximum** value across all OF positions played (LF/CF/RF)
- Maps to a -6 to +2 scale based on fixed thresholds
- Thresholds calibrated to 2005 data (top value: 23 for Jim Edmonds)

**Limitations:**
1. `bis_runs_outfield` is not available for all historical seasons
2. A single-metric approach doesn't capture the nuances of arm strength
3. Doesn't differentiate between "strong arm with poor positioning" and "weak arm with good positioning"
4. No adjustment for position-specific expectations (RF arms are typically stronger than LF arms)

## Available Retrosheet Data

The `retrosheets_events_2005.csv` file contains detailed play-by-play data with these arm-relevant fields:

### Direct Arm Indicators
- **`a7`, `a8`, `a9`**: Assists by LF, CF, RF (when OF throws result in outs)
- **`po7`, `po8`, `po9`**: Putouts by LF, CF, RF (catches/fields)
- **`brout_b`, `brout1`, `brout2`, `brout3`**: Which baserunners were thrown out (identifies who got the out)

### Context Fields
- **`br1_pre/post`, `br2_pre/post`, `br3_pre/post`**: Runner positions before/after play
- **`event`**: Play description (e.g., "S7/G5.BX2(74)" = single to LF, batter out at 2nd by LF→2B)
- **`loc`**: Hit location (e.g., "7LS+" = LF line drive short)
- **`hittype`**: L (line drive), F (fly ball), G (ground ball)

## 2005 Season Analysis Results

### League-Wide Statistics

| Position | Total Assists | Thrown Out Home | Batter Extra Base Outs | Avg Assist Rate | Avg Throwout Rate |
|----------|---------------|-----------------|------------------------|-----------------|-------------------|
| LF | 294 | 184 | 102 | 3.01% | 86.71% |
| CF | 247 | 167 | 63 | 2.04% | 81.23% |
| RF | 288 | 170 | 93 | 2.77% | 79.52% |

### Top Performers (by Total Assists)

**Left Field:**
1. tagus001 (7 assists, 8.64% rate, 3 home throws)
2. evera001, piera001 (6 assists each, 100% throwout rate)

**Center Field:**
1. liebm001 (8 assists, 2.87% rate, 8 home throws)
2. mathm001 (7 assists, 2.66% rate, 7 home throws)

**Right Field:**
1. kenna001, canor001, gathj001 (6 assists each)

### Key Insights
- Average assist rate varies by position (LF: 3.01%, CF: 2.04%, RF: 2.77%)
- High throwout rates (79-87%) suggest most assists occur on "sure outs"
- Home plate throws are rare but high-value (strongest arm indicator)
- Batter extra-base outs (stopping batters from stretching singles into doubles) are common

## Proposed Arm Rating Formula

### Multi-Metric Composite Score

```python
arm_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)
```

**Design Philosophy:**
- **Assist rate dominates** - Only rate stat needed (assists are already outs)
- **Quality bonuses** - Home throws and batter extra outs add minimal context
- **No throwout rate** - Redundant since assists = outs by definition
- **Volume minimized** - Raw assist count provides minor bonus only
- **Example:** 8.64% assist rate = 25.9 points vs 2.87% rate = 8.6 points

**Why No Throwout Rate:**
- In baseball statistics, an assist is ALREADY an out by definition
- The earlier "throwout rate" term was effectively measuring relay throws vs direct outs
- This made it redundant with `assist_rate`
- The simplified formula focuses on pure assist rate plus quality indicators

### Position-Adjusted Z-Score
- Calculate league average and standard deviation by position (LF/CF/RF)
- Convert raw score to z-score: `(player_score - position_avg) / position_stddev`
- Map z-score to the -6 to +5 rating scale

### Minimum Sample Size
- Require 50+ balls fielded (putouts + assists) to qualify
- Players below threshold get league-average rating (0)

## Proposed Rating Scale

Distribution calibrated to actual data after formula adjustments:

| Z-Score Range | Arm Rating | Description | Approx % | Examples (2005) |
|---------------|------------|-------------|----------|-----------------|
| > 2.5 | -6 | Elite cannon | ~1% | Elite assist rates |
| 2.0 to 2.5 | -5 | Outstanding | ~2% | Outstanding assist rates |
| 1.5 to 2.0 | -4 | Excellent | ~3% | Excellent assist rates |
| 1.0 to 1.5 | -3 | Very Good | ~5% | Very good assist rates |
| 0.5 to 1.0 | -2 | Above Average | ~15% | Above average assist rates |
| 0.0 to 0.5 | -1 | Slightly Above | ~30% | Slightly above average |
| -0.5 to 0.0 | 0 | Average | ~40% | Average assist rates |
| -0.8 to -0.5 | 1 | Slightly Below | ~20% | Slightly below average |
| -1.2 to -0.8 | 2 | Below Average | ~10% | Below average assist rates |
| -1.5 to -1.2 | 3 | Poor | ~5% | Poor assist rates |
| -1.8 to -1.5 | 4 | Very Poor | ~2% | Very poor assist rates |
| < -1.8 | 5 | Very Weak | ~1% | 0 assists / very weak arms |

**Distribution Notes:**
- **Adjusted for formula**: 300x weight on assist_rate compressed z-score spread
- **Thresholds calibrated**: Based on actual 2005 data distribution
- **Full range used**: Ensures -6 to +5 scale is fully utilized
- **Worst arms**: Players with 0 assists (z ≈ -1.82) receive +5 rating
- **Average centered**: Rating 0 represents middle ~40% of qualified OFs

## Implementation Pseudocode

```python
import pandas as pd


def calculate_retrosheet_arm_rating(df_events: pd.DataFrame, player_bbref_id: str,
                                    season_pct: float = 1.0) -> int:
    """
    Calculate OF arm rating from Retrosheet play-by-play events

    Args:
        df_events: DataFrame of retrosheet events for the season
        player_bbref_id: Player's baseball-reference ID (key_bbref)
        season_pct: Percentage of season completed (for proration)

    Returns:
        int: Arm rating from -6 (best) to +5 (worst)
    """
    z_scores_by_pos = []

    # Evaluate every outfield position this player appeared at
    for of_pos, a_col, po_col, fielder_col in [
        ('LF', 'a7', 'po7', 'l7'),
        ('CF', 'a8', 'po8', 'l8'),
        ('RF', 'a9', 'po9', 'l9')
    ]:
        # Get all plays at this position for this player
        player_plays = df_events[df_events[fielder_col] == player_bbref_id]
        if len(player_plays) == 0:
            continue

        # Balls fielded = plays with a putout or an assist
        has_assist = player_plays[a_col] > 0
        balls_fielded = int(((player_plays[po_col] > 0) | has_assist).sum())
        if balls_fielded < 50 * season_pct:  # Minimum sample size
            continue

        pos_num = int(a_col[-1])  # 7, 8, or 9
        total_assists = int(has_assist.sum())

        # Runner outs credited to this fielder. This is the "home throws" term as
        # written; it does not check which base the out was recorded at.
        runner_out = (
            (player_plays['brout1'] == pos_num)
            | (player_plays['brout2'] == pos_num)
            | (player_plays['brout3'] == pos_num)
        )
        home_throws = int((has_assist & runner_out).sum())

        # Batter thrown out trying for an extra base
        batter_extra_outs = int((has_assist & (player_plays['brout_b'] == pos_num)).sum())

        assist_rate = total_assists / balls_fielded if balls_fielded > 0 else 0

        # Composite score, using the weights from "Proposed Arm Rating Formula"
        raw_score = (
            (assist_rate * 300)
            + (home_throws * 1.0)
            + (batter_extra_outs * 1.0)
            + (total_assists * 0.1)
        )

        # Get league stats for this position (pre-calculated)
        position_avg = get_position_average(of_pos, season_pct)
        position_std = get_position_stddev(of_pos, season_pct)

        # Calculate z-score
        z_score = (raw_score - position_avg) / position_std if position_std > 0 else 0
        z_scores_by_pos.append(z_score)

    if not z_scores_by_pos:
        return 0  # Default average rating

    # Use maximum z-score across positions (best arm showing)
    max_z = max(z_scores_by_pos)

    # Map to -6 to +5 scale using the thresholds from "Proposed Rating Scale"
    thresholds = [
        (2.5, -6),   # Elite cannon
        (2.0, -5),   # Outstanding
        (1.5, -4),   # Excellent
        (1.0, -3),   # Very Good
        (0.5, -2),   # Above Average
        (0.0, -1),   # Slightly Above
        (-0.5, 0),   # Average
        (-0.8, 1),   # Slightly Below
        (-1.2, 2),   # Below Average
        (-1.5, 3),   # Poor
        (-1.8, 4),   # Very Poor
    ]
    for cutoff, rating in thresholds:
        if max_z > cutoff:
            return rating
    return 5  # Very Weak
```

## Advantages of Proposed Method

1. **Historical Availability**: Retrosheet data available from 1921-present
2. **Multi-Dimensional**: Captures different aspects of arm strength
3. **Context-Aware**: Accounts for position-specific expectations
4. **Nuanced**: Distinguishes between volume and quality of throws
5. **Transparent**: Clear formula allows for tuning/debugging

## Disadvantages

1. **Data Processing**: Requires parsing large play-by-play files
2. **Complexity**: More complex than single-stat lookup
3. **Sample Size**: Platoon players may not have enough opportunities
4. **Indirect Measurement**: Measures outcomes, not true arm strength

## Integration with Current System

### Option 1: Hybrid Approach (Recommended)
- Use Baseball Reference `bis_runs_outfield` when available (2003+)
- Fall back to Retrosheet calculation for historical seasons
- Calibrate both scales to produce equivalent ratings

### Option 2: Full Replacement
- Always use Retrosheet calculation for consistency
- Remove dependency on Baseball Reference defensive stats
- Requires one-time validation against known strong/weak arms

## Next Steps

1. **Validate against known data**: Compare 2005 Retrosheet ratings vs Baseball Reference
2. **Tune weights**: Adjust formula weights based on correlation with existing ratings
3. **Calculate league baselines**: Pre-compute position averages/stddev for all seasons
4. **Performance optimization**: Cache calculations, optimize dataframe operations
5. **Integration testing**: Run full season card generation with new method

## Sample Players for Validation

Test the formula against these known arm strengths from 2005:

**Strong Arms (Should get -4 to -6):**
- Jim Edmonds (CF) - Gold Glove, known cannon
- Ichiro Suzuki (RF) - Multiple award winner
- Carl Crawford (LF) - Defensive specialist

**Average Arms (Should get -1 to 0):**
- Most regular outfielders

**Weak Arms (Should get +1 to +2):**
- DHs playing OF occasionally
- Aging veterans with diminished tools

---

**Created:** 2025-11-15
**Author:** Claude (Jarvis)
**Status:** Proposal / Awaiting Review
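
The composite score from "Proposed Arm Rating Formula" and the position-adjusted z-score step can be sketched together in plain Python. This is a sketch under stated assumptions, not the project's implementation: the function names `composite_arm_score` and `position_z_scores` are hypothetical, the input is assumed to be a list of `(player_id, position, raw_score)` tuples for already-qualified players, and the population standard deviation is used because the proposal does not say whether population or sample deviation is intended.

```python
from collections import defaultdict
from statistics import mean, pstdev


def composite_arm_score(assist_rate, home_throws, batter_extra_outs, total_assists):
    """Composite score using the weights stated in the proposal."""
    return (
        assist_rate * 300          # primary term: assist rate
        + home_throws * 1.0        # quality: home plate throws
        + batter_extra_outs * 1.0  # quality: batter thrown out taking an extra base
        + total_assists * 0.1      # small volume bonus
    )


def position_z_scores(qualified):
    """Return {(player_id, position): z} from (player_id, position, raw_score) tuples.

    Assumption: population standard deviation per position; z is 0.0 when a
    position has no spread (for example, a single qualified player).
    """
    qualified = list(qualified)

    # Group raw scores by position to get per-position baselines.
    scores_by_pos = defaultdict(list)
    for _, pos, score in qualified:
        scores_by_pos[pos].append(score)
    baselines = {pos: (mean(s), pstdev(s)) for pos, s in scores_by_pos.items()}

    z_scores = {}
    for player_id, pos, score in qualified:
        avg, std = baselines[pos]
        z_scores[(player_id, pos)] = (score - avg) / std if std > 0 else 0.0
    return z_scores


# The assist-rate term alone reproduces the proposal's example figures.
print(round(composite_arm_score(0.0864, 0, 0, 0), 1))  # 25.9
print(round(composite_arm_score(0.0287, 0, 0, 0), 1))  # 8.6
```

Grouping by position before standardizing is what keeps an LF score from being compared against the RF baseline, which is the point of the position adjustment.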
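
The z-score thresholds in "Proposed Rating Scale" can also be encoded as a lookup table instead of a chain of comparisons. The sketch below is illustrative only: `z_to_rating`, `Z_BOUNDS`, and `RATINGS` are hypothetical names, and a z-score that lands exactly on a boundary is assigned to the lower band, which is an assumption because the table does not say which side a boundary belongs to.

```python
from bisect import bisect_left

# Band boundaries from the rating table, in ascending order.
Z_BOUNDS = [-1.8, -1.5, -1.2, -0.8, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

# One rating per band: below -1.8 is +5 (Very Weak), above 2.5 is -6 (Elite cannon).
RATINGS = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6]


def z_to_rating(z):
    """Map a position-adjusted z-score to the -6 to +5 arm rating scale.

    bisect_left counts how many boundaries are strictly below z, which is the
    index of the band z falls into. Boundary values go to the lower band
    (assumption, see lead-in).
    """
    return RATINGS[bisect_left(Z_BOUNDS, z)]


# A player with 0 assists (z of about -1.82 per the distribution notes)
# lands in the lowest band.
print(z_to_rating(-1.82))  # 5
print(z_to_rating(-0.3))   # 0
print(z_to_rating(2.6))    # -6
```

Keeping the boundaries in one list makes recalibration for another season a data change instead of a code change.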