Outfield Arm Rating Improvement Proposal
Executive Summary
This document proposes an improved method for calculating outfield arm ratings using Retrosheet play-by-play event data. The current method relies solely on Baseball Reference's bis_runs_outfield statistic, which is not available for historical seasons. The proposed method uses multiple metrics from detailed play-by-play data to create a more nuanced and historically-available arm rating system.
Current Implementation
Location: defenders/calcs_defense.py:382-406
Current Method:
- Uses bis_runs_outfield from Baseball Reference defensive stats
- Takes the maximum value across all OF positions played (LF/CF/RF)
- Maps to a -6 to +2 scale based on fixed thresholds
- Thresholds calibrated to 2005 data (top value: 23 for Jim Edmonds)
Limitations:
- bis_runs_outfield not available for all historical seasons
- Single-metric approach doesn't capture nuance of arm strength
- Doesn't differentiate between "strong arm with poor positioning" vs "weak arm with good positioning"
- No adjustment for position-specific expectations (RF arms typically stronger than LF)
Available Retrosheet Data
The retrosheets_events_2005.csv file contains detailed play-by-play data with these arm-relevant fields:
Direct Arm Indicators
- a7, a8, a9: Assists by LF, CF, RF (when OF throws result in outs)
- po7, po8, po9: Putouts by LF, CF, RF (catches/fields)
- brout_b, brout1, brout2, brout3: Which baserunners were thrown out (identifies who got the out)
Context Fields
- br1_pre/post, br2_pre/post, br3_pre/post: Runner positions before/after play
- event: Play description (e.g., "S7/G5.BX2(74)" = single to LF, batter out at 2nd by LF→2B)
- loc: Hit location (e.g., "7LS+" = LF line drive short)
- hittype: L (line drive), F (fly ball), G (ground ball)
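As a rough illustration of how the assist and putout fields could be tallied, the sketch below counts plays per outfield position from CSV text. The sample rows are invented for illustration and are not real 2005 plays; the function name and the assumption that these columns hold integer counts are also hypothetical.

```python
import csv
import io

# Invented sample rows; real data would come from retrosheets_events_2005.csv.
SAMPLE_CSV = """a7,a8,a9,po7,po8,po9
0,0,0,1,0,0
1,0,0,0,0,0
0,0,1,0,0,0
0,0,0,0,1,0
"""

# Assist and putout column for each outfield position
OF_POSITIONS = {"LF": ("a7", "po7"), "CF": ("a8", "po8"), "RF": ("a9", "po9")}


def tally_outfield_plays(csv_text):
    """Count plays with at least one assist / putout at each OF position."""
    totals = {pos: {"assists": 0, "putouts": 0} for pos in OF_POSITIONS}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for pos, (a_col, po_col) in OF_POSITIONS.items():
            if int(row[a_col]) > 0:
                totals[pos]["assists"] += 1
            if int(row[po_col]) > 0:
                totals[pos]["putouts"] += 1
    return totals


totals = tally_outfield_plays(SAMPLE_CSV)
```

The same counts could be produced on a pandas DataFrame with boolean masks such as `df["a7"] > 0`.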
2005 Season Analysis Results
League-Wide Statistics
| Position | Total Assists | Thrown Out Home | Batter Extra Base Outs | Avg Assist Rate | Avg Throwout Rate |
|---|---|---|---|---|---|
| LF | 294 | 184 | 102 | 3.01% | 86.71% |
| CF | 247 | 167 | 63 | 2.04% | 81.23% |
| RF | 288 | 170 | 93 | 2.77% | 79.52% |
Top Performers (by Total Assists)
Left Field:
- tagus001 (7 assists, 8.64% rate, 3 home throws)
- evera001, piera001 (6 assists each, 100% throwout rate)
Center Field:
- liebm001 (8 assists, 2.87% rate, 8 home throws)
- mathm001 (7 assists, 2.66% rate, 7 home throws)
Right Field:
- kenna001, canor001, gathj001 (6 assists each)
Key Insights
- Average assist rate varies by position (LF: 3.01%, CF: 2.04%, RF: 2.77%)
- High throwout rates (79-87%) suggest most assists occur on "sure outs"
- Home plate throws are rare but high-value (strongest arm indicator)
- Batter extra-base outs (throwing out batters who try to stretch singles into doubles) are common
Proposed Arm Rating Formula
Multi-Metric Composite Score
arm_score = (
    (assist_rate * 300) +        # PRIMARY: Assist rate (dominant)
    (home_throws * 1.0) +        # Quality: home plate throws
    (batter_extra_outs * 1.0) +  # Quality: preventing extra bases
    (total_assists * 0.1)        # Minimal volume bonus
)
Design Philosophy:
- Assist rate dominates - Only rate stat needed (assists are already outs)
- Quality bonuses - Home throws and batter extra outs add minimal context
- No throwout rate - Redundant since assists = outs by definition
- Volume minimized - Raw assist count provides minor bonus only
- Example: 8.64% assist rate = 25.9 points vs 2.87% rate = 8.6 points
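To make the weighting concrete, here is a minimal worked example of the composite score. The function name is hypothetical, and the batter extra-base out count in the full-score example is an assumed value chosen only for illustration.

```python
def arm_score(assist_rate, home_throws, batter_extra_outs, total_assists):
    """Composite arm score using the proposed weights (300 / 1.0 / 1.0 / 0.1)."""
    return (
        assist_rate * 300
        + home_throws * 1.0
        + batter_extra_outs * 1.0
        + total_assists * 0.1
    )


# Assist-rate component alone, for the two rates quoted in the example bullet
high_rate_points = 0.0864 * 300  # about 25.9
low_rate_points = 0.0287 * 300   # about 8.6

# Full score for a hypothetical fielder: 8.64% assist rate, 3 home throws,
# 2 batter extra-base outs (assumed), 7 total assists
example = arm_score(0.0864, 3, 2, 7)  # 25.92 + 3 + 2 + 0.7 = 31.62
```

Even with the quality and volume bonuses included, the assist-rate term accounts for most of the total, which is the intended behavior of the 300x weight.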
Why No Throwout Rate:
- In baseball statistics, an assist is ALREADY an out by definition
- "Throwout rate" was measuring relay throws vs direct outs
- This created redundancy with assist_rate
- Simplified formula focuses on pure assist rate + quality indicators
Position-Adjusted Z-Score
- Calculate league-average and standard deviation by position (LF/CF/RF)
- Convert raw score to z-score: (player_score - position_avg) / position_stddev
- Map z-score to -6 to +5 rating scale
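The position adjustment can be sketched as follows for a single position. The function name and the input layout are hypothetical, the raw scores are invented, and the choice of population standard deviation is an assumption (the proposal does not say which variant to use).

```python
from statistics import mean, pstdev


def position_z_scores(scores_by_player):
    """Convert raw scores for ONE position into z-scores.

    scores_by_player: {player_id: raw_score}
    Returns {player_id: z_score}; all zeros when the scores have no spread.
    """
    values = list(scores_by_player.values())
    avg = mean(values)
    std = pstdev(values)  # population std dev (assumed; sample std dev also plausible)
    if std == 0:
        return {pid: 0.0 for pid in scores_by_player}
    return {pid: (score - avg) / std for pid, score in scores_by_player.items()}


# Invented raw scores for three hypothetical left fielders
lf_scores = {"player_a": 10.0, "player_b": 20.0, "player_c": 30.0}
lf_z = position_z_scores(lf_scores)
```

Running this once per position (LF, CF, RF) keeps a right fielder from being compared against the left-field distribution.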
Minimum Sample Size
- Require 50+ balls fielded (putouts + assists) to qualify
- Players below threshold get league-average rating (0)
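A minimal sketch of the qualification check, assuming the threshold is prorated by the fraction of the season completed (as the pseudocode's season_pct argument suggests). The names here are hypothetical.

```python
MIN_BALLS_FIELDED = 50  # full-season threshold from the proposal
DEFAULT_RATING = 0      # league-average rating for non-qualifiers


def qualifies(putouts, assists, season_pct=1.0):
    """True if putouts + assists meets the prorated minimum sample size."""
    return (putouts + assists) >= MIN_BALLS_FIELDED * season_pct


def rating_or_default(putouts, assists, computed_rating, season_pct=1.0):
    """Return the computed rating only for qualified players."""
    if qualifies(putouts, assists, season_pct):
        return computed_rating
    return DEFAULT_RATING
```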
Proposed Rating Scale
Distribution calibrated to actual data after formula adjustments:
| Z-Score Range | Arm Rating | Description | Approx % | Examples (2005) |
|---|---|---|---|---|
| > 2.5 | -6 | Elite cannon | ~1% | Elite assist rates |
| 2.0 to 2.5 | -5 | Outstanding | ~2% | Outstanding assist rates |
| 1.5 to 2.0 | -4 | Excellent | ~3% | Excellent assist rates |
| 1.0 to 1.5 | -3 | Very Good | ~5% | Very good assist rates |
| 0.5 to 1.0 | -2 | Above Average | ~15% | Above average assist rates |
| 0.0 to 0.5 | -1 | Slightly Above | ~30% | Slightly above average |
| -0.5 to 0.0 | 0 | Average | ~40% | Average assist rates |
| -0.8 to -0.5 | 1 | Slightly Below | ~20% | Slightly below average |
| -1.2 to -0.8 | 2 | Below Average | ~10% | Below average assist rates |
| -1.5 to -1.2 | 3 | Poor | ~5% | Poor assist rates |
| -1.8 to -1.5 | 4 | Very Poor | ~2% | Very poor assist rates |
| < -1.8 | 5 | Very Weak | ~1% | 0 assists / very weak arms |
Distribution Notes:
- Adjusted for formula: 300x weight on assist_rate compressed z-score spread
- Thresholds calibrated: Based on actual 2005 data distribution
- Full range used: Ensures -6 to +5 scale is fully utilized
- Worst arms: Players with 0 assists (z ≈ -1.82) receive +5 rating
- Average centered: Rating 0 represents middle ~40% of qualified OFs
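The rating table can be encoded as a threshold lookup. The sketch below is one possible encoding with hypothetical names; a z-score that falls exactly on a boundary is placed in the weaker bin, which is an assumption because the table does not say which side a boundary belongs to.

```python
from bisect import bisect_left

# Bin edges from the rating table, in ascending order
Z_THRESHOLDS = [-1.8, -1.5, -1.2, -0.8, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
# Rating for each bin, from "below -1.8" up to "above 2.5"
RATINGS = [5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6]


def z_to_rating(z_score):
    """Map a position-adjusted z-score to the -6 (best) to +5 (worst) scale."""
    # bisect_left counts the edges strictly below z_score, so a value equal
    # to an edge stays in the lower (weaker) bin.
    return RATINGS[bisect_left(Z_THRESHOLDS, z_score)]


zero_assist_rating = z_to_rating(-1.82)  # z for a 0-assist player per the notes
```

Keeping the edges in one list makes recalibration for other seasons a data change instead of a code change.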
Implementation Pseudocode
def calculate_retrosheet_arm_rating(df_events, player_bbref_id, season_pct=1.0):
    """
    Calculate OF arm rating from Retrosheet play-by-play events.

    Args:
        df_events: DataFrame of Retrosheet events for the season
        player_bbref_id: Player's Baseball Reference ID (key_bbref)
        season_pct: Fraction of the season completed, 0.0-1.0 (for proration)

    Returns:
        int: Arm rating from -6 (best) to +5 (worst)
    """
    # z-score at each OF position where the player qualifies
    z_scores_by_pos = []
    for of_pos, a_col, po_col, fielder_col in [
        ('LF', 'a7', 'po7', 'l7'),
        ('CF', 'a8', 'po8', 'l8'),
        ('RF', 'a9', 'po9', 'l9'),
    ]:
        pos_num = int(a_col[-1])  # scorer's position number: 7, 8, or 9

        # Get all plays at this position for this player
        player_plays = df_events[df_events[fielder_col] == player_bbref_id]
        if player_plays.empty:
            continue

        # Calculate component metrics
        has_assist = player_plays[a_col] > 0
        balls_fielded = int(((player_plays[po_col] > 0) | has_assist).sum())
        if balls_fielded == 0 or balls_fielded < 50 * season_pct:  # Minimum sample size
            continue
        total_assists = int(has_assist.sum())

        # Assists on which a runner already on base was put out by this fielder.
        # Note: this does not check that the out was made at home.
        runner_out = (
            (player_plays['brout1'] == pos_num) |
            (player_plays['brout2'] == pos_num) |
            (player_plays['brout3'] == pos_num)
        )
        home_throws = int((has_assist & runner_out).sum())

        # Assists on which the batter was put out by this fielder
        batter_extra_outs = int((has_assist & (player_plays['brout_b'] == pos_num)).sum())

        # Composite score, using the weights from the proposed formula
        # (no throwout-rate term, since an assist is already an out)
        assist_rate = total_assists / balls_fielded
        raw_score = (
            (assist_rate * 300) +
            (home_throws * 1.0) +
            (batter_extra_outs * 1.0) +
            (total_assists * 0.1)
        )

        # Get league stats for this position (pre-calculated; helpers defined elsewhere)
        position_avg = get_position_average(of_pos, season_pct)
        position_std = get_position_stddev(of_pos, season_pct)

        # Calculate z-score
        z_score = (raw_score - position_avg) / position_std if position_std > 0 else 0
        z_scores_by_pos.append(z_score)

    if not z_scores_by_pos:
        return 0  # Default average rating

    # Use maximum z-score across positions (best arm showing)
    max_z = max(z_scores_by_pos)

    # Map to the -6 to +5 scale using the thresholds from the rating table
    for lower_bound, rating in [
        (2.5, -6),   # Elite cannon
        (2.0, -5),   # Outstanding
        (1.5, -4),   # Excellent
        (1.0, -3),   # Very Good
        (0.5, -2),   # Above Average
        (0.0, -1),   # Slightly Above
        (-0.5, 0),   # Average
        (-0.8, 1),   # Slightly Below
        (-1.2, 2),   # Below Average
        (-1.5, 3),   # Poor
        (-1.8, 4),   # Very Poor
    ]:
        if max_z > lower_bound:
            return rating
    return 5  # Very Weak
Advantages of Proposed Method
- Historical Availability: Retrosheet data available from 1921-present
- Multi-Dimensional: Captures different aspects of arm strength
- Context-Aware: Accounts for position-specific expectations
- Nuanced: Distinguishes between volume and quality of throws
- Transparent: Clear formula allows for tuning/debugging
Disadvantages
- Data Processing: Requires parsing large play-by-play files
- Complexity: More complex than single-stat lookup
- Sample Size: Platoon players may not have enough opportunities
- Indirect Measurement: Measures outcomes, not true arm strength
Integration with Current System
Option 1: Hybrid Approach (Recommended)
- Use Baseball Reference bis_runs_outfield when available (2003+)
- Fall back to Retrosheet calculation for historical seasons
- Calibrate both scales to produce equivalent ratings
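The hybrid selection could look roughly like the sketch below. The function name and the convention of passing None for an unavailable rating are assumptions for illustration, not part of the existing code in calcs_defense.py.

```python
def hybrid_arm_rating(bis_rating, retrosheet_rating):
    """Prefer the bis_runs_outfield-based rating, else fall back to Retrosheet.

    Either argument is None when that source has no rating for the player-season.
    """
    if bis_rating is not None:
        return bis_rating
    if retrosheet_rating is not None:
        return retrosheet_rating
    return 0  # league-average default when neither source is available


modern_season = hybrid_arm_rating(bis_rating=-3, retrosheet_rating=-2)
historical_season = hybrid_arm_rating(bis_rating=None, retrosheet_rating=-2)
```

This only works as intended once both sources are calibrated to the same scale; otherwise a player's rating could shift simply because the data source changed between seasons.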
Option 2: Full Replacement
- Always use Retrosheet calculation for consistency
- Remove dependency on Baseball Reference defensive stats
- Requires one-time validation against known strong/weak arms
Next Steps
- Validate against known data: Compare 2005 Retrosheet ratings vs Baseball Reference
- Tune weights: Adjust formula weights based on correlation with existing ratings
- Calculate league baselines: Pre-compute position averages/stddev for all seasons
- Performance optimization: Cache calculations, optimize dataframe operations
- Integration testing: Run full season card generation with new method
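For the validation and weight-tuning steps, one simple check is the correlation between the existing ratings and the Retrosheet-based ratings for the same players. The sketch below computes a Pearson correlation with the standard library; the function name is hypothetical and the two rating lists are invented placeholders, not real 2005 results.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented ratings for five hypothetical players (same player order in both lists)
existing_ratings = [-6, -3, 0, 1, 2]
retrosheet_ratings = [-5, -3, -1, 1, 3]
agreement = pearson_r(existing_ratings, retrosheet_ratings)
```

A value near 1 would suggest the two methods rank players similarly; a low value would point to weights that need tuning or to a genuine difference in what the two methods measure.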
Sample Players for Validation
Test the formula against these known arm strengths from 2005:
Strong Arms (Should get -4 to -6):
- Jim Edmonds (CF) - Gold Glove, known cannon
- Ichiro Suzuki (RF) - Multiple award winner
- Carl Crawford (LF) - Defensive specialist
Average Arms (Should get -1 to 0):
- Most regular outfielders
Weak Arms (Should get +1 to +2):
- DHs playing OF occasionally
- Aging veterans with diminished tools
Created: 2025-11-15 Author: Claude (Jarvis) Status: Proposal / Awaiting Review