paper-dynasty-card-creation/docs/HOW_TO_USE_ARM_RATINGS.md
2025-11-23 01:28:33 -06:00

How to Use Retrosheet Arm Ratings

Overview

The arm rating system calculates outfield arm strength from Retrosheet play-by-play data. Ratings are saved to CSV files for easy reference in card creation scripts.

File Format

Location: data-output/retrosheet_arm_ratings_YYYY.csv

Columns:

  • player_id: Baseball Reference player ID (key_bbref)
  • position: OF position (LF, CF, or RF)
  • season: Year
  • balls_fielded: Total balls fielded at this position
  • total_assists: Total assists at this position
  • home_throws: Assists that threw out a runner at home
  • batter_extra_outs: Assists that prevented the batter from taking an extra base
  • assist_rate: Assists per ball fielded (0.0-1.0)
  • raw_score: Raw composite arm score
  • z_score: Standardized score vs position average
  • arm_rating: Final rating (-6 to +5; lower is better, so -6 is the strongest arm)
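
As a quick sanity check on a generated file, the columns listed above can be turned into a small validator. This is a minimal sketch using only the standard library; `validate_arm_ratings`, `EXPECTED_COLUMNS`, and `VALID_POSITIONS` are hypothetical names that do not exist in the project, and the range checks simply restate the ranges documented in the column list.

```python
import csv
from typing import Iterable, List

# Column names copied from the column list in this document.
EXPECTED_COLUMNS = [
    "player_id", "position", "season", "balls_fielded", "total_assists",
    "home_throws", "batter_extra_outs", "assist_rate", "raw_score",
    "z_score", "arm_rating",
]
VALID_POSITIONS = {"LF", "CF", "RF"}


def validate_arm_ratings(lines: Iterable[str]) -> List[str]:
    """Return a list of problems found in arm ratings CSV lines (empty if none)."""
    reader = csv.DictReader(lines)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # The per-row checks below need every column, so stop here.
        return [f"missing columns: {missing}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["position"] not in VALID_POSITIONS:
            problems.append(f"line {line_no}: unexpected position {row['position']!r}")
        if not -6 <= float(row["arm_rating"]) <= 5:
            problems.append(f"line {line_no}: arm_rating outside -6 to +5")
        if not 0.0 <= float(row["assist_rate"]) <= 1.0:
            problems.append(f"line {line_no}: assist_rate outside 0.0-1.0")
    return problems
```

To check a real file, pass an open file handle, for example `validate_arm_ratings(open("data-output/retrosheet_arm_ratings_2005.csv", newline=""))`; an empty list means no problems were found.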

Generating Arm Ratings

For a Complete Season

python generate_arm_ratings_csv.py \
  --year 2005 \
  --events data-input/retrosheet/retrosheets_events_2005.csv

For a Partial Season (Live Series)

python generate_arm_ratings_csv.py \
  --year 2025 \
  --events data-input/retrosheet/events_2025.csv \
  --season-pct 0.5  # 50% of season completed

Custom Output Directory

python generate_arm_ratings_csv.py \
  --year 2005 \
  --events data-input/retrosheet/events.csv \
  --output-dir custom-output/

Using Arm Ratings in Your Scripts

Method 1: Load Pre-Calculated Ratings from CSV

from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# Load pre-calculated ratings
arm_ratings = load_arm_ratings_from_csv(season_year=2005)

# Get rating for a specific player
player_rating = arm_ratings.get('suzui001', 0)  # Returns 0 if not found

# Or use the helper function
from defenders.retrosheet_arm_calculator import get_arm_for_player
player_rating = get_arm_for_player(arm_ratings, 'suzui001', default=0)

Method 2: Calculate On-The-Fly

from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet
import pandas as pd

# Load events
df_events = pd.read_csv('data-input/retrosheet/events.csv')

# Calculate ratings
arm_ratings = calculate_of_arms_from_retrosheet(df_events, season_pct=1.0)

# Use ratings
player_rating = arm_ratings.get('suzui001', 0)

Method 3: Load Full DataFrame for Analysis

import pandas as pd

# Load the CSV directly
df_arms = pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')

# Filter to specific position
rf_arms = df_arms[df_arms['position'] == 'RF']

# Get the 10 strongest arms (lower rating = stronger arm)
elite_arms = df_arms.nsmallest(10, 'arm_rating')

# Lookup specific player
suzuki = df_arms[df_arms['player_id'] == 'suzui001']

Integration with Card Creation

In retrosheet_data.py

from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# At the top of your script, after determining the season year
SEASON_YEAR = 2005

# Load arm ratings
try:
    retrosheet_arm_ratings = load_arm_ratings_from_csv(SEASON_YEAR)
    print(f"Loaded arm ratings for {len(retrosheet_arm_ratings)} outfielders")
except FileNotFoundError:
    print(f"Warning: No arm ratings found for {SEASON_YEAR}, using defaults")
    retrosheet_arm_ratings = {}

# Later, when assigning positions in create_positions()
# Replace the current arm_outfield() call with:
from defenders.retrosheet_arm_calculator import get_arm_for_player

arm_rating = get_arm_for_player(
    retrosheet_arm_ratings,
    df_data['key_bbref'],
    default=0  # Average rating if player not found
)

In defenders/calcs_defense.py

If you want to integrate into the main defensive calculation:

# In create_positions(), around line 84:
if df_data["key_bbref"] in df_of.index and len(of_arms) > 0 and len(of_payloads) > 0:
    try:
        # Load arm ratings if available
        if not hasattr(create_positions, 'arm_ratings_cache'):
            from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
            try:
                create_positions.arm_ratings_cache = load_arm_ratings_from_csv(SEASON_YEAR)
            except FileNotFoundError:
                create_positions.arm_ratings_cache = {}

        # Get arm rating from Retrosheet data
        arm_rating = create_positions.arm_ratings_cache.get(df_data['key_bbref'])

        # Fall back to original calculation if not found
        if arm_rating is None:
            arm_rating = arm_outfield(of_arms)

        # ... rest of the code
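
The function-attribute cache above can also be expressed with `functools.lru_cache`, which keeps the caching logic out of `create_positions()`. The following is only a sketch: `make_cached_loader` is a hypothetical helper, and it assumes the wrapped loader raises `FileNotFoundError` when no CSV exists for the season, as the `retrosheet_data.py` example handles.

```python
from functools import lru_cache
from typing import Callable, Dict


def make_cached_loader(
    loader: Callable[[int], Dict[str, int]]
) -> Callable[[int], Dict[str, int]]:
    """Wrap a ratings loader so each season is loaded at most once."""

    @lru_cache(maxsize=None)
    def cached(season_year: int) -> Dict[str, int]:
        try:
            return loader(season_year)
        except FileNotFoundError:
            # No CSV for this season: cache an empty dict so callers can
            # fall back to their original arm calculation.
            return {}

    return cached
```

In the project this would be used as `cached_load = make_cached_loader(load_arm_ratings_from_csv)`, followed by `cached_load(SEASON_YEAR).get(df_data['key_bbref'])`. Note that the cached dict is shared between callers, so it should be treated as read-only.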

Rating Scale Reference

Rating  Description      Approx %  Notes
-6      Elite cannon     ~1%       Best arms in league
-5      Outstanding      ~2%       Gold Glove caliber
-4      Excellent        ~3%       Well above average
-3      Very Good        ~5%       Above average
-2      Above Average    ~10%      Solid arms
-1      Slightly Above   ~15%      Decent arms
0       Average          ~45%      League average
+1      Slightly Below   ~10%      Below average
+2      Below Average    ~5%       Weak arms
+3      Poor             ~3%       Poor arms
+4      Very Poor        ~2%       Very weak
+5      Very Weak        ~1%       Worst arms
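
When printing or logging ratings, it can help to attach the description from the scale above. The lookup below is a small sketch that copies the labels from that scale; `RATING_DESCRIPTIONS` and `describe_arm` are hypothetical names, not part of the project.

```python
# Labels copied from the rating scale in this document (lower = stronger arm).
RATING_DESCRIPTIONS = {
    -6: "Elite cannon",
    -5: "Outstanding",
    -4: "Excellent",
    -3: "Very Good",
    -2: "Above Average",
    -1: "Slightly Above",
    0: "Average",
    1: "Slightly Below",
    2: "Below Average",
    3: "Poor",
    4: "Very Poor",
    5: "Very Weak",
}


def describe_arm(rating: int) -> str:
    """Return the scale description for an arm rating between -6 and +5."""
    if rating not in RATING_DESCRIPTIONS:
        raise ValueError(f"arm rating must be an integer from -6 to +5, got {rating!r}")
    return RATING_DESCRIPTIONS[rating]
```

Raising on an out-of-range value, rather than returning a default, makes a corrupted or stale CSV visible early.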

Troubleshooting

File Not Found Error

FileNotFoundError: Arm ratings file not found: data-output/retrosheet_arm_ratings_2005.csv

Solution: Generate the CSV file first:

python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events.csv

Player Not in Ratings

If a player doesn't have a rating, one of the following is likely true:

  1. They didn't play OF in that season
  2. They didn't meet minimum sample size (50 balls fielded)
  3. They played OF but only in games not in your Retrosheet data

Solution: Pass default=0 to get_arm_for_player() (or use arm_ratings.get(player_id, 0)) to assign the average rating.

Multiple Positions

If a player played multiple OF positions (e.g., both RF and CF), the CSV has a separate row for each position. load_arm_ratings_from_csv() collapses these rows to one value per player, using the best (lowest) rating across all positions.
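
The "best (lowest) rating wins" rule can be illustrated as follows. This is a sketch of the rule as described here, not necessarily how load_arm_ratings_from_csv() is implemented; `best_rating_per_player` is a hypothetical function, and the sample rows in the usage note are made up.

```python
from typing import Dict, Iterable, Tuple


def best_rating_per_player(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Collapse (player_id, position, arm_rating) rows to one rating per player.

    Lower ratings are better, so the minimum across positions is kept.
    """
    best: Dict[str, int] = {}
    for player_id, _position, arm_rating in rows:
        if player_id not in best or arm_rating < best[player_id]:
            best[player_id] = arm_rating
    return best
```

For example, rows `("suzui001", "RF", -4)` and `("suzui001", "CF", -2)` collapse to `{"suzui001": -4}`. With the DataFrame from Method 3, the same collapse can be written as `df_arms.groupby('player_id')['arm_rating'].min().to_dict()`.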

Maintenance

Regenerating Ratings

If you update the arm rating formula or thresholds, regenerate the CSV files:

# Regenerate for one year
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events_2005.csv

# Regenerate for multiple years
for year in 2005 2006 2007; do
  python generate_arm_ratings_csv.py --year $year --events data-input/retrosheet/events_$year.csv
done
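
If you prefer to drive the regeneration from Python, for example to stop at the first failing year, the shell loop above can be sketched as below. It reuses the script name, flags, and events file pattern shown above; `build_regen_command` and `regenerate` are hypothetical helper names.

```python
import subprocess
import sys
from typing import Iterable, List


def build_regen_command(year: int) -> List[str]:
    """Build the generate_arm_ratings_csv.py command line for one season."""
    return [
        sys.executable,  # run with the same interpreter as this script
        "generate_arm_ratings_csv.py",
        "--year", str(year),
        "--events", f"data-input/retrosheet/events_{year}.csv",
    ]


def regenerate(years: Iterable[int]) -> None:
    """Regenerate the arm ratings CSV for each season, in order."""
    for year in years:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failing year halts the run instead of being skipped silently.
        subprocess.run(build_regen_command(year), check=True)
```

Calling `regenerate([2005, 2006, 2007])` from the repository root mirrors the shell loop.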

Version Control

Recommended: Commit the CSV files to git so arm ratings are consistent across runs and deployments.

Alternatively, if you would rather regenerate the files on each run, add this pattern to .gitignore:

data-output/retrosheet_arm_ratings_*.csv

Last Updated: 2025-11-15
Formula Version: 300x assist_rate, no throwout_rate, calibrated z-score thresholds