# How to Use Retrosheet Arm Ratings

## Overview

The arm rating system calculates outfield arm strength from Retrosheet play-by-play data. Ratings are saved to CSV files for easy reference in card creation scripts.

## File Format

**Location:** `data-output/retrosheet_arm_ratings_YYYY.csv`

**Columns:**

- `player_id`: Baseball Reference player ID (key_bbref)
- `position`: OF position (LF, CF, or RF)
- `season`: Year
- `balls_fielded`: Total balls fielded at this position
- `total_assists`: Total assists at this position
- `home_throws`: Assists that threw out a runner at home
- `batter_extra_outs`: Assists preventing batter from taking extra base
- `assist_rate`: Assists per ball fielded (0.0-1.0)
- `raw_score`: Raw composite arm score
- `z_score`: Standardized score vs position average
- `arm_rating`: Final rating (-6 to +5; lower is better)

## Generating Arm Ratings

### For a Complete Season

```bash
python generate_arm_ratings_csv.py \
    --year 2005 \
    --events data-input/retrosheet/retrosheets_events_2005.csv
```

### For a Partial Season (Live Series)

```bash
python generate_arm_ratings_csv.py \
    --year 2025 \
    --events data-input/retrosheet/events_2025.csv \
    --season-pct 0.5  # 50% of season completed
```

### Custom Output Directory

```bash
python generate_arm_ratings_csv.py \
    --year 2005 \
    --events data-input/retrosheet/events.csv \
    --output-dir custom-output/
```

## Using Arm Ratings in Your Scripts

### Method 1: Load from CSV (Recommended)

```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# Load pre-calculated ratings
arm_ratings = load_arm_ratings_from_csv(season_year=2005)

# Get rating for a specific player
player_rating = arm_ratings.get('suzui001', 0)  # Returns 0 if not found

# Or use the helper function
from defenders.retrosheet_arm_calculator import get_arm_for_player

player_rating = get_arm_for_player(arm_ratings, 'suzui001', default=0)
```

### Method 2: Calculate On-The-Fly

```python
from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet
import pandas as pd

# Load events
df_events = pd.read_csv('data-input/retrosheet/events.csv')

# Calculate ratings
arm_ratings = calculate_of_arms_from_retrosheet(df_events, season_pct=1.0)

# Use ratings
player_rating = arm_ratings.get('suzui001', 0)
```

### Method 3: Load Full DataFrame for Analysis

```python
import pandas as pd

# Load the CSV directly
df_arms = pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')

# Filter to specific position
rf_arms = df_arms[df_arms['position'] == 'RF']

# Get top 10 arms (lower ratings are better, so take the smallest values)
elite_arms = df_arms.nsmallest(10, 'arm_rating')

# Lookup specific player
suzuki = df_arms[df_arms['player_id'] == 'suzui001']
```

## Integration with Card Creation

### In retrosheet_data.py

```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# At the top of your script, after determining the season year
SEASON_YEAR = 2005

# Load arm ratings
try:
    retrosheet_arm_ratings = load_arm_ratings_from_csv(SEASON_YEAR)
    print(f"Loaded arm ratings for {len(retrosheet_arm_ratings)} outfielders")
except FileNotFoundError:
    print(f"Warning: No arm ratings found for {SEASON_YEAR}, using defaults")
    retrosheet_arm_ratings = {}

# Later, when assigning positions in create_positions()
# Replace the current arm_outfield() call with:
from defenders.retrosheet_arm_calculator import get_arm_for_player

arm_rating = get_arm_for_player(
    retrosheet_arm_ratings,
    df_data['key_bbref'],
    default=0  # Average rating if player not found
)
```

### In defenders/calcs_defense.py

If you want to integrate into the main defensive calculation:

```python
# In create_positions(), around line 84:
if df_data["key_bbref"] in df_of.index and len(of_arms) > 0 and len(of_payloads) > 0:
    try:
        # Load arm ratings once and cache them on the function object
        if not hasattr(create_positions, 'arm_ratings_cache'):
            from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
            try:
                create_positions.arm_ratings_cache = load_arm_ratings_from_csv(SEASON_YEAR)
            except FileNotFoundError:
                # No CSV for this season: cache an empty dict so every
                # player falls through to the original calculation
                create_positions.arm_ratings_cache = {}

        # Get arm rating from Retrosheet data
        arm_rating = create_positions.arm_ratings_cache.get(df_data['key_bbref'])

        # Fall back to original calculation if not found
        if arm_rating is None:
            arm_rating = arm_outfield(of_arms)

        # ... rest of the code
```

## Rating Scale Reference

| Rating | Description | Approx % | Notes |
|--------|-------------|----------|-------|
| -6 | Elite cannon | ~1% | Best arms in league |
| -5 | Outstanding | ~2% | Gold Glove caliber |
| -4 | Excellent | ~3% | Well above average |
| -3 | Very Good | ~5% | Above average |
| -2 | Above Average | ~10% | Solid arms |
| -1 | Slightly Above | ~15% | Decent arms |
| 0 | Average | ~45% | League average |
| +1 | Slightly Below | ~10% | Below average |
| +2 | Below Average | ~5% | Weak arms |
| +3 | Poor | ~3% | Poor arms |
| +4 | Very Poor | ~2% | Very weak |
| +5 | Very Weak | ~1% | Worst arms |

## Troubleshooting

### File Not Found Error

```python
FileNotFoundError: Arm ratings file not found: data-output/retrosheet_arm_ratings_2005.csv
```

**Solution:** Generate the CSV file first:

```bash
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events.csv
```

### Player Not in Ratings

If a player doesn't have a rating, one of the following applies:

1. They didn't play OF in that season
2. They didn't meet the minimum sample size (50 balls fielded)
3. They played OF, but only in games missing from your Retrosheet data

**Solution:** Use the `default=0` parameter to assign an average rating.

### Multiple Positions

If a player played multiple OF positions (e.g., both RF and CF), the CSV will have a separate row for each position. When loading with `load_arm_ratings_from_csv()`, the **best (lowest) rating** across all positions is used automatically.
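
The per-player collapse described above can be illustrated with a minimal, standard-library sketch. This is not the actual implementation of `load_arm_ratings_from_csv()`; the function name `best_rating_per_player`, the sample rows, and the player IDs are hypothetical, and only the `player_id`, `position`, `season`, and `arm_rating` columns are included.

```python
import csv
import io

# Hypothetical sample rows in the documented CSV layout (subset of columns);
# the player IDs and ratings are made up for illustration.
SAMPLE_CSV = """player_id,position,season,arm_rating
playa001,RF,2005,-3
playa001,CF,2005,-1
playb001,LF,2005,2
"""


def best_rating_per_player(csv_file):
    """Return {player_id: lowest arm_rating across that player's OF rows}."""
    best = {}
    for row in csv.DictReader(csv_file):
        player = row["player_id"]
        rating = int(float(row["arm_rating"]))
        # Lower is better on this scale, so keep the minimum per player.
        if player not in best or rating < best[player]:
            best[player] = rating
    return best


ratings = best_rating_per_player(io.StringIO(SAMPLE_CSV))
print(ratings)  # {'playa001': -3, 'playb001': 2}
```

If you need position-specific ratings instead (for example, a weaker arm rating when the player is carded at CF), read the CSV directly as in Method 3 and filter on `position` before looking up the player.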
## Maintenance

### Regenerating Ratings

If you update the arm rating formula or thresholds, regenerate the CSV files:

```bash
# Regenerate for one year
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events_2005.csv

# Regenerate for multiple years
for year in 2005 2006 2007; do
    python generate_arm_ratings_csv.py --year "$year" --events "data-input/retrosheet/events_${year}.csv"
done
```

### Version Control

**Recommended:** Commit the CSV files to git so arm ratings are consistent across runs and deployments.

Alternatively, if you would rather regenerate the ratings each time, add this pattern to `.gitignore`:

```
data-output/retrosheet_arm_ratings_*.csv
```

---

**Last Updated:** 2025-11-15

**Formula Version:** 300x assist_rate, no throwout_rate, calibrated z-score thresholds