# How to Use Retrosheet Arm Ratings

## Overview

The arm rating system calculates outfield arm strength from Retrosheet play-by-play data. Ratings are saved to CSV files for easy reference in card creation scripts.

## File Format

**Location:** `data-output/retrosheet_arm_ratings_YYYY.csv`

**Columns:**
- `player_id`: Baseball Reference player ID (key_bbref)
- `position`: Outfield position (LF, CF, or RF)
- `season`: Year
- `balls_fielded`: Total balls fielded at this position
- `total_assists`: Total assists at this position
- `home_throws`: Assists that threw out a runner at home
- `batter_extra_outs`: Assists that prevented the batter from taking an extra base
- `assist_rate`: Assists per ball fielded (0.0-1.0)
- `raw_score`: Raw composite arm score
- `z_score`: Standardized score vs. the position average
- `arm_rating`: Final rating (-6 to +5; lower is stronger)

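
To make the column list concrete, here is a minimal sketch of a sanity check for a generated file. It is not part of the project: `check_arm_ratings` and its `tolerance` argument are hypothetical, the sample rows are made up, and it assumes every numeric field parses as a float and that `assist_rate` is stored as `total_assists / balls_fielded` without heavy rounding.

```python
import csv
import io

# Column names as listed in this section.
EXPECTED_COLUMNS = [
    "player_id", "position", "season", "balls_fielded", "total_assists",
    "home_throws", "batter_extra_outs", "assist_rate", "raw_score",
    "z_score", "arm_rating",
]


def check_arm_ratings(handle, tolerance=0.001):
    """Return a list of human-readable problems found in an arm ratings CSV."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["position"] not in {"LF", "CF", "RF"}:
            problems.append(f"line {line_no}: unexpected position {row['position']!r}")
        if not -6 <= float(row["arm_rating"]) <= 5:
            problems.append(f"line {line_no}: arm_rating out of range")
        fielded = float(row["balls_fielded"])
        expected = float(row["total_assists"]) / fielded if fielded else 0.0
        if abs(float(row["assist_rate"]) - expected) > tolerance:
            problems.append(f"line {line_no}: assist_rate does not match assists / balls fielded")
    return problems


# Made-up rows for illustration only; these are not real ratings.
SAMPLE = """player_id,position,season,balls_fielded,total_assists,home_throws,batter_extra_outs,assist_rate,raw_score,z_score,arm_rating
examp001,RF,2005,200,10,3,2,0.05,15.0,1.8,-3
examp002,CF,2005,100,2,0,1,0.5,6.0,-0.4,9
"""

problems = check_arm_ratings(io.StringIO(SAMPLE))
for problem in problems:
    print(problem)
```

For a real file, pass `open('data-output/retrosheet_arm_ratings_2005.csv', newline='')` instead of the in-memory sample.
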
## Generating Arm Ratings

### For a Complete Season

```bash
python generate_arm_ratings_csv.py \
    --year 2005 \
    --events data-input/retrosheet/retrosheets_events_2005.csv
```

### For a Partial Season (Live Series)

```bash
python generate_arm_ratings_csv.py \
    --year 2025 \
    --events data-input/retrosheet/events_2025.csv \
    --season-pct 0.5  # 50% of season completed
```

### Custom Output Directory

```bash
python generate_arm_ratings_csv.py \
    --year 2005 \
    --events data-input/retrosheet/events.csv \
    --output-dir custom-output/
```

## Using Arm Ratings in Your Scripts

### Method 1: Load from CSV (Recommended)

```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# Load pre-calculated ratings
arm_ratings = load_arm_ratings_from_csv(season_year=2005)

# Get rating for a specific player
player_rating = arm_ratings.get('suzui001', 0)  # Returns 0 if not found

# Or use the helper function
from defenders.retrosheet_arm_calculator import get_arm_for_player
player_rating = get_arm_for_player(arm_ratings, 'suzui001', default=0)
```

### Method 2: Calculate On-The-Fly

```python
from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet
import pandas as pd

# Load events
df_events = pd.read_csv('data-input/retrosheet/events.csv')

# Calculate ratings
arm_ratings = calculate_of_arms_from_retrosheet(df_events, season_pct=1.0)

# Use ratings
player_rating = arm_ratings.get('suzui001', 0)
```

### Method 3: Load Full DataFrame for Analysis

```python
import pandas as pd

# Load the CSV directly
df_arms = pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')

# Filter to specific position
rf_arms = df_arms[df_arms['position'] == 'RF']

# Get top 10 arms (lower rating = stronger arm, hence nsmallest)
elite_arms = df_arms.nsmallest(10, 'arm_rating')

# Lookup specific player
suzuki = df_arms[df_arms['player_id'] == 'suzui001']
```

## Integration with Card Creation

### In retrosheet_data.py

```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# At the top of your script, after determining the season year
SEASON_YEAR = 2005

# Load arm ratings
try:
    retrosheet_arm_ratings = load_arm_ratings_from_csv(SEASON_YEAR)
    print(f"Loaded arm ratings for {len(retrosheet_arm_ratings)} outfielders")
except FileNotFoundError:
    print(f"Warning: No arm ratings found for {SEASON_YEAR}, using defaults")
    retrosheet_arm_ratings = {}

# Later, when assigning positions in create_positions()
# Replace the current arm_outfield() call with:
from defenders.retrosheet_arm_calculator import get_arm_for_player

arm_rating = get_arm_for_player(
    retrosheet_arm_ratings,
    df_data['key_bbref'],
    default=0  # Average rating if player not found
)
```

### In defenders/calcs_defense.py

If you want to integrate into the main defensive calculation:

```python
# In create_positions(), around line 84:
if df_data["key_bbref"] in df_of.index and len(of_arms) > 0 and len(of_payloads) > 0:
    try:
        # Load arm ratings once and cache them on the function object
        if not hasattr(create_positions, 'arm_ratings_cache'):
            from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
            try:
                # SEASON_YEAR must be defined or imported in this module
                create_positions.arm_ratings_cache = load_arm_ratings_from_csv(SEASON_YEAR)
            except FileNotFoundError:
                # No CSV for this season: cache an empty dict so every lookup falls back
                create_positions.arm_ratings_cache = {}

        # Get arm rating from Retrosheet data
        arm_rating = create_positions.arm_ratings_cache.get(df_data['key_bbref'])

        # Fall back to original calculation if not found
        if arm_rating is None:
            arm_rating = arm_outfield(of_arms)

        # ... rest of the code
```

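
The function-attribute cache above can also be expressed with `functools.lru_cache`. The following is only a sketch of that alternative, not the project's code: `make_cached_loader` is a hypothetical helper, and `fake_loader` stands in for `load_arm_ratings_from_csv` so the example runs without the project installed.

```python
from functools import lru_cache


def make_cached_loader(loader):
    """Wrap a loader so each season is read at most once per process."""

    @lru_cache(maxsize=None)
    def load(season_year):
        try:
            return loader(season_year)
        except FileNotFoundError:
            # No CSV for this season: cache an empty mapping so callers fall back.
            return {}

    return load


# Stand-in for load_arm_ratings_from_csv; records how often it is called.
calls = []


def fake_loader(season_year):
    calls.append(season_year)
    if season_year != 2005:
        raise FileNotFoundError(season_year)
    return {"examp001": -3}  # made-up player ID and rating


cached = make_cached_loader(fake_loader)
cached(2005)
cached(2005)  # served from the cache, fake_loader is not called again
cached(1999)  # missing season, cached as {}
print(calls)  # [2005, 1999]
```

The cached dictionary is shared between callers, so treat it as read-only.
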
## Rating Scale Reference

| Rating | Description | Approx % | Notes |
|--------|-------------|----------|-------|
| -6 | Elite cannon | ~1% | Best arms in league |
| -5 | Outstanding | ~2% | Gold Glove caliber |
| -4 | Excellent | ~3% | Well above average |
| -3 | Very Good | ~5% | Above average |
| -2 | Above Average | ~10% | Solid arms |
| -1 | Slightly Above | ~15% | Decent arms |
| 0 | Average | ~45% | League average |
| +1 | Slightly Below | ~10% | Below average |
| +2 | Below Average | ~5% | Weak arms |
| +3 | Poor | ~3% | Poor arms |
| +4 | Very Poor | ~2% | Very weak |
| +5 | Very Weak | ~1% | Worst arms |

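
If a script needs the description for a rating (for logs or reports, say), the table can be mirrored as a lookup. This is a small illustrative sketch; `ARM_RATING_LABELS` and `describe_arm_rating` are hypothetical names, not part of `defenders.retrosheet_arm_calculator`.

```python
# Descriptions copied from the rating scale table; lower ratings are stronger arms.
ARM_RATING_LABELS = {
    -6: "Elite cannon",
    -5: "Outstanding",
    -4: "Excellent",
    -3: "Very Good",
    -2: "Above Average",
    -1: "Slightly Above",
    0: "Average",
    1: "Slightly Below",
    2: "Below Average",
    3: "Poor",
    4: "Very Poor",
    5: "Very Weak",
}


def describe_arm_rating(rating):
    """Return the table description for a rating, or raise if it is off the scale."""
    try:
        return ARM_RATING_LABELS[rating]
    except KeyError:
        raise ValueError(f"arm rating {rating!r} is outside the -6 to +5 scale") from None


print(describe_arm_rating(-6))  # Elite cannon
print(describe_arm_rating(0))   # Average
```
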
## Troubleshooting

### File Not Found Error

```
FileNotFoundError: Arm ratings file not found: data-output/retrosheet_arm_ratings_2005.csv
```

**Solution:** Generate the CSV file first:
```bash
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events.csv
```

### Player Not in Ratings

If a player has no rating, one of the following applies:
1. They didn't play OF in that season
2. They didn't meet the minimum sample size (50 balls fielded)
3. They played OF, but only in games missing from your Retrosheet data

**Solution:** Use the `default=0` parameter of `get_arm_for_player()` to assign an average rating.

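
When many players fall back to the default, it helps to know which ones. The sketch below shows one way to track that. `lookup_with_fallback` is a hypothetical helper that works on the plain `player_id -> rating` dictionary used in the examples above, and the IDs and ratings are made up.

```python
def lookup_with_fallback(arm_ratings, player_ids, default=0):
    """Return (resolved, missing): ratings for every ID, plus the IDs that used the default."""
    resolved = {}
    missing = []
    for player_id in player_ids:
        if player_id in arm_ratings:
            resolved[player_id] = arm_ratings[player_id]
        else:
            resolved[player_id] = default
            missing.append(player_id)
    return resolved, missing


# Made-up ratings and roster for illustration only.
arm_ratings = {"examp001": -3, "examp002": 1}
roster = ["examp001", "examp003", "examp002", "examp004"]

resolved, missing = lookup_with_fallback(arm_ratings, roster)
print(resolved)  # {'examp001': -3, 'examp003': 0, 'examp002': 1, 'examp004': 0}
print(missing)   # ['examp003', 'examp004']
```
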
### Multiple Positions

If a player played multiple OF positions (e.g., both RF and CF), the CSV has a separate row for each position. `load_arm_ratings_from_csv()` automatically uses the **best (lowest) rating** across all of that player's positions.

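
The "best (lowest) rating" rule can be illustrated with a few lines of standard-library Python. This is a sketch of the behavior described above, not the actual implementation of `load_arm_ratings_from_csv()`; `best_rating_per_player` is a hypothetical name and the sample rows are made up and trimmed to three columns.

```python
import csv
import io


def best_rating_per_player(handle):
    """Map each player_id to its lowest (strongest) arm_rating across all rows."""
    best = {}
    for row in csv.DictReader(handle):
        player_id = row["player_id"]
        rating = int(float(row["arm_rating"]))
        if player_id not in best or rating < best[player_id]:
            best[player_id] = rating
    return best


# Made-up rows: examp001 appears at two positions.
SAMPLE = """player_id,position,arm_rating
examp001,RF,-3
examp001,CF,-1
examp002,LF,2
"""

ratings = best_rating_per_player(io.StringIO(SAMPLE))
print(ratings)  # {'examp001': -3, 'examp002': 2}
```
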
## Maintenance

### Regenerating Ratings

If you update the arm rating formula or thresholds, regenerate the CSV files:

```bash
# Regenerate for one year
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events_2005.csv

# Regenerate for multiple years
for year in 2005 2006 2007; do
    python generate_arm_ratings_csv.py --year "$year" --events "data-input/retrosheet/events_$year.csv"
done
```

### Version Control

**Recommended:** Commit the CSV files to git so arm ratings are consistent across runs and deployments.

If you would rather regenerate them each time, add this pattern to `.gitignore` instead:
```
data-output/retrosheet_arm_ratings_*.csv
```

---

**Last Updated:** 2025-11-15
**Formula Version:** 300x assist_rate, no throwout_rate, calibrated z-score thresholds