# How to Use Retrosheet Arm Ratings
## Overview
The arm rating system calculates outfield arm strength from Retrosheet play-by-play data. Ratings are saved to CSV files for easy reference in card creation scripts.
## File Format
**Location:** `data-output/retrosheet_arm_ratings_YYYY.csv`
**Columns:**
- `player_id`: Baseball Reference player ID (key_bbref)
- `position`: OF position (LF, CF, or RF)
- `season`: Year
- `balls_fielded`: Total balls fielded at this position
- `total_assists`: Total assists at this position
- `home_throws`: Assists that threw out a runner at home
- `batter_extra_outs`: Assists preventing batter from taking extra base
- `assist_rate`: Assists per ball fielded (0.0-1.0)
- `raw_score`: Raw composite arm score
- `z_score`: Standardized score vs position average
- `arm_rating`: Final rating (-6 to +5; lower is better, so -6 is the strongest arm)
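A quick sanity check of a generated file against the column list above might look like the following. This is a minimal sketch: `check_arm_ratings_frame` and `EXPECTED_COLUMNS` are hypothetical names that are not part of the project, and the sample row is made up for illustration.

```python
import io

import pandas as pd

# Column names as documented in the list above.
EXPECTED_COLUMNS = [
    "player_id", "position", "season", "balls_fielded", "total_assists",
    "home_throws", "batter_extra_outs", "assist_rate", "raw_score",
    "z_score", "arm_rating",
]


def check_arm_ratings_frame(df):
    """Return a list of human-readable problems; an empty list means OK."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    bad_pos = df.loc[~df["position"].isin(["LF", "CF", "RF"]), "position"]
    if not bad_pos.empty:
        problems.append(f"unexpected positions: {sorted(bad_pos.unique().tolist())}")
    if not df["arm_rating"].between(-6, 5).all():
        problems.append("arm_rating outside -6..+5")
    if not df["assist_rate"].between(0.0, 1.0).all():
        problems.append("assist_rate outside 0.0..1.0")
    return problems


# Made-up sample row, not real player data.
sample_csv = io.StringIO(
    "player_id,position,season,balls_fielded,total_assists,home_throws,"
    "batter_extra_outs,assist_rate,raw_score,z_score,arm_rating\n"
    "examp001,RF,2005,300,9,2,3,0.03,9.0,1.2,-2\n"
)
sample_problems = check_arm_ratings_frame(pd.read_csv(sample_csv))
print(sample_problems)
```

To check a real file, pass `pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')` to the same function.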
## Generating Arm Ratings
### For a Complete Season
```bash
python generate_arm_ratings_csv.py \
--year 2005 \
--events data-input/retrosheet/retrosheets_events_2005.csv
```
### For a Partial Season (Live Series)
```bash
python generate_arm_ratings_csv.py \
--year 2025 \
--events data-input/retrosheet/events_2025.csv \
--season-pct 0.5 # 50% of season completed
```
### Custom Output Directory
```bash
python generate_arm_ratings_csv.py \
--year 2005 \
--events data-input/retrosheet/events.csv \
--output-dir custom-output/
```
## Using Arm Ratings in Your Scripts
### Method 1: Load from CSV (Recommended)
```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
# Load pre-calculated ratings
arm_ratings = load_arm_ratings_from_csv(season_year=2005)
# Get rating for a specific player
player_rating = arm_ratings.get('suzui001', 0) # Returns 0 if not found
# Or use the helper function
from defenders.retrosheet_arm_calculator import get_arm_for_player
player_rating = get_arm_for_player(arm_ratings, 'suzui001', default=0)
```
### Method 2: Calculate On-The-Fly
```python
from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet
import pandas as pd
# Load events
df_events = pd.read_csv('data-input/retrosheet/events.csv')
# Calculate ratings
arm_ratings = calculate_of_arms_from_retrosheet(df_events, season_pct=1.0)
# Use ratings
player_rating = arm_ratings.get('suzui001', 0)
```
### Method 3: Load Full DataFrame for Analysis
```python
import pandas as pd
# Load the CSV directly
df_arms = pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')
# Filter to specific position
rf_arms = df_arms[df_arms['position'] == 'RF']
# Get the 10 strongest arms (lower rating = stronger arm, hence nsmallest)
elite_arms = df_arms.nsmallest(10, 'arm_rating')
# Lookup specific player
suzuki = df_arms[df_arms['player_id'] == 'suzui001']
```
## Integration with Card Creation
### In retrosheet_data.py
```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
# At the top of your script, after determining the season year
SEASON_YEAR = 2005
# Load arm ratings
try:
    retrosheet_arm_ratings = load_arm_ratings_from_csv(SEASON_YEAR)
    print(f"Loaded arm ratings for {len(retrosheet_arm_ratings)} outfielders")
except FileNotFoundError:
    print(f"Warning: No arm ratings found for {SEASON_YEAR}, using defaults")
    retrosheet_arm_ratings = {}
# Later, when assigning positions in create_positions()
# Replace the current arm_outfield() call with:
from defenders.retrosheet_arm_calculator import get_arm_for_player
arm_rating = get_arm_for_player(
    retrosheet_arm_ratings,
    df_data['key_bbref'],
    default=0  # Average rating if player not found
)
```
### In defenders/calcs_defense.py
If you want to integrate into the main defensive calculation:
```python
# In create_positions(), around line 84:
if df_data["key_bbref"] in df_of.index and len(of_arms) > 0 and len(of_payloads) > 0:
    try:
        # Load arm ratings once and cache them on the function object
        if not hasattr(create_positions, 'arm_ratings_cache'):
            from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
            try:
                create_positions.arm_ratings_cache = load_arm_ratings_from_csv(SEASON_YEAR)
            except FileNotFoundError:
                # No CSV for this season: cache an empty dict so lookups fall back
                create_positions.arm_ratings_cache = {}
        # Get arm rating from Retrosheet data
        arm_rating = create_positions.arm_ratings_cache.get(df_data['key_bbref'])
        # Fall back to original calculation if not found
        if arm_rating is None:
            arm_rating = arm_outfield(of_arms)
        # ... rest of the code
```
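The snippet above caches the ratings on the function object. An alternative is `functools.lru_cache`, sketched below under stated assumptions: `load_ratings_stub` is a hypothetical stand-in for `load_arm_ratings_from_csv()` so the example runs on its own, and `cached_arm_ratings` and `arm_rating_or_fallback` are illustrative names, not project code.

```python
from functools import lru_cache


def load_ratings_stub(season_year):
    """Hypothetical stand-in for load_arm_ratings_from_csv()."""
    load_ratings_stub.calls += 1
    if season_year != 2005:
        raise FileNotFoundError(f"no arm ratings for {season_year}")
    return {"examp001": -2}  # made-up player ID and rating


load_ratings_stub.calls = 0  # counts how often the "file" is actually read


@lru_cache(maxsize=None)
def cached_arm_ratings(season_year):
    """Load ratings once per season; a missing file yields an empty dict."""
    try:
        return load_ratings_stub(season_year)
    except FileNotFoundError:
        return {}


def arm_rating_or_fallback(player_id, season_year, fallback):
    """Use the cached rating, or call fallback() when the player is absent."""
    rating = cached_arm_ratings(season_year).get(player_id)
    return fallback() if rating is None else rating


first = arm_rating_or_fallback("examp001", 2005, lambda: 0)
second = arm_rating_or_fallback("examp001", 2005, lambda: 0)
missing = arm_rating_or_fallback("nobody01", 2005, lambda: 3)
print(first, second, missing, load_ratings_stub.calls)
```

In real use the `fallback` argument would wrap the original calculation, for example `lambda: arm_outfield(of_arms)`. The cached dict is shared between callers, so treat it as read-only.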
## Rating Scale Reference
| Rating | Description | Approx % | Notes |
|--------|-------------|----------|-------|
| -6 | Elite cannon | ~1% | Best arms in league |
| -5 | Outstanding | ~2% | Gold Glove caliber |
| -4 | Excellent | ~3% | Well above average |
| -3 | Very Good | ~5% | Above average |
| -2 | Above Average | ~10% | Solid arms |
| -1 | Slightly Above | ~15% | Decent arms |
| 0 | Average | ~45% | League average |
| +1 | Slightly Below | ~10% | Below average |
| +2 | Below Average | ~5% | Weak arms |
| +3 | Poor | ~3% | Poor arms |
| +4 | Very Poor | ~2% | Very weak |
| +5 | Very Weak | ~1% | Worst arms |
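For reports or debugging output, the scale above can be turned into a small lookup. This is a minimal sketch; `ARM_RATING_LABELS`, `describe_arm_rating`, and `is_stronger_arm` are hypothetical names, and the labels are copied from the table.

```python
# Labels copied from the rating scale table; lower numbers are stronger arms.
ARM_RATING_LABELS = {
    -6: "Elite cannon",
    -5: "Outstanding",
    -4: "Excellent",
    -3: "Very Good",
    -2: "Above Average",
    -1: "Slightly Above",
    0: "Average",
    1: "Slightly Below",
    2: "Below Average",
    3: "Poor",
    4: "Very Poor",
    5: "Very Weak",
}


def describe_arm_rating(rating):
    """Return the table's description for an integer rating in -6..+5."""
    if rating not in ARM_RATING_LABELS:
        raise ValueError(f"arm rating must be an integer from -6 to 5, got {rating!r}")
    return ARM_RATING_LABELS[rating]


def is_stronger_arm(rating_a, rating_b):
    """True when rating_a is the stronger arm (the lower number)."""
    return rating_a < rating_b


print(describe_arm_rating(-6))
print(is_stronger_arm(-3, 1))
```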
## Troubleshooting
### File Not Found Error
```
FileNotFoundError: Arm ratings file not found: data-output/retrosheet_arm_ratings_2005.csv
```
**Solution:** Generate the CSV file first:
```bash
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events.csv
```
### Player Not in Ratings
If a player has no rating, one of the following applies:
1. They didn't play OF in that season
2. They didn't meet the minimum sample size (50 balls fielded)
3. They played OF, but only in games missing from your Retrosheet data
**Solution:** Pass `default=0` to `get_arm_for_player()` (or to `dict.get()`) to assign an average rating.
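When many players fall back to the default, it helps to know which ones did. The sketch below resolves a batch of IDs and records the ones that were defaulted. `ratings_with_defaults` is a hypothetical helper, and the player IDs are made up.

```python
def ratings_with_defaults(arm_ratings, player_ids, default=0):
    """Resolve ratings for player_ids and list the IDs that were defaulted."""
    resolved = {}
    defaulted = []
    for player_id in player_ids:
        if player_id in arm_ratings:
            resolved[player_id] = arm_ratings[player_id]
        else:
            resolved[player_id] = default
            defaulted.append(player_id)
    return resolved, defaulted


# Made-up ratings dict in the same shape the loader returns (player_id -> rating).
sample_ratings = {"examp001": -3, "examp002": 1}
resolved, defaulted = ratings_with_defaults(
    sample_ratings, ["examp001", "examp003"], default=0
)
print(resolved)
print(defaulted)
```

Logging the `defaulted` list makes it easier to spot players who were excluded by the sample-size minimum or by gaps in the event data.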
### Multiple Positions
If a player played multiple OF positions (e.g., both RF and CF), the CSV will have separate rows. When loading with `load_arm_ratings_from_csv()`, it automatically uses the **best (lowest) rating** across all positions.
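The "best (lowest) rating" rule can be reproduced from the CSV rows with a pandas group-by. This is a sketch of the rule as described, not the loader's actual code; `best_rating_per_player` is a hypothetical name and the rows are made up.

```python
import pandas as pd


def best_rating_per_player(df_arms):
    """Collapse per-position rows to one rating per player: the lowest one."""
    best = df_arms.groupby("player_id")["arm_rating"].min()
    # Convert to plain Python ints for a clean player_id -> rating dict.
    return {player_id: int(rating) for player_id, rating in best.items()}


# Made-up rows: examp001 played both RF and CF.
rows = pd.DataFrame(
    {
        "player_id": ["examp001", "examp001", "examp002"],
        "position": ["RF", "CF", "LF"],
        "arm_rating": [-3, -1, 1],
    }
)
best = best_rating_per_player(rows)
print(best)
```

If you need position-specific ratings instead, load the CSV directly (Method 3) and filter on `position`.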
## Maintenance
### Regenerating Ratings
If you update the arm rating formula or thresholds, regenerate the CSV files:
```bash
# Regenerate for one year
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events_2005.csv
# Regenerate for multiple years
for year in 2005 2006 2007; do
  python generate_arm_ratings_csv.py --year "$year" --events "data-input/retrosheet/events_$year.csv"
done
```
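The same loop can be driven from Python, which makes it easy to skip years whose events file is missing. This is a sketch: `build_regen_commands` is a hypothetical helper, and it assumes the `events_YEAR.csv` naming used in the loop above and only the `--year` and `--events` flags shown in this document.

```python
import sys
from pathlib import Path


def build_regen_commands(years, events_dir="data-input/retrosheet"):
    """Build one command per year; skip years with no events file on disk."""
    commands = []
    skipped = []
    for year in years:
        events_path = Path(events_dir) / f"events_{year}.csv"
        if not events_path.exists():
            skipped.append(year)
            continue
        commands.append(
            [
                sys.executable,
                "generate_arm_ratings_csv.py",
                "--year", str(year),
                "--events", str(events_path),
            ]
        )
    return commands, skipped


commands, skipped = build_regen_commands([2005, 2006, 2007])
for command in commands:
    print(" ".join(command))
print("skipped years:", skipped)
```

To actually run the commands, pass each one to `subprocess.run(command, check=True)` from the repository root.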
### Version Control
**Recommended:** Commit the CSV files to git so arm ratings are consistent across runs and deployments.
If you would rather regenerate the files on every run, add this pattern to `.gitignore` instead:
```
data-output/retrosheet_arm_ratings_*.csv
```
---
**Last Updated:** 2025-11-15
**Formula Version:** 300x assist_rate, no throwout_rate, calibrated z-score thresholds