
# How to Use Retrosheet Arm Ratings

## Overview

The arm rating system calculates outfield arm strength from Retrosheet play-by-play data. Ratings are saved to CSV files for easy reference in card creation scripts.

## File Format

**Location:** `data-output/retrosheet_arm_ratings_YYYY.csv`

**Columns:**
- `player_id`: Baseball Reference player ID (key_bbref)
- `position`: OF position (LF, CF, or RF)
- `season`: Year
- `balls_fielded`: Total balls fielded at this position
- `total_assists`: Total assists at this position
- `home_throws`: Assists that threw out a runner at home
- `batter_extra_outs`: Assists preventing batter from taking extra base
- `assist_rate`: Assists per ball fielded (0.0-1.0)
- `raw_score`: Raw composite arm score
- `z_score`: `raw_score` standardized against the average for the same position
- `arm_rating`: Final rating (-6 to +5; lower means a stronger arm)
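The two derived columns can be recomputed from the others as a sanity check. This is a minimal sketch, assuming `assist_rate` is `total_assists / balls_fielded` (as described above) and that `z_score` uses the mean and population standard deviation of `raw_score` within each position. The real calculator may standardize differently, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def assist_rate(total_assists: int, balls_fielded: int) -> float:
    """Assists per ball fielded; 0.0 when no balls were fielded."""
    return total_assists / balls_fielded if balls_fielded else 0.0


def position_z_scores(rows):
    """Standardize raw_score within each OF position.

    ASSUMPTION: uses the per-position mean and population standard deviation.
    `rows` is an iterable of dicts with 'player_id', 'position' and 'raw_score'.
    Returns {(player_id, position): z_score}.
    """
    by_pos = defaultdict(list)
    for row in rows:
        by_pos[row["position"]].append(row)

    z = {}
    for pos, group in by_pos.items():
        scores = [float(r["raw_score"]) for r in group]
        mu, sigma = mean(scores), pstdev(scores)
        for r, score in zip(group, scores):
            # sigma is 0 when a position has one player or identical scores
            z[(r["player_id"], pos)] = (score - mu) / sigma if sigma else 0.0
    return z


# Made-up rows for illustration only
example = [
    {"player_id": "a", "position": "RF", "raw_score": 1.0},
    {"player_id": "b", "position": "RF", "raw_score": 3.0},
]
print(position_z_scores(example))  # {('a', 'RF'): -1.0, ('b', 'RF'): 1.0}
```

If recomputed values disagree with the CSV, check the calculator's standardization before assuming the file is wrong.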

## Generating Arm Ratings

### For a Complete Season

```bash
python generate_arm_ratings_csv.py \
  --year 2005 \
  --events data-input/retrosheet/retrosheets_events_2005.csv
```

### For a Partial Season (Live Series)

```bash
python generate_arm_ratings_csv.py \
  --year 2025 \
  --events data-input/retrosheet/events_2025.csv \
  --season-pct 0.5  # 50% of season completed
```
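`--season-pct` is the fraction of the season completed. One way to derive it mid-season is from games played. The helper below is a hypothetical sketch, assuming a 162-game schedule; it clamps the result to the 0.0-1.0 range.

```python
def season_pct_from_games(games_played: int, season_games: int = 162) -> float:
    """Fraction of the regular season completed, clamped to [0.0, 1.0].

    Hypothetical helper: ASSUMES season_games (default 162) is the full schedule.
    """
    if season_games <= 0:
        raise ValueError("season_games must be positive")
    return min(max(games_played / season_games, 0.0), 1.0)


print(f"--season-pct {season_pct_from_games(81):.2f}")  # prints "--season-pct 0.50"
```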

### Custom Output Directory

```bash
python generate_arm_ratings_csv.py \
  --year 2005 \
  --events data-input/retrosheet/events.csv \
  --output-dir custom-output/
```

## Using Arm Ratings in Your Scripts

### Method 1: Load from CSV (Recommended)

```python
from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv

# Load pre-calculated ratings
arm_ratings = load_arm_ratings_from_csv(season_year=2005)

# Get rating for a specific player
player_rating = arm_ratings.get('suzui001', 0)  # Returns 0 if not found

# Or use the helper function
from defenders.retrosheet_arm_calculator import get_arm_for_player
player_rating = get_arm_for_player(arm_ratings, 'suzui001', default=0)
```

### Method 2: Calculate On-The-Fly

```python
import pandas as pd

from defenders.retrosheet_arm_calculator import calculate_of_arms_from_retrosheet

# Load events
df_events = pd.read_csv('data-input/retrosheet/events.csv')

# Calculate ratings
arm_ratings = calculate_of_arms_from_retrosheet(df_events, season_pct=1.0)

# Use ratings
player_rating = arm_ratings.get('suzui001', 0)
```

### Method 3: Load Full DataFrame for Analysis

```python
import pandas as pd

# Load the CSV directly
df_arms = pd.read_csv('data-output/retrosheet_arm_ratings_2005.csv')

# Filter to specific position
rf_arms = df_arms[df_arms['position'] == 'RF']

# Get the 10 best arms across all positions (lower arm_rating is better)
elite_arms = df_arms.nsmallest(10, 'arm_rating')

# Lookup specific player
suzuki = df_arms[df_arms['player_id'] == 'suzui001']
```

## Integration with Card Creation

### In retrosheet_data.py

```python
from defenders.retrosheet_arm_calculator import (
    get_arm_for_player,
    load_arm_ratings_from_csv,
)

# Near the top of the script, once the season year is known
SEASON_YEAR = 2005

# Load arm ratings; fall back to an empty dict if the CSV hasn't been generated
try:
    retrosheet_arm_ratings = load_arm_ratings_from_csv(SEASON_YEAR)
    print(f"Loaded arm ratings for {len(retrosheet_arm_ratings)} outfielders")
except FileNotFoundError:
    print(f"Warning: No arm ratings found for {SEASON_YEAR}, using defaults")
    retrosheet_arm_ratings = {}

# Later, in create_positions(), replace the current arm_outfield() call with:
arm_rating = get_arm_for_player(
    retrosheet_arm_ratings,
    df_data['key_bbref'],
    default=0  # Average rating if player not found
)
```

### In defenders/calcs_defense.py

To integrate directly into the main defensive calculation instead, use the Retrosheet rating when one exists and fall back to `arm_outfield()` otherwise. This excerpt assumes `SEASON_YEAR` is defined in `calcs_defense.py`:

```python
# In create_positions(), around line 84:
if df_data["key_bbref"] in df_of.index and len(of_arms) > 0 and len(of_payloads) > 0:
    try:
        # Load arm ratings once and cache them on the function object
        if not hasattr(create_positions, 'arm_ratings_cache'):
            from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
            try:
                create_positions.arm_ratings_cache = load_arm_ratings_from_csv(SEASON_YEAR)
            except FileNotFoundError:
                # No CSV for this season: every player falls back to arm_outfield()
                create_positions.arm_ratings_cache = {}

        # Get arm rating from Retrosheet data (None if the player has no row)
        arm_rating = create_positions.arm_ratings_cache.get(df_data['key_bbref'])

        # Fall back to original calculation if not found
        if arm_rating is None:
            arm_rating = arm_outfield(of_arms)

        # ... rest of the existing try block, followed by its existing except clause
```
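Caching on the function object ties the cache to whichever `SEASON_YEAR` was loaded first. If one process builds cards for several seasons, a cache keyed by season avoids that. The sketch below wraps any loader with `functools.lru_cache`; `make_cached_loader` is a hypothetical name, and passing `load_arm_ratings_from_csv` to it is an assumption about how you would wire it in.

```python
from functools import lru_cache
from typing import Callable, Dict


def make_cached_loader(
    load_fn: Callable[[int], Dict[str, int]],
) -> Callable[[int], Dict[str, int]]:
    """Return a season-keyed, cached version of load_fn.

    A missing CSV (FileNotFoundError) is cached as an empty dict, so callers
    fall back to arm_outfield() without retrying the file for every player.
    Callers should treat the returned dicts as read-only, since they are shared.
    """

    @lru_cache(maxsize=None)
    def load(season_year: int) -> Dict[str, int]:
        try:
            return load_fn(season_year)
        except FileNotFoundError:
            return {}

    return load


# Usage (ASSUMES the documented loader):
# from defenders.retrosheet_arm_calculator import load_arm_ratings_from_csv
# get_arm_ratings = make_cached_loader(load_arm_ratings_from_csv)
# arm_rating = get_arm_ratings(SEASON_YEAR).get(df_data["key_bbref"])
```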

## Rating Scale Reference

Lower ratings indicate stronger arms.

| Rating | Description | Approx. % | Notes |
|--------|-------------|-----------|-------|
| -6 | Elite cannon | ~1% | Best arms in league |
| -5 | Outstanding | ~2% | Gold Glove caliber |
| -4 | Excellent | ~3% | Well above average |
| -3 | Very Good | ~5% | Above average |
| -2 | Above Average | ~10% | Solid arms |
| -1 | Slightly Above | ~15% | Decent arms |
| 0 | Average | ~45% | League average |
| +1 | Slightly Below | ~10% | Below average |
| +2 | Below Average | ~5% | Weak arms |
| +3 | Poor | ~3% | Poor arms |
| +4 | Very Poor | ~2% | Very weak |
| +5 | Very Weak | ~1% | Worst arms |
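To see how a generated file compares with the approximate percentages above, you can tabulate the share of rows at each rating. This sketch uses only the standard library and counts CSV rows, so a player rated at two positions is counted twice. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def rating_distribution(rows: Iterable[dict]) -> Dict[int, float]:
    """Percentage of rows at each arm_rating, keyed in ascending (best-first) order."""
    # int(float(...)) tolerates values written as "-3" or "-3.0"
    counts = Counter(int(float(row["arm_rating"])) for row in rows)
    total = sum(counts.values())
    return {rating: 100.0 * n / total for rating, n in sorted(counts.items())}


def print_distribution(path: str) -> None:
    with open(path, newline="") as f:
        dist = rating_distribution(csv.DictReader(f))
    for rating, pct in dist.items():
        print(f"{rating:+d}: {pct:5.1f}%")


# print_distribution("data-output/retrosheet_arm_ratings_2005.csv")
```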

## Troubleshooting

### File Not Found Error

```text
FileNotFoundError: Arm ratings file not found: data-output/retrosheet_arm_ratings_2005.csv
```

**Solution:** Generate the CSV file first:
```bash
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events.csv
```

### Player Not in Ratings

If a player has no rating, one of the following usually applies:
1. They didn't play the outfield that season.
2. They didn't meet the minimum sample size (50 balls fielded).
3. They played the outfield only in games missing from your Retrosheet events file.

**Solution:** Pass `default=0` to `get_arm_for_player()` (or use `arm_ratings.get(player_id, 0)`) to assign an average rating.

### Multiple Positions

If a player played multiple OF positions (e.g., both RF and CF), the CSV will have separate rows. When loading with `load_arm_ratings_from_csv()`, it automatically uses the **best (lowest) rating** across all positions.
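The collapse to one rating per player can be reproduced with the standard library, for example when inspecting a CSV outside the card scripts. This is a sketch of the documented behavior (lowest rating wins), not the calculator's actual code; `best_rating_per_player` and `load_best_ratings` are hypothetical names.

```python
import csv
from typing import Dict, Iterable


def best_rating_per_player(rows: Iterable[dict]) -> Dict[str, int]:
    """Map player_id -> lowest (best) arm_rating across all OF positions."""
    best: Dict[str, int] = {}
    for row in rows:
        pid = row["player_id"]
        # csv.DictReader yields strings; int(float(...)) accepts "-3" or "-3.0"
        rating = int(float(row["arm_rating"]))
        if pid not in best or rating < best[pid]:
            best[pid] = rating
    return best


def load_best_ratings(path: str) -> Dict[str, int]:
    with open(path, newline="") as f:
        return best_rating_per_player(csv.DictReader(f))
```

For a player with an RF row rated -3 and a CF row rated -1, the result holds -3.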

## Maintenance

### Regenerating Ratings

If you update the arm rating formula or thresholds, regenerate the CSV files:

```bash
# Regenerate for one year
python generate_arm_ratings_csv.py --year 2005 --events data-input/retrosheet/events_2005.csv

# Regenerate for multiple years
for year in 2005 2006 2007; do
  python generate_arm_ratings_csv.py --year $year --events data-input/retrosheet/events_$year.csv
done
```
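To regenerate only the seasons whose CSV is missing, a small Python driver can check the output directory first. This sketch assumes the file naming documented above (`retrosheet_arm_ratings_YYYY.csv` in `data-output/`) and the `events_YYYY.csv` input naming used in the loop; it builds the same command lines and leaves running them to `subprocess.run`. Both function names are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def missing_years(years: Iterable[int], output_dir: str = "data-output") -> List[int]:
    """Seasons with no retrosheet_arm_ratings_YYYY.csv in output_dir."""
    out = Path(output_dir)
    return [y for y in years if not (out / f"retrosheet_arm_ratings_{y}.csv").exists()]


def regen_commands(
    years: Iterable[int],
    events_dir: str = "data-input/retrosheet",
    output_dir: Optional[str] = None,
) -> List[List[str]]:
    """Argument lists matching the bash loop (ASSUMES events_YYYY.csv naming)."""
    cmds = []
    for y in years:
        cmd = [sys.executable, "generate_arm_ratings_csv.py",
               "--year", str(y),
               "--events", f"{events_dir}/events_{y}.csv"]
        if output_dir:
            cmd += ["--output-dir", output_dir]
        cmds.append(cmd)
    return cmds


# Usage:
# import subprocess
# for cmd in regen_commands(missing_years(range(2005, 2008))):
#     subprocess.run(cmd, check=True)
```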

### Version Control

**Recommended:** Commit the CSV files to git so arm ratings stay consistent across runs and deployments.

If you would rather regenerate them on every run, ignore them instead by adding this pattern to `.gitignore`:
```
data-output/retrosheet_arm_ratings_*.csv
```

---

**Last Updated:** 2025-11-15

**Formula Version:** 300x assist_rate, no throwout_rate, calibrated z-score thresholds