claude-configs/skills/paper-dynasty/workflows/live-series-update.md

# Live Series Update Workflow

Used during the MLB regular season to generate cards from current-year FanGraphs split data and Baseball Reference fielding/running stats.

## Pre-Flight

Ask the user before starting:
1. **Which cardset?** (e.g., "2025 Season")
2. **How many games played?** (determines season percentage for min PA thresholds)
3. **Which environment?** (prod or dev — check `alt_database` in `db_calls.py`)

All commands run from `/mnt/NV2/Development/paper-dynasty/card-creation/`.
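
The games-played answer drives the season percentage used to scale minimum PA thresholds. The real formula lives in the card-creation code and is not reproduced in this document; the sketch below assumes a 162-game season and plain linear scaling. `season_pct` and `scaled_min_pa` are hypothetical names, not functions from the repository.

```python
import math

FULL_SEASON_GAMES = 162  # matches the documented --games range (1-162)


def season_pct(games_played: int) -> float:
    """Fraction of the season completed, capped at 1.0."""
    if games_played < 1:
        raise ValueError("games_played must be at least 1")
    return min(games_played / FULL_SEASON_GAMES, 1.0)


def scaled_min_pa(full_season_min: int, games_played: int) -> int:
    """ASSUMPTION: thresholds scale linearly with season progress."""
    return math.ceil(full_season_min * season_pct(games_played))
```

Under these assumptions, a hypothetical full-season minimum of 100 PA at the 81-game mark scales to 50 PA.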

## Data Sourcing

Live series uses **FanGraphs splits** for batting/pitching and **Baseball Reference** for defense/running.

### FanGraphs Data (Manual Download)

FanGraphs split data must be downloaded before card generation, either by running `scripts/fangraphs_scrape.py` or by exporting from the FanGraphs web UI. The scraper uses Selenium to export 8 CSV files:

| File | Content |
|------|---------|
| `Batting_vLHP_Standard.csv` | Batting vs LHP — standard stats |
| `Batting_vLHP_BattedBalls.csv` | Batting vs LHP — batted ball profile |
| `Batting_vRHP_Standard.csv` | Batting vs RHP — standard stats |
| `Batting_vRHP_BattedBalls.csv` | Batting vs RHP — batted ball profile |
| `Pitching_vLHH_Standard.csv` | Pitching vs LHH — standard stats |
| `Pitching_vLHH_BattedBalls.csv` | Pitching vs LHH — batted ball profile |
| `Pitching_vRHH_Standard.csv` | Pitching vs RHH — standard stats |
| `Pitching_vRHH_BattedBalls.csv` | Pitching vs RHH — batted ball profile |

These correspond to the input files the pipeline expects in `data-input/{cardset} Cardset/`, one basic/rate pair per split:
- `vlhp-basic.csv` / `vlhp-rate.csv`
- `vrhp-basic.csv` / `vrhp-rate.csv`
- `vlhh-basic.csv` / `vlhh-rate.csv`
- `vrhh-basic.csv` / `vrhh-rate.csv`
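
Staging the scraper exports under the expected names could be scripted roughly as follows. This is a sketch, not part of the repository: it assumes the `Standard` exports become the `-basic` files and the `BattedBalls` exports become the `-rate` files, which this document does not state explicitly, so confirm the pairing before relying on it. `stage_fangraphs_csvs` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Scraper export prefix -> expected input prefix (from the lists above).
SPLIT_PREFIXES = {
    "Batting_vLHP": "vlhp",
    "Batting_vRHP": "vrhp",
    "Pitching_vLHH": "vlhh",
    "Pitching_vRHH": "vrhh",
}
# ASSUMPTION: Standard -> basic, BattedBalls -> rate. Verify before use.
KIND_SUFFIXES = {"Standard": "basic", "BattedBalls": "rate"}


def stage_fangraphs_csvs(download_dir, cardset, data_input=Path("data-input")):
    """Copy the 8 scraper exports into data-input/{cardset} Cardset/."""
    download_dir = Path(download_dir)
    pairs = [
        (download_dir / f"{split}_{kind}.csv", f"{prefix}-{suffix}.csv")
        for split, prefix in SPLIT_PREFIXES.items()
        for kind, suffix in KIND_SUFFIXES.items()
    ]
    # Fail before copying anything so the target is never half-populated.
    missing = [str(src) for src, _ in pairs if not src.is_file()]
    if missing:
        raise FileNotFoundError(f"missing scraper exports: {missing}")

    target_dir = Path(data_input) / f"{cardset} Cardset"
    target_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src, dest_name in pairs:
        dest = target_dir / dest_name
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

Calling `stage_fangraphs_csvs("downloads", "2025 Season")` from the card-creation directory would then populate `data-input/2025 Season Cardset/`; the `downloads` folder name is only an example.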

**For PotM**: Adjust the `startDate` and `endDate` in the scraper to cover only the target month.
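
The month boundaries for a PotM pull can be computed rather than typed by hand. The sketch below uses only the standard library; `month_date_range` is a hypothetical helper, and the ISO `YYYY-MM-DD` output is an assumption about what the scraper's `startDate` and `endDate` settings accept.

```python
import calendar
from datetime import date


def month_date_range(year: int, month: int) -> tuple:
    """Return (first_day, last_day) of a month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]  # handles leap years
    start = date(year, month, 1)
    end = date(year, month, last_day)
    return start.isoformat(), end.isoformat()
```

For June 2025 this yields `("2025-06-01", "2025-06-30")`.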

### Baseball Reference Data

Fielding stats are pulled automatically during card generation when `--pull-fielding` is enabled (default). Running and pitching stats come from CSVs in the data-input directory.

---

## Steps

```bash
# 1. Download FanGraphs splits data
#    Run the scraper or manually download from the FanGraphs splits leaderboard
#    Place CSVs in data-input/{cardset} Cardset/

# 2. Verify config (dry-run)
pd-cards live-series update --cardset "<cardset name>" --games <N> --dry-run

# 3. Generate cards (POSTs player data to API)
pd-cards live-series update --cardset "<cardset name>" --games <N>

# 4. Generate images WITHOUT upload (triggers rendering)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b — STOP if errors found
#    (see card-generation.md "Bug Prevention" section for validation query)

# 6. Upload to S3 (fast — uses cached images from step 4)
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run for ALL cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```

**Verify scouting upload**: `ssh akamai "ls -lh container-data/paper-dynasty/storage/ | grep -E 'batting|pitching'"`
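
The step 5 check comes down to rejecting any rating row with a negative `groundball_b`. The authoritative validation query is in `card-generation.md`; the sketch below only illustrates the stop condition on rows that have already been fetched. The row shape (dicts with `player_id` and `groundball_b` keys) and both function names are assumptions.

```python
def find_negative_groundball_b(rows):
    """Return the rows whose groundball_b value is below zero."""
    return [
        row for row in rows
        if row.get("groundball_b") is not None and row["groundball_b"] < 0
    ]


def assert_safe_to_upload(rows):
    """Raise before the S3 upload if any row fails the check."""
    offenders = find_negative_groundball_b(rows)
    if offenders:
        ids = [row.get("player_id") for row in offenders]  # hypothetical key
        raise ValueError(f"negative groundball_b for players: {ids}")
```

Raising an exception mirrors the "STOP if errors found" rule: nothing should reach S3 until the offending ratings are fixed and regenerated.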

---

## Key Differences from Retrosheet Workflow

| Aspect | Live Series | Retrosheet |
|--------|-------------|------------|
| **Data source** | FanGraphs splits + BBRef | Retrosheet play-by-play events |
| **CLI command** | `pd-cards live-series update` | `pd-cards retrosheet process` |
| **Season progress** | `--games N` (1-162) | `--season-pct` + `--start`/`--end` dates |
| **Defense data** | Auto-pulled from BBRef (`--pull-fielding`) | Pre-downloaded defense CSVs |
| **Position validation** | Built-in (skips for promo cardsets) | Separate `pd-cards retrosheet validate` step |
| **Arm ratings** | Not applicable (BBRef has current data) | Generated from Retrosheet events |
| **Recency bias** | Not applicable | `--last-twoweeks-ratio` (auto-enabled after May 30) |
| **Player ID lookup** | FanGraphs/BBRef IDs in CSV | Retrosheet IDs → pybaseball reverse lookup |

---

## Players of the Month (PotM) Variant

During the regular season, PotM cards are generated from the same FanGraphs pipeline but filtered to a single month's stats and posted to a promo cardset.

### Key Differences from Full Update

| Setting | Full Update | PotM |
|---------|------------|------|
| Cardset | Season cardset (e.g., "2025 Season") | Promo cardset (e.g., "2025 Promos") |
| FanGraphs date range | Season start → current date | Month start → month end |
| `--games` | Cumulative games played | Games in that month (~27) |
| `--ignore-limits` | Usually no | Usually yes (short sample) |
| Position updates | Yes | Skipped (cardset name contains "promos") |

### PotM Pre-Flight Checklist

1. **Choose players** — Typically 2 IF, 2 OF, 1 SP, 1 RP per league
2. **Download month-specific FanGraphs data** — Set date range in scraper to the target month only
3. **Confirm promo cardset exists** in the database
4. **Place CSVs** in the promo cardset's data-input directory

### PotM Steps

```bash
# 1. Download FanGraphs splits for the target month only
#    Adjust startDate/endDate in fangraphs_scrape.py or manual download
#    Place in data-input/{promo cardset} Cardset/

# 2. Dry-run
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits --dry-run

# 3. Generate cards
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits

# 4-6. Render images, validate, then upload to S3 (same pattern as the full update)
pd-cards upload check -c "<promo cardset name>"
# 5. Validate groundball_b (STOP if negative values are found)
pd-cards upload s3 -c "<promo cardset name>"

# 7-8. Scouting reports — ALWAYS regenerate for ALL cardsets
pd-cards scouting all
pd-cards scouting upload
```

### PotM-Specific Notes

- **Position updates are skipped** when the cardset name contains "promos" (both `live_series_update.py` and the CLI check for this).
- **Description protection** — PotM descriptions (e.g., "April PotM") are never overwritten by subsequent full-cardset runs. The `should_update_player_description()` helper checks for "potm" in the existing description.
- **`--ignore-limits`** is typically needed because a single month may not produce enough PA/TBF to meet normal thresholds (20 vL / 40 vR).
- **Scouting must cover ALL cardsets** — PotM players appear alongside live players. Always run `pd-cards scouting all` without `--cardset-id` to preserve the unified scouting view.
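
The two name-based rules above (promo detection and description protection) amount to simple substring checks. The sketch below is an illustration, not the repository code: the real `should_update_player_description()` signature is not shown in this document, and case-insensitive matching is an assumption. `is_promo_cardset` is a hypothetical name.

```python
from typing import Optional


def is_promo_cardset(cardset_name: str) -> bool:
    """Position updates are skipped when this returns True."""
    # ASSUMPTION: the match is case-insensitive.
    return "promos" in cardset_name.lower()


def should_update_player_description(existing: Optional[str]) -> bool:
    """Sketch of the helper: never overwrite a PotM description."""
    # ASSUMPTION: the real helper may take different arguments.
    return "potm" not in (existing or "").lower()
```

With these rules, a later full run against "2025 Season" leaves a player described as "April PotM" untouched, while a player with no description is still updated.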

### Example: June 2025 PotM

```bash
# Download June-only FanGraphs splits (June 1 - June 30)
# Place CSVs in data-input/2025 Promos Cardset/

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits --dry-run

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits

pd-cards upload check -c "2025 Promos"
# Validate groundball_b before uploading (see card-generation.md)
pd-cards upload s3 -c "2025 Promos"
pd-cards scouting all
pd-cards scouting upload
```

---

## Common Issues

**"No players found"**: Wrong cardset name or database environment. Verify `alt_database` in `db_calls.py`.

**Missing FanGraphs CSVs**: The scraper requires Chrome/Selenium. If it fails, download the files manually from the FanGraphs splits leaderboard with the correct date range and stat group settings.

**High DH count**: Defense pull failed or BBRef was rate-limited. Re-run the update (`--pull-fielding` is enabled by default) or manually download defense CSVs.

**Early-season runs**: Use `--ignore-limits` when games played is low (< ~40) to avoid filtering out most players.
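
Several of these issues trace back to an incomplete input directory, which can be checked before the dry-run. The sketch below lists which of the eight expected files are absent; `missing_inputs` is a hypothetical helper, and the default `data-input` path assumes the script runs from the card-creation directory.

```python
from pathlib import Path

# The eight expected input files named in the Data Sourcing section.
EXPECTED_INPUTS = [
    f"{split}-{kind}.csv"
    for split in ("vlhp", "vrhp", "vlhh", "vrhh")
    for kind in ("basic", "rate")
]


def missing_inputs(cardset, data_input=Path("data-input")):
    """Return expected CSV names not present in the cardset directory."""
    cardset_dir = Path(data_input) / f"{cardset} Cardset"
    return [
        name for name in EXPECTED_INPUTS
        if not (cardset_dir / name).is_file()
    ]
```

An empty list means all eight files are in place; anything else names exactly what still needs to be downloaded.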

---

**Last Updated**: 2026-02-14
**Version**: 1.0 (Initial workflow documentation)