claude-configs/skills/paper-dynasty/workflows/live-series-update.md


# Live Series Update Workflow
Used during the MLB regular season to generate cards from current-year FanGraphs split data and Baseball Reference fielding/running stats.
## Pre-Flight
Ask the user before starting:
1. **Which cardset?** (e.g., "2025 Season")
2. **How many games played?** (determines season percentage for min PA thresholds)
3. **Which environment?** (prod or dev — check `alt_database` in `db_calls.py`)
All commands run from `/mnt/NV2/Development/paper-dynasty/card-creation/`.
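
As a rough illustration of why question 2 matters, the games-played count can be converted to a season percentage by dividing by the 162-game schedule. The sketch below also scales a base PA/TBF threshold by that percentage. The linear scaling and both function names are assumptions made for illustration; they are not taken from the `pd-cards` source.

```python
import math

SEASON_GAMES = 162  # full MLB regular-season schedule


def season_pct(games_played: int) -> float:
    """Fraction of the season completed for a --games value in 1..162."""
    if not 1 <= games_played <= SEASON_GAMES:
        raise ValueError(f"games_played must be 1-{SEASON_GAMES}, got {games_played}")
    return games_played / SEASON_GAMES


def scaled_min_pa(base_threshold: int, games_played: int) -> int:
    """ASSUMPTION: the threshold shrinks linearly with season progress (rounded up)."""
    return max(1, math.ceil(base_threshold * season_pct(games_played)))
```

Under this assumption, a base threshold of 40 at the halfway point (81 games) becomes 20, which shows how a wrong `--games` value would filter out too many or too few players.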
## Data Sourcing
Live series uses **FanGraphs splits** for batting/pitching and **Baseball Reference** for defense/running.
### FanGraphs Data (Manual Download)
FanGraphs split data must be downloaded before card generation, either by running `scripts/fangraphs_scrape.py` or by exporting it by hand from the FanGraphs web UI. The scraper uses Selenium to export 8 CSV files:
| File | Content |
|------|---------|
| `Batting_vLHP_Standard.csv` | Batting vs LHP — standard stats |
| `Batting_vLHP_BattedBalls.csv` | Batting vs LHP — batted ball profile |
| `Batting_vRHP_Standard.csv` | Batting vs RHP — standard stats |
| `Batting_vRHP_BattedBalls.csv` | Batting vs RHP — batted ball profile |
| `Pitching_vLHH_Standard.csv` | Pitching vs LHH — standard stats |
| `Pitching_vLHH_BattedBalls.csv` | Pitching vs LHH — batted ball profile |
| `Pitching_vRHH_Standard.csv` | Pitching vs RHH — standard stats |
| `Pitching_vRHH_BattedBalls.csv` | Pitching vs RHH — batted ball profile |
These map to the expected input files in `data-input/{cardset} Cardset/`:
- `vlhp-basic.csv` / `vlhp-rate.csv`
- `vrhp-basic.csv` / `vrhp-rate.csv`
- `vlhh-basic.csv` / `vlhh-rate.csv`
- `vrhh-basic.csv` / `vrhh-rate.csv`
**For PotM**: Adjust the `startDate` and `endDate` in the scraper to cover only the target month.
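
Placing the eight exports under their expected input names could be scripted along these lines. This is a minimal sketch: the pairing of "Standard" exports with `*-basic.csv` and "BattedBalls" exports with `*-rate.csv` is an assumption inferred from the two lists above, and `stage_exports` is a hypothetical helper, not part of the repository.

```python
import shutil
from pathlib import Path

# ASSUMPTION: "Standard" exports feed *-basic.csv, "BattedBalls" feed *-rate.csv.
EXPORT_TO_INPUT = {
    "Batting_vLHP_Standard.csv": "vlhp-basic.csv",
    "Batting_vLHP_BattedBalls.csv": "vlhp-rate.csv",
    "Batting_vRHP_Standard.csv": "vrhp-basic.csv",
    "Batting_vRHP_BattedBalls.csv": "vrhp-rate.csv",
    "Pitching_vLHH_Standard.csv": "vlhh-basic.csv",
    "Pitching_vLHH_BattedBalls.csv": "vlhh-rate.csv",
    "Pitching_vRHH_Standard.csv": "vrhh-basic.csv",
    "Pitching_vRHH_BattedBalls.csv": "vrhh-rate.csv",
}


def stage_exports(download_dir: Path, cardset: str,
                  data_input: Path = Path("data-input")) -> list:
    """Copy exports into data-input/{cardset} Cardset/; return missing export names."""
    target = data_input / f"{cardset} Cardset"
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for export_name, input_name in EXPORT_TO_INPUT.items():
        src = download_dir / export_name
        if src.is_file():
            shutil.copyfile(src, target / input_name)
        else:
            missing.append(export_name)
    return missing
```

A non-empty return value means at least one export is absent, which is worth resolving before the dry-run.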
### Baseball Reference Data
Fielding stats are pulled automatically during card generation when `--pull-fielding` is enabled (default). Running and pitching stats come from CSVs in the data-input directory.
---
## Steps
```bash
# 1. Download FanGraphs splits data
# Run the scraper or manually download from FanGraphs splits leaderboard
# Place CSVs in data-input/{cardset} Cardset/
# 2. Verify config (dry-run)
pd-cards live-series update --cardset "<cardset name>" --games <N> --dry-run
# 3. Generate cards (POSTs player data to API)
pd-cards live-series update --cardset "<cardset name>" --games <N>
# 4. Generate images WITHOUT upload (triggers rendering)
pd-cards upload check -c "<cardset name>"
# 5. CRITICAL: Validate database for negative groundball_b — STOP if errors found
# (see card-generation.md "Bug Prevention" section for validation query)
# 6. Upload to S3 (fast — uses cached images from step 4)
pd-cards upload s3 -c "<cardset name>"
# 7. Generate scouting reports (ALWAYS run for ALL cardsets)
pd-cards scouting all
# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```
**Verify scouting upload**: `ssh akamai "ls -lh container-data/paper-dynasty/storage/ | grep -E 'batting|pitching'"`
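
The stop-on-error rule in step 5 can be pictured as a guard that runs between image generation and the S3 upload. The sketch below assumes the validation query's results have already been loaded as a list of dicts with `player_id` and `groundball_b` keys; that row shape and both function names are hypothetical, and the actual query lives in card-generation.md.

```python
def find_negative_groundball_b(rows):
    """Return the rows whose groundball_b value is negative."""
    return [row for row in rows if row["groundball_b"] < 0]


def assert_safe_to_upload(rows):
    """Raise if any row has a negative groundball_b, so the upload never starts."""
    bad = find_negative_groundball_b(rows)
    if bad:
        ids = ", ".join(str(row["player_id"]) for row in bad)
        raise RuntimeError(f"Negative groundball_b for player_id(s): {ids}. Do not upload.")
```

Raising an exception rather than logging a warning keeps a scripted run from continuing on to `pd-cards upload s3` with bad data.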
---
## Key Differences from Retrosheet Workflow
| Aspect | Live Series | Retrosheet |
|--------|-------------|------------|
| **Data source** | FanGraphs splits + BBRef | Retrosheet play-by-play events |
| **CLI command** | `pd-cards live-series update` | `pd-cards retrosheet process` |
| **Season progress** | `--games N` (1-162) | `--season-pct` + `--start`/`--end` dates |
| **Defense data** | Auto-pulled from BBRef (`--pull-fielding`) | Pre-downloaded defense CSVs |
| **Position validation** | Built-in (skips for promo cardsets) | Separate `pd-cards retrosheet validate` step |
| **Arm ratings** | Not applicable (BBRef has current data) | Generated from Retrosheet events |
| **Recency bias** | Not applicable | `--last-twoweeks-ratio` (auto-enabled after May 30) |
| **Player ID lookup** | FanGraphs/BBRef IDs in CSV | Retrosheet IDs → pybaseball reverse lookup |
---
## Players of the Month (PotM) Variant
During the regular season, PotM cards are generated from the same FanGraphs pipeline but filtered to a single month's stats and posted to a promo cardset.
### Key Differences from Full Update
| Setting | Full Update | PotM |
|---------|------------|------|
| Cardset | Season cardset (e.g., "2025 Season") | Promo cardset (e.g., "2025 Promos") |
| FanGraphs date range | Season start → current date | Month start → month end |
| `--games` | Cumulative games played | Games in that month (~27) |
| `--ignore-limits` | Usually no | Usually yes (short sample) |
| Position updates | Yes | Skipped (cardset name contains "promos") |
### PotM Pre-Flight Checklist
1. **Choose players** — Typically 2 IF, 2 OF, 1 SP, 1 RP per league
2. **Download month-specific FanGraphs data** — Set date range in scraper to the target month only
3. **Confirm promo cardset exists** in the database
4. **Place CSVs** in the promo cardset's data-input directory
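
The month-only date range from checklist item 2 can be computed rather than typed by hand, which avoids off-by-one errors on month length. This is a small sketch using the standard library; the ISO `YYYY-MM-DD` output is an assumption, since the format the scraper's `startDate` and `endDate` expect is not documented here.

```python
import calendar
from datetime import date


def month_range(year: int, month: int):
    """Return (first_day, last_day) of the month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

For June 2025 this yields `("2025-06-01", "2025-06-30")`.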
### PotM Steps
```bash
# 1. Download FanGraphs splits for the target month only
# Adjust startDate/endDate in fangraphs_scrape.py or manual download
# Place in data-input/{promo cardset} Cardset/
# 2. Dry-run
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
--description "<Month> PotM" --ignore-limits --dry-run
# 3. Generate cards
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
--description "<Month> PotM" --ignore-limits
# 4. Generate images WITHOUT upload (same pattern as the full update)
pd-cards upload check -c "<promo cardset>"
# 5. CRITICAL: run groundball_b validation and STOP if errors are found
# 6. Upload to S3
pd-cards upload s3 -c "<promo cardset>"
# 7-8. Scouting reports — ALWAYS regenerate for ALL cardsets
pd-cards scouting all
pd-cards scouting upload
```
### PotM-Specific Notes
- **Position updates are skipped** when the cardset name contains "promos" (both `live_series_update.py` and the CLI check for this).
- **Description protection** — PotM descriptions (e.g., "April PotM") are never overwritten by subsequent full-cardset runs. The `should_update_player_description()` helper checks for "potm" in the existing description.
- **`--ignore-limits`** is typically needed because a single month may not produce enough PA/TBF to meet normal thresholds (20 vL / 40 vR).
- **Scouting must cover ALL cardsets** — PotM players appear alongside live players. Always run `pd-cards scouting all` without `--cardset-id` to preserve the unified scouting view.
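
The two string checks described in the notes above (promo cardset detection and description protection) might look roughly like the following. This is an illustrative sketch, not the repository's code: both function names are hypothetical stand-ins, and the case-insensitive matching is an assumption inferred from the examples "2025 Promos" and "April PotM".

```python
from typing import Optional


def is_promo_cardset(cardset_name: str) -> bool:
    """True when position updates would be skipped for this cardset."""
    return "promos" in cardset_name.lower()


def may_overwrite_description(existing_description: Optional[str]) -> bool:
    """False when the existing description is a PotM label that must be kept."""
    return "potm" not in (existing_description or "").lower()
```

With these rules, a later full-cardset run would leave "April PotM" untouched, while a player with no description or a plain one could still be updated.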
### Example: June 2025 PotM
```bash
# Download June-only FanGraphs splits (June 1 - June 30)
# Place CSVs in data-input/2025 Promos Cardset/
pd-cards live-series update --cardset "2025 Promos" --games 27 \
--description "June PotM" --ignore-limits --dry-run
pd-cards live-series update --cardset "2025 Promos" --games 27 \
--description "June PotM" --ignore-limits
pd-cards upload check -c "2025 Promos"
# Run groundball_b validation here; STOP if errors are found (see card-generation.md)
pd-cards upload s3 -c "2025 Promos"
pd-cards scouting all
pd-cards scouting upload
```
---
## Common Issues
**"No players found"**: Wrong cardset name or database environment. Verify `alt_database` in `db_calls.py`.
**Missing FanGraphs CSVs**: The scraper requires Chrome/Selenium. If it fails, download manually from FanGraphs splits leaderboard with the correct date range and stat group settings.
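**High DH count**: The defense pull failed or BBRef rate-limited the request, so players were left without fielding positions. Re-run the update (fielding is pulled by default via `--pull-fielding`) or manually download defense CSVs.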
**Early-season runs**: Use `--ignore-limits` when games played is low (< ~40) to avoid filtering out most players.
---
**Last Updated**: 2026-02-14
**Version**: 1.0 (Initial workflow documentation)