# Live Series Update Workflow

Used during the MLB regular season to generate cards from current-year FanGraphs split data and Baseball Reference fielding/running stats.

## Pre-Flight

Ask the user before starting:

1. **Which cardset?** (e.g., "2025 Season")
2. **How many games played?** (determines the season percentage used for minimum-PA thresholds)
3. **Which environment?** (prod or dev; check `alt_database` in `db_calls.py`)

All commands run from `/mnt/NV2/Development/paper-dynasty/card-creation/`.

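For question 2, here is a minimal sketch of how a games-played count could translate into a season fraction and a prorated plate-appearance minimum. The 162-game denominator follows the `--games N` (1-162) range used by the CLI. `season_pct`, `scaled_min_pa`, and the round-up proration rule are illustrative assumptions, not the actual `pd-cards` logic.

```python
import math

FULL_SEASON_GAMES = 162  # upper bound of --games in this workflow


def season_pct(games_played: int) -> float:
    """Fraction of the regular season completed, capped at 1.0."""
    if games_played < 1:
        raise ValueError("games_played must be at least 1")
    return min(games_played / FULL_SEASON_GAMES, 1.0)


def scaled_min_pa(full_season_min: int, games_played: int) -> int:
    """Hypothetical proration: scale a full-season PA minimum, rounding up."""
    return max(1, math.ceil(full_season_min * season_pct(games_played)))
```

With these assumptions, 81 games gives a season fraction of 0.5, so a hypothetical full-season minimum of 100 PA would prorate to 50.
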
## Data Sourcing

Live series uses **FanGraphs splits** for batting/pitching and **Baseball Reference** for defense/running.

### FanGraphs Data (Manual Download)

FanGraphs split data is not pulled automatically during card generation. Download it beforehand, either by running `scripts/fangraphs_scrape.py` or by exporting from the FanGraphs web UI. The scraper uses Selenium to export 8 CSV files:

| File | Content |
|------|---------|
| `Batting_vLHP_Standard.csv` | Batting vs LHP: standard stats |
| `Batting_vLHP_BattedBalls.csv` | Batting vs LHP: batted ball profile |
| `Batting_vRHP_Standard.csv` | Batting vs RHP: standard stats |
| `Batting_vRHP_BattedBalls.csv` | Batting vs RHP: batted ball profile |
| `Pitching_vLHH_Standard.csv` | Pitching vs LHH: standard stats |
| `Pitching_vLHH_BattedBalls.csv` | Pitching vs LHH: batted ball profile |
| `Pitching_vRHH_Standard.csv` | Pitching vs RHH: standard stats |
| `Pitching_vRHH_BattedBalls.csv` | Pitching vs RHH: batted ball profile |

These map to the expected input files in `data-input/{cardset} Cardset/`:

- `vlhp-basic.csv` / `vlhp-rate.csv`
- `vrhp-basic.csv` / `vrhp-rate.csv`
- `vlhh-basic.csv` / `vlhh-rate.csv`
- `vrhh-basic.csv` / `vrhh-rate.csv`

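The export-to-input mapping above can be scripted. The sketch below copies the eight FanGraphs exports into the cardset's input directory under the expected names. It assumes that `Standard` exports correspond to the `-basic` files and `BattedBalls` exports to the `-rate` files; that pairing is inferred from the order of the two lists, not confirmed. `stage_fangraphs_csvs` and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path

# Assumed pairing: "Standard" -> "-basic", "BattedBalls" -> "-rate".
FANGRAPHS_TO_INPUT = {
    "Batting_vLHP_Standard.csv": "vlhp-basic.csv",
    "Batting_vLHP_BattedBalls.csv": "vlhp-rate.csv",
    "Batting_vRHP_Standard.csv": "vrhp-basic.csv",
    "Batting_vRHP_BattedBalls.csv": "vrhp-rate.csv",
    "Pitching_vLHH_Standard.csv": "vlhh-basic.csv",
    "Pitching_vLHH_BattedBalls.csv": "vlhh-rate.csv",
    "Pitching_vRHH_Standard.csv": "vrhh-basic.csv",
    "Pitching_vRHH_BattedBalls.csv": "vrhh-rate.csv",
}


def stage_fangraphs_csvs(download_dir, cardset, data_input_root="data-input"):
    """Copy the 8 exports into '<root>/<cardset> Cardset/' under input names."""
    download_dir = Path(download_dir)
    target_dir = Path(data_input_root) / f"{cardset} Cardset"

    # Fail before copying anything if an export is missing.
    missing = [n for n in FANGRAPHS_TO_INPUT if not (download_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing FanGraphs exports: {', '.join(missing)}")

    target_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source_name, input_name in FANGRAPHS_TO_INPUT.items():
        destination = target_dir / input_name
        shutil.copy2(download_dir / source_name, destination)
        staged.append(destination)
    return staged
```
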
**For PotM**: Adjust the `startDate` and `endDate` in the scraper to cover only the target month.

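For the PotM date range, the month boundaries can be computed rather than typed by hand. A minimal sketch follows. `month_date_range` is a hypothetical helper, and the ISO `YYYY-MM-DD` string format is an assumption about what the scraper's `startDate` and `endDate` expect.

```python
import calendar
from datetime import date
from typing import Tuple


def month_date_range(year: int, month: int) -> Tuple[str, str]:
    """Return (first day, last day) of a month as ISO date strings."""
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

For June 2025 this yields `2025-06-01` and `2025-06-30`, matching the June 1 - June 30 window used in the June PotM example.
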
### Baseball Reference Data

Fielding stats are pulled automatically during card generation when `--pull-fielding` is enabled (default). Running and pitching stats come from CSVs in the data-input directory.

---

## Steps

```bash
# 1. Download FanGraphs splits data
# Run the scraper or manually download from FanGraphs splits leaderboard
# Place CSVs in data-input/{cardset} Cardset/

# 2. Verify config (dry-run)
pd-cards live-series update --cardset "<cardset name>" --games <N> --dry-run

# 3. Generate cards (POSTs player data to API)
pd-cards live-series update --cardset "<cardset name>" --games <N>

# 4. Generate images WITHOUT upload (triggers rendering)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b. STOP if errors found
# (see card-generation.md "Bug Prevention" section for validation query)

# 6. Upload to S3 (fast: uses cached images from step 4)
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run for ALL cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```
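The command sequence above can also be expressed as data, which makes the mandatory stop at step 5 explicit. The sketch below only builds argument lists from the commands and flags shown in this document and runs nothing against the API or S3. `live_series_commands` is a hypothetical helper, not part of `pd-cards`.

```python
import shlex
from typing import List, Optional, Tuple

Command = List[str]


def live_series_commands(
    cardset: str,
    games: int,
    description: Optional[str] = None,
    ignore_limits: bool = False,
) -> Tuple[List[Command], List[Command]]:
    """Return (commands to run before validation, commands to run after)."""
    update = ["pd-cards", "live-series", "update",
              "--cardset", cardset, "--games", str(games)]
    if description:
        update += ["--description", description]
    if ignore_limits:
        update.append("--ignore-limits")

    before_validation = [
        update + ["--dry-run"],                          # step 2
        update,                                          # step 3
        ["pd-cards", "upload", "check", "-c", cardset],  # step 4
    ]
    # Step 5 (groundball_b validation) sits between the two lists and is
    # deliberately left as a manual check.
    after_validation = [
        ["pd-cards", "upload", "s3", "-c", cardset],     # step 6
        ["pd-cards", "scouting", "all"],                 # step 7
        ["pd-cards", "scouting", "upload"],              # step 8
    ]
    return before_validation, after_validation


if __name__ == "__main__":
    before, after = live_series_commands("2025 Promos", 27, "June PotM", True)
    for command in before:
        print(shlex.join(command))
    print("# validate groundball_b here; stop if errors are found")
    for command in after:
        print(shlex.join(command))
```
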
**Verify scouting upload**: `ssh akamai "ls -lh container-data/paper-dynasty/storage/ | grep -E 'batting|pitching'"`

---

## Key Differences from Retrosheet Workflow

| Aspect | Live Series | Retrosheet |
|--------|-------------|------------|
| **Data source** | FanGraphs splits + BBRef | Retrosheet play-by-play events |
| **CLI command** | `pd-cards live-series update` | `pd-cards retrosheet process` |
| **Season progress** | `--games N` (1-162) | `--season-pct` + `--start`/`--end` dates |
| **Defense data** | Auto-pulled from BBRef (`--pull-fielding`) | Pre-downloaded defense CSVs |
| **Position validation** | Built-in (skips for promo cardsets) | Separate `pd-cards retrosheet validate` step |
| **Arm ratings** | Not applicable (BBRef has current data) | Generated from Retrosheet events |
| **Recency bias** | Not applicable | `--last-twoweeks-ratio` (auto-enabled after May 30) |
| **Player ID lookup** | FanGraphs/BBRef IDs in CSV | Retrosheet IDs → pybaseball reverse lookup |

---

## Players of the Month (PotM) Variant

During the regular season, PotM cards are generated from the same FanGraphs pipeline but filtered to a single month's stats and posted to a promo cardset.

### Key Differences from Full Update

| Setting | Full Update | PotM |
|---------|------------|------|
| Cardset | Season cardset (e.g., "2025 Season") | Promo cardset (e.g., "2025 Promos") |
| FanGraphs date range | Season start → current date | Month start → month end |
| `--games` | Cumulative games played | Games in that month (~27) |
| `--ignore-limits` | Usually no | Usually yes (short sample) |
| Position updates | Yes | Skipped (cardset name contains "promos") |

### PotM Pre-Flight Checklist

1. **Choose players**: Typically 2 IF, 2 OF, 1 SP, 1 RP per league
2. **Download month-specific FanGraphs data**: Set date range in scraper to the target month only
3. **Confirm promo cardset exists** in the database
4. **Place CSVs** in the promo cardset's data-input directory

### PotM Steps

```bash
# 1. Download FanGraphs splits for the target month only
# Adjust startDate/endDate in fangraphs_scrape.py or manual download
# Place in data-input/{promo cardset} Cardset/

# 2. Dry-run
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits --dry-run

# 3. Generate cards
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits

# 4. Generate images without upload
pd-cards upload check -c "<promo cardset>"

# 5. Validate database for negative groundball_b. STOP if errors found
# (see card-generation.md "Bug Prevention" section for validation query)

# 6. Upload to S3
pd-cards upload s3 -c "<promo cardset>"

# 7-8. Scouting reports: ALWAYS regenerate for ALL cardsets
pd-cards scouting all
pd-cards scouting upload
```
### PotM-Specific Notes

- **Position updates are skipped** when the cardset name contains "promos" (both `live_series_update.py` and the CLI check for this).
- **Description protection**: PotM descriptions (e.g., "April PotM") are never overwritten by subsequent full-cardset runs. The `should_update_player_description()` helper checks for "potm" in the existing description.
- **`--ignore-limits`** is typically needed because a single month may not produce enough PA/TBF to meet normal thresholds (20 vL / 40 vR).
- **Scouting must cover ALL cardsets**: PotM players appear alongside live players. Always run `pd-cards scouting all` without `--cardset-id` to preserve the unified scouting view.

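The two guards described in these notes can be sketched as follows. Only the name `should_update_player_description()` comes from this document. Its signature, the case-insensitive matching, and the `is_promo_cardset` helper are assumptions made for illustration and may differ from the real code.

```python
from typing import Optional


def is_promo_cardset(cardset_name: str) -> bool:
    """Assumed check: position updates are skipped when this returns True."""
    return "promos" in cardset_name.lower()


def should_update_player_description(existing_description: Optional[str]) -> bool:
    """Assumed behavior: never overwrite a description that mentions PotM."""
    if not existing_description:
        return True  # nothing to protect
    return "potm" not in existing_description.lower()
```

Under these assumptions, an existing description of "April PotM" is left alone, while an empty or non-PotM description may be replaced by a later full-cardset run.
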
### Example: June 2025 PotM

```bash
# Download June-only FanGraphs splits (June 1 - June 30)
# Place CSVs in data-input/2025 Promos Cardset/

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits --dry-run

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits

pd-cards upload check -c "2025 Promos"
# Run groundball_b validation here. STOP if errors found
pd-cards upload s3 -c "2025 Promos"
pd-cards scouting all
pd-cards scouting upload
```
---

## Common Issues

**"No players found"**: Wrong cardset name or database environment. Verify `alt_database` in `db_calls.py`.

**Missing FanGraphs CSVs**: The scraper requires Chrome/Selenium. If it fails, download manually from FanGraphs splits leaderboard with the correct date range and stat group settings.

**High DH count**: Defense pull failed or BBRef was rate-limited. Re-run with `--pull-fielding` or manually download defense CSVs.

**Early-season runs**: Use `--ignore-limits` when games played is low (< ~40) to avoid filtering out most players.

---

**Last Updated**: 2026-02-14
**Version**: 1.0 (Initial workflow documentation)