claude-configs/skills/paper-dynasty/workflows/live-series-update.md
Cal Corum 17cac31b7b Retire redundant commit commands, fix paper-dynasty memory refs
- Remove /commit and /commit-push commands (native /commit replaces both)
- Update paper-dynasty to use cognitive-memory MCP instead of archived MemoryGraph
- Fix scouting upload verify path (sba-db -> akamai)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 19:40:54 -06:00


Live Series Update Workflow

Used during the MLB regular season to generate cards from current-year FanGraphs split data and Baseball Reference fielding/running stats.

Pre-Flight

Ask the user before starting:

  1. Which cardset? (e.g., "2025 Season")
  2. How many games played? (determines season percentage for min PA thresholds)
  3. Which environment? (prod or dev — check alt_database in db_calls.py)
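The games count is turned into a season percentage that scales the minimum-PA thresholds. Below is a minimal sketch of that conversion. It assumes a 162-game season (the 1-162 range that --games accepts) and a plain games/162 ratio. season_pct is a hypothetical helper, not part of pd-cards, so check live_series_update.py for the real formula and rounding.

```bash
# Hypothetical helper: convert games played into a season fraction.
# ASSUMPTION: the pipeline uses games / 162; the real formula may differ.
season_pct() {
  local games=$1
  if ! [[ $games =~ ^[0-9]+$ ]] || (( 10#$games < 1 || 10#$games > 162 )); then
    echo "games must be an integer from 1 to 162" >&2
    return 1
  fi
  awk -v g="$games" 'BEGIN { printf "%.3f\n", g / 162 }'
}

season_pct 81    # 0.500
```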

All commands run from /mnt/NV2/Development/paper-dynasty/card-creation/.
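To answer the environment question before running anything, you can grep the setting directly. This sketch assumes db_calls.py sits at the root of card-creation/ and that the setting appears on a line containing alt_database. show_db_env is a hypothetical helper, so adjust the path if the file lives elsewhere.

```bash
# Hypothetical pre-flight helper: show which database db_calls.py targets.
# ASSUMPTION: db_calls.py is at the card-creation root; pass another repo path if not.
show_db_env() {
  local repo=${1:-/mnt/NV2/Development/paper-dynasty/card-creation}
  local file="$repo/db_calls.py"
  if [[ ! -f $file ]]; then
    echo "not found: $file" >&2
    return 1
  fi
  # Print every alt_database line with its line number.
  grep -n "alt_database" "$file"
}

# Usage: show_db_env    (defaults to the card-creation path above)
```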

Data Sourcing

Live series uses FanGraphs splits for batting/pitching and Baseball Reference for defense/running.

FanGraphs Data (Manual Download)

FanGraphs split data is not fetched during card generation. Download it beforehand, either with scripts/fangraphs_scrape.py or from the FanGraphs web UI. The scraper uses Selenium to export 8 CSV files:

| File | Content |
|---|---|
| Batting_vLHP_Standard.csv | Batting vs LHP: standard stats |
| Batting_vLHP_BattedBalls.csv | Batting vs LHP: batted ball profile |
| Batting_vRHP_Standard.csv | Batting vs RHP: standard stats |
| Batting_vRHP_BattedBalls.csv | Batting vs RHP: batted ball profile |
| Pitching_vLHH_Standard.csv | Pitching vs LHH: standard stats |
| Pitching_vLHH_BattedBalls.csv | Pitching vs LHH: batted ball profile |
| Pitching_vRHH_Standard.csv | Pitching vs RHH: standard stats |
| Pitching_vRHH_BattedBalls.csv | Pitching vs RHH: batted ball profile |

These map to the expected input files in data-input/{cardset} Cardset/:

  • vlhp-basic.csv / vlhp-rate.csv
  • vrhp-basic.csv / vrhp-rate.csv
  • vlhh-basic.csv / vlhh-rate.csv
  • vrhh-basic.csv / vrhh-rate.csv
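If you download the exports by hand, a small script can copy them into place under the expected names. This is a sketch under one loud assumption: each "Standard" export becomes the *-basic.csv file and each "BattedBalls" export becomes the *-rate.csv file for the same split. Confirm that pairing against the loader before relying on it. stage_fangraphs_csvs is hypothetical and needs bash 4+ for associative arrays.

```bash
# Hypothetical helper: copy the eight FanGraphs exports into
# data-input/{cardset} Cardset/ under the pipeline's file names.
# ASSUMPTION: Standard -> *-basic.csv, BattedBalls -> *-rate.csv.
stage_fangraphs_csvs() {
  local src=$1 cardset=$2 base=${3:-data-input}
  local dest="$base/$cardset Cardset"
  local -A names=(
    [Batting_vLHP_Standard.csv]=vlhp-basic.csv
    [Batting_vLHP_BattedBalls.csv]=vlhp-rate.csv
    [Batting_vRHP_Standard.csv]=vrhp-basic.csv
    [Batting_vRHP_BattedBalls.csv]=vrhp-rate.csv
    [Pitching_vLHH_Standard.csv]=vlhh-basic.csv
    [Pitching_vLHH_BattedBalls.csv]=vlhh-rate.csv
    [Pitching_vRHH_Standard.csv]=vrhh-basic.csv
    [Pitching_vRHH_BattedBalls.csv]=vrhh-rate.csv
  )
  local missing=0 f
  for f in "${!names[@]}"; do
    if [[ ! -f $src/$f ]]; then
      echo "missing export: $src/$f" >&2
      missing=1
    fi
  done
  (( missing == 0 )) || return 1   # copy nothing unless all eight exist
  mkdir -p "$dest"
  for f in "${!names[@]}"; do
    cp "$src/$f" "$dest/${names[$f]}"
  done
}

# Usage (from card-creation/):
#   stage_fangraphs_csvs ~/Downloads "2025 Season"
```

Refusing to copy a partial set keeps a stale file from a previous run from silently mixing with fresh ones.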

For PotM, adjust startDate and endDate in the scraper so they cover only the target month.
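Getting the month's last day right (including February in leap years) is easy to fumble by hand. This sketch prints the first and last day of a month, assuming the scraper's startDate and endDate take ISO YYYY-MM-DD strings; that format is an assumption, so match whatever fangraphs_scrape.py actually expects. month_range is a hypothetical helper.

```bash
# Hypothetical helper: print "YYYY-MM-01 YYYY-MM-<last day>" for a month.
# ASSUMPTION: the scraper accepts ISO dates.
month_range() {
  local year=$((10#$1)) month=$((10#$2))
  if (( month < 1 || month > 12 )); then
    echo "month must be 1-12" >&2
    return 1
  fi
  local days=(31 28 31 30 31 30 31 31 30 31 30 31)
  local last=${days[month - 1]}
  # Gregorian leap year: divisible by 4, except centuries not divisible by 400.
  if (( month == 2 && ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) )); then
    last=29
  fi
  printf '%04d-%02d-01 %04d-%02d-%02d\n' "$year" "$month" "$year" "$month" "$last"
}

month_range 2025 06    # 2025-06-01 2025-06-30
```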

Baseball Reference Data

Fielding stats are pulled automatically during card generation when --pull-fielding is enabled (default). Running and pitching stats come from CSVs in the data-input directory.


Steps

```bash
# 1. Download FanGraphs splits data
#    Run the scraper or manually download from FanGraphs splits leaderboard
#    Place CSVs in data-input/{cardset} Cardset/

# 2. Verify config (dry-run)
pd-cards live-series update --cardset "<cardset name>" --games <N> --dry-run

# 3. Generate cards (POSTs player data to API)
pd-cards live-series update --cardset "<cardset name>" --games <N>

# 4. Generate images WITHOUT upload (triggers rendering)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b. STOP if errors found
#    (see card-generation.md "Bug Prevention" section for validation query)

# 6. Upload to S3 (fast: uses cached images from step 4)
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run for ALL cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```

Verify scouting upload: ssh akamai "ls -lh container-data/paper-dynasty/storage/ | grep -E 'batting|pitching'"
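Once the step-2 dry run looks right, steps 3-8 can be chained so that any failure stops the run and nothing reaches S3 unless the groundball_b check passes. This is a hypothetical wrapper, not a pd-cards feature. It assumes PD_CARDS names the CLI (override it to test) and that VALIDATE_CMD is a command you supply that runs the card-generation.md query and exits non-zero when any groundball_b value is negative.

```bash
# Hypothetical wrapper for steps 3-8; run it only after reviewing the dry run.
# ASSUMPTIONS: PD_CARDS is the CLI name; VALIDATE_CMD is your own check that
# exits non-zero on any negative groundball_b.
PD_CARDS=${PD_CARDS:-pd-cards}

run_live_update() {
  local cardset=$1 games=$2
  if [[ -z ${VALIDATE_CMD:-} ]]; then
    echo "set VALIDATE_CMD to the groundball_b validation command" >&2
    return 1
  fi
  "$PD_CARDS" live-series update --cardset "$cardset" --games "$games" || return 1
  "$PD_CARDS" upload check -c "$cardset" || return 1   # render images, no upload
  if ! bash -c "$VALIDATE_CMD"; then
    echo "groundball_b validation failed; not uploading" >&2
    return 1
  fi
  "$PD_CARDS" upload s3 -c "$cardset" || return 1
  "$PD_CARDS" scouting all || return 1   # always all cardsets, no --cardset-id
  "$PD_CARDS" scouting upload
}

# Usage: VALIDATE_CMD='./check_gb.sh' run_live_update "2025 Season" 81
```

Keeping the validation as an explicit gate mirrors the "STOP if errors found" rule: S3 upload is the first step that is awkward to undo.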


Key Differences from Retrosheet Workflow

| Aspect | Live Series | Retrosheet |
|---|---|---|
| Data source | FanGraphs splits + BBRef | Retrosheet play-by-play events |
| CLI command | pd-cards live-series update | pd-cards retrosheet process |
| Season progress | --games N (1-162) | --season-pct + --start/--end dates |
| Defense data | Auto-pulled from BBRef (--pull-fielding) | Pre-downloaded defense CSVs |
| Position validation | Built-in (skips for promo cardsets) | Separate pd-cards retrosheet validate step |
| Arm ratings | Not applicable (BBRef has current data) | Generated from Retrosheet events |
| Recency bias | Not applicable | --last-twoweeks-ratio (auto-enabled after May 30) |
| Player ID lookup | FanGraphs/BBRef IDs in CSV | Retrosheet IDs → pybaseball reverse lookup |

Players of the Month (PotM) Variant

During the regular season, PotM cards are generated from the same FanGraphs pipeline but filtered to a single month's stats and posted to a promo cardset.

Key Differences from Full Update

| Setting | Full Update | PotM |
|---|---|---|
| Cardset | Season cardset (e.g., "2025 Season") | Promo cardset (e.g., "2025 Promos") |
| FanGraphs date range | Season start → current date | Month start → month end |
| --games | Cumulative games played | Games in that month (~27) |
| --ignore-limits | Usually no | Usually yes (short sample) |
| Position updates | Yes | Skipped (cardset name contains "promos") |

PotM Pre-Flight Checklist

  1. Choose players — Typically 2 IF, 2 OF, 1 SP, 1 RP per league
  2. Download month-specific FanGraphs data — Set date range in scraper to the target month only
  3. Confirm promo cardset exists in the database
  4. Place CSVs in the promo cardset's data-input directory
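Before the dry run, it is worth confirming the promo directory actually holds all eight split files. This hypothetical check uses the file names from the data-input mapping earlier in this workflow and treats an empty file as missing; it prints each problem and returns non-zero if any file is absent.

```bash
# Hypothetical pre-flight check for a cardset's data-input directory.
check_cardset_inputs() {
  local dir=$1 missing=0 f
  for f in vlhp-basic vlhp-rate vrhp-basic vrhp-rate \
           vlhh-basic vlhh-rate vrhh-basic vrhh-rate; do
    if [[ ! -s $dir/$f.csv ]]; then   # -s: exists and is non-empty
      echo "missing or empty: $dir/$f.csv"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_cardset_inputs "data-input/2025 Promos Cardset"
```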

PotM Steps

```bash
# 1. Download FanGraphs splits for the target month only
#    Adjust startDate/endDate in fangraphs_scrape.py or manual download
#    Place in data-input/{promo cardset} Cardset/

# 2. Dry-run
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits --dry-run

# 3. Generate cards
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits

# 4-6. Render images, validate, and upload to S3 (same as the full update)
pd-cards upload check -c "<promo cardset name>"
# CRITICAL: validate groundball_b (query in card-generation.md); STOP if errors found
pd-cards upload s3 -c "<promo cardset name>"

# 7-8. Scouting reports: ALWAYS regenerate for ALL cardsets
pd-cards scouting all
pd-cards scouting upload
```
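The groundball_b validation query itself lives in card-generation.md and is not reproduced here. If you export the cardset's ratings to CSV, a quick awk pass can apply the same "stop on any negative value" rule. This is a hypothetical substitute, not that query: it assumes a header row with a groundball_b column and plain comma-separated fields with no quoted commas.

```bash
# Hypothetical groundball_b check on a CSV export.
# Exit status: 0 = clean, 1 = negative values found, 2 = no groundball_b column.
check_groundball_b() {
  awk -F, '
    { sub(/\r$/, "") }                       # tolerate CRLF line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "groundball_b") col = i
      if (!col) { print "no groundball_b column" > "/dev/stderr"; status = 2; exit }
      next
    }
    $col + 0 < 0 { print "row " NR ": groundball_b=" $col; status = 1 }
    END { exit status }
  ' "$1"
}

# Usage: check_groundball_b ratings_export.csv || echo "STOP: fix before S3 upload"
```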

PotM-Specific Notes

  • Position updates are skipped when the cardset name contains "promos" (both live_series_update.py and the CLI check for this).
  • Description protection — PotM descriptions (e.g., "April PotM") are never overwritten by subsequent full-cardset runs. The should_update_player_description() helper checks for "potm" in the existing description.
  • --ignore-limits is typically needed because a single month may not produce enough PA/TBF to meet normal thresholds (20 vL / 40 vR).
  • Scouting must cover ALL cardsets — PotM players appear alongside live players. Always run pd-cards scouting all without --cardset-id to preserve the unified scouting view.
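The two name checks above can be mirrored in shell when scripting around the CLI, for example to warn before a run. These are hypothetical helpers, and they assume both checks are case-insensitive substring matches (the examples pair "2025 Promos" with "promos" and "April PotM" with "potm"). The authoritative logic is in live_series_update.py and should_update_player_description(). Requires bash 4+ for ${var,,}.

```bash
# Hypothetical mirrors of the promo-cardset and PotM-description checks.
# ASSUMPTION: both are case-insensitive substring matches.
is_promo_cardset() {          # true when position updates would be skipped
  [[ ${1,,} == *promos* ]]
}

keeps_potm_description() {    # true when an existing description is protected
  [[ ${1,,} == *potm* ]]
}

# Usage: is_promo_cardset "2025 Promos" && echo "position updates skipped"
```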

Example: June 2025 PotM

```bash
# Download June-only FanGraphs splits (June 1 - June 30)
# Place CSVs in data-input/2025 Promos Cardset/

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits --dry-run

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits

pd-cards upload check -c "2025 Promos"
# Validate groundball_b before uploading; STOP if errors found
pd-cards upload s3 -c "2025 Promos"
pd-cards scouting all
pd-cards scouting upload
```

Common Issues

"No players found": Wrong cardset name or database environment. Verify alt_database in db_calls.py.

Missing FanGraphs CSVs: The scraper requires Chrome/Selenium. If it fails, download the files manually from the FanGraphs splits leaderboard with the correct date range and stat group settings.

High DH count: The defense pull failed or BBRef was rate-limited. Re-run the update with --pull-fielding enabled (the default), or manually download defense CSVs.

Early-season runs: Use --ignore-limits when games played is low (< ~40) to avoid filtering out most players.
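The early-season rule of thumb can be folded into a command line so the flag is not forgotten. This hypothetical helper prints --ignore-limits when games played falls below a cutoff. The default of 40 comes from the approximate "< ~40" guidance above, not from a value in the code, so treat it as a starting point.

```bash
# Hypothetical helper: emit --ignore-limits for low game counts.
# The 40-game default is this doc's rough guidance, not a code constant.
ignore_limits_flag() {
  local games=$1 cutoff=${2:-40}
  if (( games < cutoff )); then
    printf '%s\n' --ignore-limits
  fi
}

# Usage (unquoted so an empty result adds no argument):
#   pd-cards live-series update --cardset "2025 Season" --games 25 $(ignore_limits_flag 25)
```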


Last Updated: 2026-02-14
Version: 1.0 (Initial workflow documentation)