Cal Corum 17cac31b7b Retire redundant commit commands, fix paper-dynasty memory refs

- Remove /commit and /commit-push commands (native /commit replaces both)
- Update paper-dynasty to use cognitive-memory MCP instead of archived MemoryGraph
- Fix scouting upload verify path (sba-db -> akamai)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-05 19:40:54 -06:00

7.4 KiB


# Live Series Update Workflow

Used during the MLB regular season to generate cards from current-year FanGraphs split data and Baseball Reference fielding/running stats.

## Pre-Flight

Ask the user before starting:

- Which cardset? (e.g., "2025 Season")
- How many games have been played? (This determines the season percentage used for the minimum PA thresholds.)
- Which environment? (prod or dev; check `alt_database` in `db_calls.py`)

Run all commands from `/mnt/NV2/Development/paper-dynasty/card-creation/`.

## Data Sourcing

Live series uses FanGraphs splits for batting/pitching and Baseball Reference for defense/running.

### FanGraphs Data (Manual Download)

FanGraphs split data is not pulled during card generation. Download it yourself, either with `scripts/fangraphs_scrape.py` or through the FanGraphs web UI. The scraper uses Selenium to export 8 CSV files:

| File | Content |
|---|---|
| `Batting_vLHP_Standard.csv` | Batting vs LHP: standard stats |
| `Batting_vLHP_BattedBalls.csv` | Batting vs LHP: batted ball profile |
| `Batting_vRHP_Standard.csv` | Batting vs RHP: standard stats |
| `Batting_vRHP_BattedBalls.csv` | Batting vs RHP: batted ball profile |
| `Pitching_vLHH_Standard.csv` | Pitching vs LHH: standard stats |
| `Pitching_vLHH_BattedBalls.csv` | Pitching vs LHH: batted ball profile |
| `Pitching_vRHH_Standard.csv` | Pitching vs RHH: standard stats |
| `Pitching_vRHH_BattedBalls.csv` | Pitching vs RHH: batted ball profile |

These map to the expected input files in `data-input/{cardset} Cardset/`:

- `vlhp-basic.csv` / `vlhp-rate.csv`
- `vrhp-basic.csv` / `vrhp-rate.csv`
- `vlhh-basic.csv` / `vlhh-rate.csv`
- `vrhh-basic.csv` / `vrhh-rate.csv`
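If you use the scraper, its exports still have to land in the cardset directory under the expected names. The following is a minimal Python sketch of that staging step. It assumes that `Standard` exports become `*-basic.csv` and `BattedBalls` exports become `*-rate.csv`. That pairing is inferred from the order of the two lists, not confirmed, so check it against the scraper. `expected_names()` and `stage_exports()` are hypothetical helpers and are not part of `pd-cards`.

```python
import shutil
from pathlib import Path

# Scraper export prefix -> expected data-input prefix (from the two lists above).
SPLITS = {
    "Batting_vLHP": "vlhp",
    "Batting_vRHP": "vrhp",
    "Pitching_vLHH": "vlhh",
    "Pitching_vRHH": "vrhh",
}
# ASSUMPTION: Standard -> "basic" and BattedBalls -> "rate"; confirm against the scraper.
KINDS = {"Standard": "basic", "BattedBalls": "rate"}


def expected_names() -> dict:
    """Map each scraper export filename to its assumed data-input filename."""
    return {
        f"{split}_{kind}.csv": f"{prefix}-{suffix}.csv"
        for split, prefix in SPLITS.items()
        for kind, suffix in KINDS.items()
    }


def stage_exports(download_dir: Path, cardset: str, root: Path = Path(".")) -> list:
    """Copy exports into data-input/{cardset} Cardset/ and return any that are missing."""
    target = root / "data-input" / f"{cardset} Cardset"
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for src_name, dst_name in expected_names().items():
        src = download_dir / src_name
        if src.exists():
            shutil.copyfile(src, target / dst_name)
        else:
            missing.append(src_name)
    return missing
```

For example, `stage_exports(Path.home() / "Downloads", "2025 Season")` (download path hypothetical) returns the exports it could not find. A non-empty list means re-running the scraper or downloading those splits by hand before the dry-run.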

For PotM, adjust `startDate` and `endDate` in the scraper so they cover only the target month.
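The following helper picks the first and last day of the target month, including leap-year Februaries. It assumes the scraper accepts ISO `YYYY-MM-DD` strings, so check `scripts/fangraphs_scrape.py` for the format it actually expects. `month_range()` is a hypothetical name.

```python
import calendar
from datetime import date


def month_range(year: int, month: int) -> tuple:
    """Return (startDate, endDate) as ISO strings covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    return date(year, month, 1).isoformat(), date(year, month, last_day).isoformat()
```

For the June 2025 PotM example, `month_range(2025, 6)` gives `("2025-06-01", "2025-06-30")`.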

### Baseball Reference Data

Fielding stats are pulled automatically during card generation when `--pull-fielding` is enabled (the default). Running and pitching stats come from CSVs in the `data-input` directory.

## Steps

```bash
# 1. Download FanGraphs splits data
#    Run the scraper or download manually from the FanGraphs splits leaderboard
#    Place CSVs in data-input/{cardset} Cardset/

# 2. Verify config (dry-run)
pd-cards live-series update --cardset "<cardset name>" --games <N> --dry-run

# 3. Generate cards (POSTs player data to API)
pd-cards live-series update --cardset "<cardset name>" --games <N>

# 4. Generate images WITHOUT upload (triggers rendering)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b; STOP if errors found
#    (see the "Bug Prevention" section of card-generation.md for the validation query)

# 6. Upload to S3 (fast: uses cached images from step 4)
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run for ALL cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```

Verify the scouting upload with `ssh akamai "ls -lh container-data/paper-dynasty/storage/ | grep -E 'batting|pitching'"`.
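The authoritative step 5 query lives in `card-generation.md`. The sketch below only illustrates the gate around it. It assumes you have already fetched the rating rows as dicts with a `groundball_b` key. The row shape and both function names are hypothetical. The gate exits non-zero, so step 6 never runs if any value is negative.

```python
import sys
from collections.abc import Iterable, Mapping


def negative_groundball_b(rows: Iterable[Mapping]) -> list:
    """Return every rating row whose groundball_b value is below zero."""
    return [row for row in rows if row["groundball_b"] < 0]


def gate_s3_upload(rows: Iterable[Mapping]) -> None:
    """Print offending rows to stderr and exit with status 1 to block the S3 upload."""
    bad = negative_groundball_b(rows)
    for row in bad:
        print(f"negative groundball_b: {dict(row)}", file=sys.stderr)
    if bad:
        sys.exit(1)
```

Failing loudly is the point of this design. A script that chains steps 4 through 6 stops at the gate instead of uploading broken cards.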

## Key Differences from Retrosheet Workflow

| Aspect | Live Series | Retrosheet |
|---|---|---|
| Data source | FanGraphs splits + BBRef | Retrosheet play-by-play events |
| CLI command | `pd-cards live-series update` | `pd-cards retrosheet process` |
| Season progress | `--games N` (1-162) | `--season-pct` + `--start`/`--end` dates |
| Defense data | Auto-pulled from BBRef (`--pull-fielding`) | Pre-downloaded defense CSVs |
| Position validation | Built-in (skipped for promo cardsets) | Separate `pd-cards retrosheet validate` step |
| Arm ratings | Not applicable (BBRef has current data) | Generated from Retrosheet events |
| Recency bias | Not applicable | `--last-twoweeks-ratio` (auto-enabled after May 30) |
| Player ID lookup | FanGraphs/BBRef IDs in CSV | Retrosheet IDs → pybaseball reverse lookup |

## Players of the Month (PotM) Variant

During the regular season, PotM cards are generated from the same FanGraphs pipeline but filtered to a single month's stats and posted to a promo cardset.

### Key Differences from Full Update

| Setting | Full Update | PotM |
|---|---|---|
| Cardset | Season cardset (e.g., "2025 Season") | Promo cardset (e.g., "2025 Promos") |
| FanGraphs date range | Season start → current date | Month start → month end |
| `--games` | Cumulative games played | Games in that month (~27) |
| `--ignore-limits` | Usually no | Usually yes (short sample) |
| Position updates | Yes | Skipped (cardset name contains "promos") |

### PotM Pre-Flight Checklist

- Choose players: typically 2 IF, 2 OF, 1 SP, and 1 RP per league.
- Download month-specific FanGraphs data: set the scraper's date range to the target month only.
- Confirm the promo cardset exists in the database.
- Place the CSVs in the promo cardset's data-input directory.
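The following sketch checks a draft PotM list against the typical per-league split before any CSVs are prepared. Picks are assumed to be `(league, role)` tuples. The "AL"/"NL" labels and the `quota_gaps()` name are hypothetical. Because the checklist says "typically", treat a non-empty result as a prompt to double-check, not an error.

```python
from collections import Counter

# Typical PotM picks per league, from the checklist above (a guideline, not a rule).
QUOTA = {"IF": 2, "OF": 2, "SP": 1, "RP": 1}


def quota_gaps(picks, leagues=("AL", "NL")) -> dict:
    """Compare (league, role) picks with QUOTA.

    Returns {league: {role: picked - expected}} for every role that is off target;
    an empty dict means every league matches the typical split.
    """
    counts = Counter(picks)
    gaps = {}
    for league in leagues:
        for role, expected in QUOTA.items():
            diff = counts[(league, role)] - expected
            if diff:
                gaps.setdefault(league, {})[role] = diff
    return gaps
```

A result like `{"NL": {"IF": -1, "RP": 1}}` means one NL infielder is missing and there is one reliever more than usual.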

### PotM Steps

```bash
# 1. Download FanGraphs splits for the target month only
#    Adjust startDate/endDate in fangraphs_scrape.py, or set the date range in a manual download
#    Place CSVs in data-input/{promo cardset} Cardset/

# 2. Dry-run
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits --dry-run

# 3. Generate cards
pd-cards live-series update --cardset "<promo cardset>" --games <month_games> \
  --description "<Month> PotM" --ignore-limits

# 4-6. Image rendering, validation, and S3 upload (same as the full update)
pd-cards upload check -c "<promo cardset name>"
# 5. Run the groundball_b validation; STOP if errors found
pd-cards upload s3 -c "<promo cardset name>"

# 7-8. Scouting reports: ALWAYS regenerate for ALL cardsets
pd-cards scouting all
pd-cards scouting upload
```

### PotM-Specific Notes

- Position updates are skipped when the cardset name contains "promos" (both `live_series_update.py` and the CLI check for this).
- Description protection: PotM descriptions (e.g., "April PotM") are never overwritten by later full-cardset runs. The `should_update_player_description()` helper checks the existing description for "potm".
- `--ignore-limits` is usually needed because a single month may not produce enough PA/TBF to meet the normal thresholds (20 vL / 40 vR).
- Scouting must cover ALL cardsets, because PotM players appear alongside live players. Always run `pd-cards scouting all` without `--cardset-id` to preserve the unified scouting view.
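The two name-based guards above can be sketched as follows. The case-insensitive matching is inferred from the examples ("2025 Promos" contains "Promos", "April PotM" contains "PotM") rather than read from the code. Both function names are hypothetical. The real logic lives in `live_series_update.py`, the CLI, and `should_update_player_description()`, and its signatures may differ.

```python
from typing import Optional


def skips_position_updates(cardset_name: str) -> bool:
    """Promo cardsets keep their positions; matched case-insensitively (assumed)."""
    return "promos" in cardset_name.lower()


def is_description_replaceable(existing: Optional[str]) -> bool:
    """Mirror of the described rule: never overwrite a description mentioning PotM.

    ASSUMPTION: a missing or empty description may be filled in.
    """
    return not existing or "potm" not in existing.lower()
```

With these guards, a later full-season run can refresh "2025 Season" descriptions while leaving "June PotM" untouched.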

### Example: June 2025 PotM

```bash
# Download June-only FanGraphs splits (June 1 to June 30)
# Place CSVs in data-input/2025 Promos Cardset/

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits --dry-run

pd-cards live-series update --cardset "2025 Promos" --games 27 \
  --description "June PotM" --ignore-limits

pd-cards upload check -c "2025 Promos"
# Run the groundball_b validation here; STOP if errors found
pd-cards upload s3 -c "2025 Promos"
pd-cards scouting all
pd-cards scouting upload
```

## Common Issues

"No players found": wrong cardset name or database environment. Verify `alt_database` in `db_calls.py`.

Missing FanGraphs CSVs: the scraper requires Chrome and Selenium. If it fails, download the splits manually from the FanGraphs splits leaderboard with the correct date range and stat group settings.

High DH count: the defense pull failed or BBRef rate-limited the requests. Re-run with `--pull-fielding`, or download the defense CSVs manually.

Early-season runs: use `--ignore-limits` when few games have been played (fewer than ~40) so that most players are not filtered out.
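The following sketch shows the limit logic using the fixed 20 vL / 40 vR minimums quoted in the PotM notes and the ~40-game rule of thumb. ASSUMPTIONS: the real code scales its thresholds by a season percentage derived from `--games`, in a way this page does not document. It is also not stated whether a player must reach or exceed the minimum. Both helpers are hypothetical.

```python
# Normal per-split minimums quoted in the PotM notes: 20 PA/TBF vs left, 40 vs right.
MIN_VL, MIN_VR = 20, 40
# Rough early-season cutoff from the note above ("< ~40" games).
EARLY_SEASON_GAMES = 40


def suggest_ignore_limits(games: int) -> bool:
    """Heuristic: recommend --ignore-limits for short samples."""
    return games < EARLY_SEASON_GAMES


def meets_limits(pa_vl: int, pa_vr: int, ignore_limits: bool = False) -> bool:
    """True if a player reaches both split minimums, or if limits are ignored."""
    return ignore_limits or (pa_vl >= MIN_VL and pa_vr >= MIN_VR)
```

For the June PotM example (`--games 27`), `suggest_ignore_limits(27)` is `True`, which matches the `--ignore-limits` flag used there.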

Last Updated: 2026-02-14

Version: 1.0 (Initial workflow documentation)
