claude-configs/skills/paper-dynasty/workflows/card-generation.md

Card Generation Workflow

Pre-Flight

Ask the user before starting:

  1. Refresh or new date range? (refresh keeps existing config)
  2. Which environment? (prod or dev)
  3. Which cardset? (e.g., 27 for "2005 Live")
  4. Season progress? (games played or date range for season-pct calculation)
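When the user answers with games played, the --season-pct value can be derived with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the 162-game schedule length is an assumption not stated in this workflow. Under that assumption, 118 games played gives 0.728, the value used in the mid-August example.

```python
def season_pct(games_played: int, season_games: int = 162) -> float:
    """Return the fraction of the season completed, rounded to 3 places.

    season_games=162 is an assumption (standard MLB schedule length).
    """
    if season_games <= 0:
        raise ValueError("season_games must be positive")
    if not 0 <= games_played <= season_games:
        raise ValueError("games_played must be between 0 and season_games")
    return round(games_played / season_games, 3)


print(season_pct(118))  # 118 / 162 = 0.7284..., rounded to 0.728
```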

All commands run from /mnt/NV2/Development/paper-dynasty/card-creation/.

Steps

# 1. Verify config (dry-run shows settings without executing)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
  --start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0> --dry-run

# 2. Generate cards (POSTs player data to API)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
  --start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0>

# 3. Validate positions (DH count MUST be <5; high DH = defense calc failure)
pd-cards retrosheet validate <cardset_id>

# 4. Generate images WITHOUT upload (triggers rendering; groundball_b bug can occur here)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b — STOP if errors found
#    (see "Bug Prevention" section below)

# 6. Upload to S3
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run without --cardset-id to cover all cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload

CLI Parameter Reference

| Parameter | Description | Example |
|---|---|---|
| --start | Season start date (YYYYMMDD) | --start 20050403 |
| --end | Data cutoff date (YYYYMMDD) | --end 20050815 |
| --season-pct | Fraction of season completed (0.0-1.0) | --season-pct 0.728 |
| --min-pa-vl | Min plate appearances vs LHP (default: 20 Live, 1 PotM) | --min-pa-vl 20 |
| --min-pa-vr | Min plate appearances vs RHP (default: 40 Live, 1 PotM) | --min-pa-vr 40 |
| --last-twoweeks-ratio | Recency bias weight (auto-enabled at 0.2 after May 30) | --last-twoweeks-ratio 0.2 |
| --dry-run / -n | Preview without saving to database | |
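Malformed or reversed dates are easy to catch before running the dry-run. A minimal sketch of a pre-check for the --start and --end values follows; the helper name is hypothetical and it is not part of pd-cards.

```python
from datetime import datetime


def check_date_range(start: str, end: str) -> int:
    """Validate two YYYYMMDD strings and return the inclusive day count."""
    for label, value in (("start", start), ("end", end)):
        # strptime alone accepts some short forms, so enforce 8 digits first.
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"{label} must be 8 digits (YYYYMMDD): {value!r}")
    start_d = datetime.strptime(start, "%Y%m%d").date()
    end_d = datetime.strptime(end, "%Y%m%d").date()
    if end_d < start_d:
        raise ValueError("end date is before start date")
    return (end_d - start_d).days + 1


print(check_date_range("20050403", "20050815"))  # 135 days, inclusive
```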

Example: 2005 Live Series Update (Mid-August)

pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728 --dry-run
pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728
pd-cards retrosheet validate 27
pd-cards upload check -c "2005 Live"
# Run groundball_b validation (step 5)
pd-cards upload s3 -c "2005 Live"
pd-cards scouting all
pd-cards scouting upload

Bug Prevention: The Double-Run Pattern

Card image generation (step 4) can create negative groundball_b values that crash game simulation. The prevention strategy:

  1. Step 4: Run upload check (no S3 upload) — triggers image rendering and caches images
  2. Step 5: Query database for negative groundball_b — STOP if any found
  3. Step 6: Run upload s3 — uploads the already-cached (validated) images. Fast because images are cached from step 4.

Never skip step 5. Broken cards uploaded to S3 affect all players immediately.
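The ordering of steps 4 through 6 can be enforced in a small wrapper so the S3 upload is unreachable when validation reports errors. This is a sketch under assumptions: publish_images and the validate callback are hypothetical names, and the only pd-cards invocations used are the two commands shown in this workflow.

```python
import subprocess
from typing import Callable, List


def publish_images(
    cardset_name: str,
    validate: Callable[[], List[str]],
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Run steps 4-6 in order; skip the S3 upload if validation finds errors."""
    # Step 4: render and cache images without uploading.
    run(["pd-cards", "upload", "check", "-c", cardset_name], check=True)

    # Step 5: validate() returns a list of error strings (empty means OK).
    errors = validate()
    if errors:
        raise RuntimeError("validation failed:\n" + "\n".join(errors))

    # Step 6: upload the already-cached, validated images.
    run(["pd-cards", "upload", "s3", "-c", cardset_name], check=True)
```

Passing run as a parameter keeps the wrapper testable without the CLI installed; in normal use the default subprocess.run is sufficient.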

Step 5 Validation Script

There is no CLI command for this validation yet. Run this Python script via uv run python -c:

uv run python -c "
import asyncio
import sys

from db_calls import db_get

async def check_cards():
    # The API returns {'count': N, 'cards': [...]}
    result = await db_get('battingcards', params=[('cardset', CARDSET_ID)])
    cards = result.get('cards', [])
    errors = []
    for card in cards:
        player = card.get('player', {})
        pid = player.get('player_id', card.get('id'))
        # Negative groundball_b values crash game simulation
        gb = card.get('groundball_b')
        if gb is not None and gb < 0:
            errors.append(f'Player {pid}: groundball_b = {gb}')
        # Flag gb_b / fb_b / ld_b values outside the 0-100 range
        for field in ['gb_b', 'fb_b', 'ld_b']:
            val = card.get(field)
            if val is not None and (val < 0 or val > 100):
                errors.append(f'Player {pid}: {field} = {val}')
    if errors:
        print('ERRORS FOUND:')
        print('\n'.join(errors))
        print('\nDO NOT PROCEED: fix data and re-run step 2')
        sys.exit(1)  # non-zero exit so chained commands stop here
    else:
        print(f'Validation passed: {len(cards)} batting cards checked, no issues')

asyncio.run(check_cards())
"

Note: Replace CARDSET_ID with the actual cardset ID (e.g., 27). The API returns {'count': N, 'cards': [...]} — always use result.get('cards', []) to extract the card list.
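The same check can be exercised offline against a hand-built response of the shape described above, which is useful when changing the validation rules. This is a sketch only: find_card_errors is a hypothetical helper and the sample cards are made-up test data, not real API output.

```python
from typing import Any, Dict, List


def find_card_errors(result: Dict[str, Any]) -> List[str]:
    """Return error strings for cards with out-of-range batted-ball fields."""
    errors = []
    for card in result.get("cards", []):
        player = card.get("player") or {}
        pid = player.get("player_id", card.get("id"))
        gb = card.get("groundball_b")
        if gb is not None and gb < 0:
            errors.append(f"Player {pid}: groundball_b = {gb}")
        for field in ("gb_b", "fb_b", "ld_b"):
            val = card.get(field)
            if val is not None and (val < 0 or val > 100):
                errors.append(f"Player {pid}: {field} = {val}")
    return errors


# Made-up sample in the {'count': N, 'cards': [...]} shape.
sample = {
    "count": 2,
    "cards": [
        {"id": 1, "player": {"player_id": 101}, "groundball_b": 3.5},
        {"id": 2, "player": {"player_id": 102}, "groundball_b": -1.25},
    ],
}
print(find_card_errors(sample))  # ['Player 102: groundball_b = -1.25']
```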


Architecture

  • retrosheet_data.py processes Retrosheet play-by-play data, calculates ratings, POSTs to API
  • API stores cards in production database; cards are rendered on-demand via URL
  • nginx caches rendered card images by date parameter (?d=YYYY-MM-DD)
  • All operations are idempotent and safe to re-run

Data sources: Retrosheet events CSV, Baseball Reference defense CSVs (data-input/), FanGraphs splits (if needed)

Required input files:

  • data-input/retrosheet/retrosheets_events_*.csv
  • data-input/<cardset name>/defense_*.csv (defense_c.csv, defense_1b.csv, etc.)
  • data-input/<cardset name>/pitching.csv, running.csv
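A quick existence check for the files listed above can save a failed run. The sketch below assumes the layout exactly as listed; missing_inputs is a hypothetical helper, not part of the card-creation code.

```python
from pathlib import Path
from typing import List


def missing_inputs(root: Path, cardset_name: str) -> List[str]:
    """Return the required input patterns that match no file under root."""
    missing = []
    if not list((root / "retrosheet").glob("retrosheets_events_*.csv")):
        missing.append("retrosheet/retrosheets_events_*.csv")
    cardset_dir = root / cardset_name
    if not list(cardset_dir.glob("defense_*.csv")):
        missing.append(f"{cardset_name}/defense_*.csv")
    for name in ("pitching.csv", "running.csv"):
        if not (cardset_dir / name).is_file():
            missing.append(f"{cardset_name}/{name}")
    return missing
```

For example, missing_inputs(Path("data-input"), "2005 Live") returns an empty list when everything is in place.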

Scouting output: 4 CSVs in scouting/: batting-basic.csv, batting-ratings.csv, pitching-basic.csv, pitching-ratings.csv


Common Issues

"No players found" after successful run: Wrong database environment, wrong CARDSET_ID, or DATE mismatch. Check alt_database in db_calls.py. For promos, ensure PROMO_INCLUSION_RETRO_IDS is populated.

High DH count (50+ players): Defense calculation failed. Check that the defense CSVs exist and that their column names match (tz_runs_total, not tz_runs_outfield). Re-run step 2 after fixing.
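The column-name mismatch can be detected by reading only the header row of each defense CSV. A minimal sketch, assuming the files are plain comma-separated with a header line; the helper name is hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def defense_files_missing_column(
    cardset_dir: Path, column: str = "tz_runs_total"
) -> List[str]:
    """Return names of defense_*.csv files whose header lacks `column`."""
    bad = []
    for path in sorted(cardset_dir.glob("defense_*.csv")):
        with path.open(newline="") as fh:
            # Read just the first row; an empty file yields an empty header.
            header = next(csv.reader(fh), [])
        if column not in header:
            bad.append(path.name)
    return bad
```

Any file name it returns is a candidate for renaming the column before re-running step 2.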

S3 upload fails: Check ~/.aws/credentials and verify manually that cards render at the API URL, then re-run the upload (it is idempotent).

"surplus of X.XX chances" / "Adding X.XX results": Normal rounding adjustments in card generation — informational, not errors.


Players of the Month (PotM) Variant

PotM cards use the same retrosheet pipeline but with a narrower date range, a promo cardset, and a curated player list.

Key Differences from Full Cardset

| Setting | Full Cardset | PotM |
|---|---|---|
| --description | Live | <Month> PotM (e.g., April PotM) |
| --cardset-id | Live cardset (e.g., 27) | Promo cardset (e.g., 28) |
| --start / --end | Full season range | Single month (e.g., 20050401 - 20050430) |
| --min-pa-vl / --min-pa-vr | 20 / 40 (auto) | 1 / 1 (auto when description != "Live") |
| Player filtering | All qualifying players | Only PROMO_INCLUSION_RETRO_IDS |
| Position updates | Yes | Skipped (promo players keep existing positions) |
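The automatic plate-appearance minimums follow directly from the description value. The sketch below only restates that rule; default_min_pa is a hypothetical name, not the function used in retrosheet_data.py.

```python
from typing import Tuple


def default_min_pa(description: str) -> Tuple[int, int]:
    """Return (min_pa_vl, min_pa_vr) defaults for a given description."""
    # "Live" uses 20 vs LHP / 40 vs RHP; any other description uses 1 / 1.
    return (20, 40) if description == "Live" else (1, 1)


print(default_min_pa("Live"))        # (20, 40)
print(default_min_pa("April PotM"))  # (1, 1)
```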

PotM Pre-Flight Checklist

  1. Choose players — Typically 2 IF, 2 OF, 1 SP, 1 RP per league (AL/NL)
  2. Get Retro IDs — Look up each player's key_retro (e.g., rodra001 for A-Rod)
  3. Determine date range — First and last day of the month in YYYYMMDD format
  4. Confirm promo cardset ID — Usually a separate cardset from the live one
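The --start and --end values for a month can be computed rather than looked up, which avoids off-by-one mistakes on month lengths. A small sketch using the standard library; month_range is a hypothetical helper name.

```python
import calendar
from typing import Tuple


def month_range(year: int, month: int) -> Tuple[str, str]:
    """Return (first_day, last_day) of a month as YYYYMMDD strings."""
    # monthrange returns (weekday of day 1, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return (
        f"{year:04d}{month:02d}01",
        f"{year:04d}{month:02d}{last_day:02d}",
    )


print(month_range(2005, 5))  # ('20050501', '20050531')
```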

PotM Steps

# 1. Dry-run to verify config
pd-cards retrosheet process <year> -c <promo_cardset_id> \
  -d "<Month> PotM" \
  --start <YYYYMMDD> --end <YYYYMMDD> \
  --dry-run

# 2. Generate promo cards
pd-cards retrosheet process <year> -c <promo_cardset_id> \
  -d "<Month> PotM" \
  --start <YYYYMMDD> --end <YYYYMMDD>

# 3. Validate (expect higher DH count — promo players may lack defense data for short windows)
pd-cards retrosheet validate <promo_cardset_id>

# 4-6. Image validation (same as full cardset: check, validate groundball_b, then upload)
pd-cards upload check -c "<promo cardset name>"
# Run groundball_b validation (step 5 from main workflow); STOP if errors found
pd-cards upload s3 -c "<promo cardset name>"

# 7-8. Scouting reports: ALWAYS regenerate for ALL cardsets (no --cardset-id filter)
pd-cards scouting all
pd-cards scouting upload

PotM-Specific Gotchas

  • PROMO_INCLUSION_RETRO_IDS must be populated — If description is not "Live", retrosheet_data.py filters to only these IDs. Empty list = 0 players generated.
  • Don't mix Live and PotM — If PROMO_INCLUSION_RETRO_IDS has entries but description is "Live", the script warns and exits.
  • Description protection — Once a player has a PotM description (e.g., "April PotM"), it is never overwritten by subsequent live series runs. Promo cardset descriptions are also protected: existing cards keep their original month.
  • Scouting must cover ALL cardsets — PotM players appear in scouting alongside live players. Always run pd-cards scouting all without --cardset-id to avoid overwriting the unified scouting data with partial results.
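The first two gotchas amount to a consistency rule between the description and the promo ID list, which can be checked before running the pipeline. The sketch below mirrors that rule only; check_promo_config is a hypothetical helper, and it raises an exception where the real script is described as warning and exiting.

```python
from typing import Sequence


def check_promo_config(description: str, promo_ids: Sequence[str]) -> None:
    """Raise ValueError if description and promo ID list are inconsistent."""
    is_live = description == "Live"
    if is_live and promo_ids:
        raise ValueError(
            "PROMO_INCLUSION_RETRO_IDS has entries but description is 'Live'"
        )
    if not is_live and not promo_ids:
        raise ValueError(
            "PROMO_INCLUSION_RETRO_IDS is empty; a promo run would generate 0 players"
        )


check_promo_config("Live", [])                # OK
check_promo_config("May PotM", ["rodra001"])  # OK
```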

Example: May 2005 PotM

# Players: A-Rod (IF), Delgado (IF), Mench (OF), Abreu (OF), Colon (SP), Ryan (RP), Harang (SP), Hoffman (RP)
# Retro IDs configured in retrosheet_data.py PROMO_INCLUSION_RETRO_IDS

pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531 --dry-run
pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531
pd-cards retrosheet validate 28
pd-cards upload check -c "2005 Promos"
# Run groundball_b validation
pd-cards upload s3 -c "2005 Promos"
pd-cards scouting all
pd-cards scouting upload

Last Updated: 2026-02-15
Version: 3.2 (Fixed scouting commands to use CLI, fixed groundball_b validation script, added CLI parameter reference and example)