Cal Corum 1e9b52186b Update remote refs and card generation workflow

- Remove homelab special-case from commit-push command (all repos now use origin)
- Update sync-config to use origin remote instead of homelab
- Enhance card generation with season-pct params, CLI reference, and validation fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-15 19:11:19 -06:00

9.2 KiB


Card Generation Workflow

Pre-Flight

Ask the user before starting:

Refresh or new date range? (refresh keeps existing config)
Which environment? (prod or dev)
Which cardset? (e.g., 27 for "2005 Live")
Season progress? (games played or date range for season-pct calculation)

All commands run from /mnt/NV2/Development/paper-dynasty/card-creation/.
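The season-progress answer from pre-flight has to become a 0.0-1.0 fraction for `--season-pct`. The sketch below shows both ways of getting there. It assumes a 162-game schedule for the games-played case and a caller-supplied season end date for the date-range case. Both are assumptions, and the pipeline's own expectation may differ. As a reference point, 118 of 162 games rounds to 0.728, the value used in the 2005 Live example.

```python
from datetime import datetime

SCHEDULE_GAMES = 162  # assumption: full MLB regular-season schedule


def season_pct_from_games(games_played: int, total_games: int = SCHEDULE_GAMES) -> float:
    """Fraction of the schedule completed, clamped to [0.0, 1.0]."""
    if total_games <= 0:
        raise ValueError("total_games must be positive")
    return min(max(games_played / total_games, 0.0), 1.0)


def season_pct_from_dates(start: str, cutoff: str, season_end: str) -> float:
    """Fraction of calendar days elapsed from start to cutoff (all YYYYMMDD strings).

    season_end is an assumed input: the last day of the regular season.
    """
    fmt = "%Y%m%d"
    s = datetime.strptime(start, fmt)
    c = datetime.strptime(cutoff, fmt)
    e = datetime.strptime(season_end, fmt)
    total_days = (e - s).days
    if total_days <= 0:
        raise ValueError("season_end must be after start")
    return min(max((c - s).days / total_days, 0.0), 1.0)


print(round(season_pct_from_games(118), 3))  # 0.728
```

The two methods do not agree exactly, because games are not spread evenly across calendar days. Pick one method and use it consistently across refreshes.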

Steps

# 1. Verify config (dry-run shows settings without executing)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
  --start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0> --dry-run

# 2. Generate cards (POSTs player data to API)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
  --start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0>

# 3. Validate positions (DH count MUST be <5; high DH = defense calc failure)
pd-cards retrosheet validate <cardset_id>

# 4. Generate images WITHOUT upload (triggers rendering; groundball_b bug can occur here)
pd-cards upload check -c "<cardset name>"

# 5. CRITICAL: Validate database for negative groundball_b — STOP if errors found
#    (see "Bug Prevention" section below)

# 6. Upload to S3
pd-cards upload s3 -c "<cardset name>"

# 7. Generate scouting reports (ALWAYS run without --cardset-id to cover all cardsets)
pd-cards scouting all

# 8. Upload scouting CSVs to production server
pd-cards scouting upload
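Steps 1 and 2 should use identical arguments, differing only by `--dry-run`. One way to guarantee that is to build the argument list once and toggle the flag. The sketch below uses only the flags documented here. The function name is hypothetical, and the result would be passed to something like `subprocess.run(argv, check=True)`.

```python
from typing import List, Optional


def process_argv(year: int, cardset_id: int, description: str, start: str, end: str,
                 season_pct: Optional[float] = None, dry_run: bool = False) -> List[str]:
    """Build the `pd-cards retrosheet process` argument list for steps 1 and 2."""
    argv = ["pd-cards", "retrosheet", "process", str(year),
            "-c", str(cardset_id), "-d", description,
            "--start", start, "--end", end]
    if season_pct is not None:
        # --season-pct is documented as a fraction between 0.0 and 1.0
        if not 0.0 <= season_pct <= 1.0:
            raise ValueError("season_pct must be between 0.0 and 1.0")
        argv += ["--season-pct", f"{season_pct:g}"]
    if dry_run:
        argv.append("--dry-run")
    return argv
```

Leaving `season_pct` unset produces the same shape as the PotM commands, which do not pass `--season-pct`.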

CLI Parameter Reference

| Parameter | Description | Example |
|---|---|---|
| `--start` | Season start date (YYYYMMDD) | `--start 20050403` |
| `--end` | Data cutoff date (YYYYMMDD) | `--end 20050815` |
| `--season-pct` | Fraction of season completed (0.0-1.0) | `--season-pct 0.728` |
| `--min-pa-vl` | Min plate appearances vs LHP (default: 20 Live, 1 PotM) | `--min-pa-vl 20` |
| `--min-pa-vr` | Min plate appearances vs RHP (default: 40 Live, 1 PotM) | `--min-pa-vr 40` |
| `--last-twoweeks-ratio` | Recency bias weight (auto-enabled at 0.2 after May 30) | `--last-twoweeks-ratio 0.2` |
| `--dry-run` / `-n` | Preview without saving to database | `--dry-run` |
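The auto-defaults in the table can be summarized as two small functions. This is an illustrative sketch, not the pipeline's code. It assumes "after May 30" means an `--end` cutoff later than May 30 of the same year, and it returns `None` when recency weighting is not auto-enabled, because the table does not say what value applies in that case.

```python
from datetime import date, datetime
from typing import Optional, Tuple

LIVE_DESCRIPTION = "Live"


def default_min_pa(description: str) -> Tuple[int, int]:
    """(min_pa_vl, min_pa_vr): 20/40 for Live, 1/1 for any other description."""
    if description == LIVE_DESCRIPTION:
        return 20, 40
    return 1, 1


def default_last_twoweeks_ratio(end: str) -> Optional[float]:
    """0.2 when the --end cutoff is after May 30 of its year (assumed reading), else None."""
    cutoff = datetime.strptime(end, "%Y%m%d").date()
    if cutoff > date(cutoff.year, 5, 30):
        return 0.2
    return None
```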

Example: 2005 Live Series Update (Mid-August)

pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728 --dry-run
pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728
pd-cards retrosheet validate 27
pd-cards upload check -c "2005 Live"
# Run groundball_b validation (step 5)
pd-cards upload s3 -c "2005 Live"
pd-cards scouting all
pd-cards scouting upload

Bug Prevention: The Double-Run Pattern

Card image generation (step 4) can create negative groundball_b values that crash game simulation. The prevention strategy:

Step 4: Run upload check (no S3 upload) — triggers image rendering and caches images
Step 5: Query database for negative groundball_b — STOP if any found
Step 6: Run upload s3, which uploads the images already rendered and cached in step 4 and validated in step 5. This run is fast because nothing has to be re-rendered.

Never skip step 5. Broken cards uploaded to S3 affect all players immediately.
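The double-run pattern is essentially a gate between two commands. The sketch below drives steps 4 to 6. The `validate` callable stands in for the step 5 script and is a hypothetical hook, since there is no CLI command for that check yet. The `run` parameter is injectable, so the flow can be exercised without touching the API or S3.

```python
import subprocess
from typing import Callable, List, Sequence


def run_cmd(argv: Sequence[str]) -> None:
    """Run one pd-cards command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(list(argv), check=True)


def image_phase(
    cardset_name: str,
    validate: Callable[[], List[str]],
    run: Callable[[Sequence[str]], None] = run_cmd,
) -> bool:
    """Steps 4-6: render and cache, validate, then upload only if validation is clean."""
    run(["pd-cards", "upload", "check", "-c", cardset_name])  # step 4: render, no upload
    errors = validate()  # step 5: e.g. negative groundball_b
    if errors:
        for err in errors:
            print(err)
        print("DO NOT PROCEED: fix data and re-run step 2")
        return False  # step 6 is never reached
    run(["pd-cards", "upload", "s3", "-c", cardset_name])  # step 6: upload cached images
    return True
```

Keeping the upload call behind the validation result makes it structurally impossible to reach S3 with unvalidated cards, which is the point of never skipping step 5.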

Step 5 Validation Script

There is no CLI command for this validation yet. Run this Python script via uv run python -c:

uv run python -c "
import asyncio
import sys

from db_calls import db_get

async def check_cards():
    result = await db_get('battingcards', params=[('cardset', CARDSET_ID)])
    cards = result.get('cards', [])
    errors = []
    for card in cards:
        # 'player' may be missing or null; fall back to the card id
        player = card.get('player') or {}
        pid = player.get('player_id', card.get('id'))
        gb = card.get('groundball_b')
        if gb is not None and gb < 0:
            errors.append(f'Player {pid}: groundball_b = {gb}')
        for field in ['gb_b', 'fb_b', 'ld_b']:
            val = card.get(field)
            if val is not None and (val < 0 or val > 100):
                errors.append(f'Player {pid}: {field} = {val}')
    if errors:
        print('ERRORS FOUND:')
        print('\n'.join(errors))
        print('\nDO NOT PROCEED: fix data and re-run step 2')
        return 1  # non-zero exit status so a wrapper script can stop here
    print(f'Validation passed: {len(cards)} batting cards checked, no issues')
    return 0

sys.exit(asyncio.run(check_cards()))
"

Note: Replace CARDSET_ID with the actual cardset ID (e.g., 27). The API returns {'count': N, 'cards': [...]} — always use result.get('cards', []) to extract the card list.
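The checks in that script are easier to reason about, and to test, as a pure function over the response shape described in the note. The sketch below assumes the same field names the script uses. The 0-100 bound for `gb_b`, `fb_b`, and `ld_b` is copied from the script, not independently verified, and the sample data is made up.

```python
from typing import Any, Dict, List


def find_card_errors(result: Dict[str, Any]) -> List[str]:
    """Return one message per out-of-range value in a battingcards response."""
    errors: List[str] = []
    for card in result.get("cards", []):
        player = card.get("player") or {}
        pid = player.get("player_id", card.get("id"))
        gb = card.get("groundball_b")
        if gb is not None and gb < 0:
            errors.append(f"Player {pid}: groundball_b = {gb}")
        for field in ("gb_b", "fb_b", "ld_b"):
            val = card.get(field)
            if val is not None and not 0 <= val <= 100:
                errors.append(f"Player {pid}: {field} = {val}")
    return errors


# Made-up sample in the {'count': N, 'cards': [...]} shape
sample = {"count": 2, "cards": [
    {"id": 1, "player": {"player_id": 101}, "groundball_b": 12},
    {"id": 2, "player": {"player_id": 102}, "groundball_b": -3, "fb_b": 140},
]}
print(find_card_errors(sample))
# ['Player 102: groundball_b = -3', 'Player 102: fb_b = 140']
```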

Architecture

retrosheet_data.py processes Retrosheet play-by-play data, calculates ratings, POSTs to API
API stores cards in production database; cards are rendered on-demand via URL
nginx caches rendered card images by date parameter (?d=YYYY-MM-DD)
All operations are idempotent and safe to re-run
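If the date parameter is part of nginx's cache key, as the note above implies, a card URL with a new `d` value forces a fresh render while an unchanged value serves the cached image. The sketch below sets that parameter on a URL. The base card URL in the usage is a placeholder, since the actual API route is not given here.

```python
from datetime import date
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_date(url: str, day: date) -> str:
    """Set (or replace) the d=YYYY-MM-DD cache parameter on a card image URL."""
    parts = urlsplit(url)
    # Drop any existing d value so the URL carries exactly one cache date
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "d"]
    query.append(("d", day.isoformat()))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_cache_date("https://cards.example/card/123", date(2005, 8, 15)))
# https://cards.example/card/123?d=2005-08-15
```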

Data sources: Retrosheet events CSV, Baseball Reference defense CSVs (data-input/), FanGraphs splits (if needed)

Required input files:

data-input/retrosheet/retrosheets_events_*.csv
data-input/<cardset name>/defense_*.csv (defense_c.csv, defense_1b.csv, etc.)
data-input/<cardset name>/pitching.csv, running.csv
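A quick pre-flight check can confirm the required inputs exist before step 2. The sketch below takes the repository root (`/mnt/NV2/Development/paper-dynasty/card-creation/`) and the cardset name used for the `data-input/<cardset name>/` folder. It only checks that at least one `defense_*.csv` exists, because the full list of position files is not spelled out here. The function name is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_inputs(root: Path, cardset_name: str) -> List[str]:
    """List required input files (or glob patterns) that are absent under root/data-input."""
    data = root / "data-input"
    missing: List[str] = []
    if not list((data / "retrosheet").glob("retrosheets_events_*.csv")):
        missing.append("data-input/retrosheet/retrosheets_events_*.csv")
    card_dir = data / cardset_name
    if not list(card_dir.glob("defense_*.csv")):
        missing.append(f"data-input/{cardset_name}/defense_*.csv")
    for name in ("pitching.csv", "running.csv"):
        if not (card_dir / name).is_file():
            missing.append(f"data-input/{cardset_name}/{name}")
    return missing
```

For example, `missing_inputs(Path("/mnt/NV2/Development/paper-dynasty/card-creation"), "2005 Live")` returns an empty list when everything is in place.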

Scouting output: 4 CSVs in scouting/ — batting-basic.csv, batting-ratings.csv, pitching-basic.csv, pitching-ratings.csv

Common Issues

"No players found" after successful run: Wrong database environment, wrong CARDSET_ID, or DATE mismatch. Check alt_database in db_calls.py. For promos, ensure PROMO_INCLUSION_RETRO_IDS is populated.

High DH count (50+ players): Defense calculation failed. Check that the defense CSVs exist and that their column names match what the script expects (tz_runs_total, not tz_runs_outfield). Re-run step 2 after fixing.
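The column-name mismatch can be caught before re-running step 2 by reading only the header row of each defense CSV. The sketch below assumes comma-separated files with a single header row. It flags only the specific mismatch named above (tz_runs_outfield present, tz_runs_total absent) and does not validate the rest of the schema. The function name is hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def mismatched_defense_files(defense_dir: Path) -> List[str]:
    """Names of defense_*.csv files whose header has tz_runs_outfield instead of tz_runs_total."""
    bad: List[str] = []
    for path in sorted(defense_dir.glob("defense_*.csv")):
        # utf-8-sig tolerates a byte-order mark on the first column name
        with path.open(newline="", encoding="utf-8-sig") as fh:
            header = next(csv.reader(fh), [])
        if "tz_runs_outfield" in header and "tz_runs_total" not in header:
            bad.append(path.name)
    return bad
```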

S3 upload fails: Check ~/.aws/credentials, verify cards render at API URL manually, re-run (idempotent).

"surplus of X.XX chances" / "Adding X.XX results": Normal rounding adjustments in card generation — informational, not errors.

Players of the Month (PotM) Variant

PotM cards use the same retrosheet pipeline but with a narrower date range, a promo cardset, and a curated player list.

Key Differences from Full Cardset

| Setting | Full Cardset | PotM |
|---|---|---|
| `--description` | `Live` | `<Month> PotM` (e.g., `April PotM`) |
| `--cardset-id` | Live cardset (e.g., 27) | Promo cardset (e.g., 28) |
| `--start` / `--end` | Full season range | Single month (e.g., `20050401` to `20050430`) |
| `--min-pa-vl` / `--min-pa-vr` | 20 / 40 (auto) | 1 / 1 (auto when description != "Live") |
| Player filtering | All qualifying players | Only `PROMO_INCLUSION_RETRO_IDS` |
| Position updates | Yes | Skipped (promo players keep existing positions) |

PotM Pre-Flight Checklist

Choose players — Typically 2 IF, 2 OF, 1 SP, 1 RP per league (AL/NL)
Get Retro IDs — Look up each player's key_retro (e.g., rodra001 for A-Rod)
Determine date range — First and last day of the month in YYYYMMDD format
Confirm promo cardset ID — Usually a separate cardset from the live one
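The first and last day of the month can be computed rather than typed, which avoids off-by-one mistakes on 30-day months and February. This is a minimal sketch using the standard library's `calendar.monthrange`. The helper name is hypothetical.

```python
import calendar
from typing import Tuple


def month_range(year: int, month: int) -> Tuple[str, str]:
    """First and last day of a month as YYYYMMDD strings for --start / --end."""
    last_day = calendar.monthrange(year, month)[1]  # (weekday of day 1, days in month)
    return f"{year}{month:02d}01", f"{year}{month:02d}{last_day:02d}"


print(month_range(2005, 5))  # ('20050501', '20050531')
```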

PotM Steps

# 1. Dry-run to verify config
pd-cards retrosheet process <year> -c <promo_cardset_id> \
  -d "<Month> PotM" \
  --start <YYYYMMDD> --end <YYYYMMDD> \
  --dry-run

# 2. Generate promo cards
pd-cards retrosheet process <year> -c <promo_cardset_id> \
  -d "<Month> PotM" \
  --start <YYYYMMDD> --end <YYYYMMDD>

# 3. Validate (expect higher DH count — promo players may lack defense data for short windows)
pd-cards retrosheet validate <promo_cardset_id>

# 4-5. Image validation (same as full cardset — check, validate groundball_b, then upload)
pd-cards upload check -c "<promo cardset name>"
# Run groundball_b validation (step 5 from main workflow)
pd-cards upload s3 -c "<promo cardset name>"

# 6-7. Scouting reports — ALWAYS regenerate for ALL cardsets (no --cardset-id filter)
pd-cards scouting all
pd-cards scouting upload

PotM-Specific Gotchas

PROMO_INCLUSION_RETRO_IDS must be populated — If description is not "Live", retrosheet_data.py filters to only these IDs. Empty list = 0 players generated.
Don't mix Live and PotM — If PROMO_INCLUSION_RETRO_IDS has entries but description is "Live", the script warns and exits.
Description protection — Once a player has a PotM description (e.g., "April PotM"), it is never overwritten by subsequent live series runs. Promo cardset descriptions are also protected: existing cards keep their original month.
Scouting must cover ALL cardsets — PotM players appear in scouting alongside live players. Always run pd-cards scouting all without --cardset-id to avoid overwriting the unified scouting data with partial results.
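The first two gotchas amount to a consistency rule between the description and PROMO_INCLUSION_RETRO_IDS. The sketch below expresses that rule as a pre-run check. The function name is hypothetical, and the messages are illustrative, not the script's actual warnings.

```python
from typing import Sequence


def check_promo_config(description: str, promo_ids: Sequence[str]) -> None:
    """Raise ValueError when the description and PROMO_INCLUSION_RETRO_IDS disagree."""
    is_live = description == "Live"
    if is_live and promo_ids:
        # Mixing Live and PotM: the script warns and exits in this case
        raise ValueError("PROMO_INCLUSION_RETRO_IDS is populated but description is 'Live'")
    if not is_live and not promo_ids:
        # Non-Live runs filter to these IDs, so an empty list yields 0 players
        raise ValueError(f"description {description!r} requires PROMO_INCLUSION_RETRO_IDS")
```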

Example: May 2005 PotM

# Players: A-Rod (IF), Delgado (IF), Mench (OF), Abreu (OF), Colon (SP), Ryan (RP), Harang (SP), Hoffman (RP)
# Retro IDs configured in retrosheet_data.py PROMO_INCLUSION_RETRO_IDS

pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531 --dry-run
pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531
pd-cards retrosheet validate 28
pd-cards upload check -c "2005 Promos"
# Run groundball_b validation
pd-cards upload s3 -c "2005 Promos"
pd-cards scouting all
pd-cards scouting upload

Last Updated: 2026-02-15. Version: 3.2 (Fixed scouting commands to use CLI, fixed groundball_b validation script, added CLI parameter reference and example)
