claude-configs/skills/paper-dynasty/workflows/card-generation.md
Cal Corum 1e9b52186b Update remote refs and card generation workflow
- Remove homelab special-case from commit-push command (all repos now use origin)
- Update sync-config to use origin remote instead of homelab
- Enhance card generation with season-pct params, CLI reference, and validation fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 19:11:19 -06:00


# Card Generation Workflow
## Pre-Flight
Ask the user before starting:
1. **Refresh or new date range?** (refresh keeps existing config)
2. **Which environment?** (prod or dev)
3. **Which cardset?** (e.g., 27 for "2005 Live")
4. **Season progress?** (games played or date range for season-pct calculation)

All commands run from `/mnt/NV2/Development/paper-dynasty/card-creation/`.
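
The `--season-pct` value asked for in question 4 is the fraction of the season completed. Below is a minimal sketch that derives it from games played. It assumes a 162-game schedule and rounds to three places. `season_pct_from_games` is a hypothetical helper and is not part of `pd-cards`.

```python
def season_pct_from_games(games_played: int, season_games: int = 162) -> float:
    """Return games_played / season_games rounded to 3 places, for --season-pct.

    Hypothetical helper: assumes a 162-game schedule unless told otherwise.
    """
    if season_games <= 0:
        raise ValueError("season_games must be positive")
    if not 0 <= games_played <= season_games:
        raise ValueError(f"games_played must be between 0 and {season_games}")
    return round(games_played / season_games, 3)


# 118 of 162 games rounds to 0.728, the value used in the 2005 Live example
print(season_pct_from_games(118))
```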
## Steps
```bash
# 1. Verify config (dry-run shows settings without executing)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
--start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0> --dry-run
# 2. Generate cards (POSTs player data to API)
pd-cards retrosheet process <year> -c <cardset_id> -d <description> \
--start <YYYYMMDD> --end <YYYYMMDD> --season-pct <0.0-1.0>
# 3. Validate positions (DH count MUST be <5; high DH = defense calc failure)
pd-cards retrosheet validate <cardset_id>
# 4. Generate images WITHOUT upload (triggers rendering; groundball_b bug can occur here)
pd-cards upload check -c "<cardset name>"
# 5. CRITICAL: Validate database for negative groundball_b — STOP if errors found
# (see "Bug Prevention" section below)
# 6. Upload to S3
pd-cards upload s3 -c "<cardset name>"
# 7. Generate scouting reports (ALWAYS run without --cardset-id to cover all cardsets)
pd-cards scouting all
# 8. Upload scouting CSVs to production server
pd-cards scouting upload
```
### CLI Parameter Reference
| Parameter | Description | Example |
|-----------|-------------|---------|
| `--start` | Season start date (YYYYMMDD) | `--start 20050403` |
| `--end` | Data cutoff date (YYYYMMDD) | `--end 20050815` |
| `--season-pct` | Fraction of season completed (0.0-1.0) | `--season-pct 0.728` |
| `--min-pa-vl` | Min plate appearances vs LHP (default: 20 Live, 1 PotM) | `--min-pa-vl 20` |
| `--min-pa-vr` | Min plate appearances vs RHP (default: 40 Live, 1 PotM) | `--min-pa-vr 40` |
| `--last-twoweeks-ratio` | Recency bias weight (auto-enabled at 0.2 after May 30) | `--last-twoweeks-ratio 0.2` |
| `--dry-run` / `-n` | Preview without saving to database | |
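
Several defaults in the table depend on the description and the cutoff date: 20/40 minimum PA for `Live` and 1/1 otherwise, plus a 0.2 recency weight after May 30. The sketch below models only that documented behavior and is not the logic in `retrosheet_data.py`. It assumes that the May 30 rule is checked against `--end`, that "after" means strictly later, and that 0.0 stands in for "disabled". `resolve_defaults` is a hypothetical name.

```python
from datetime import date, datetime


def resolve_defaults(description: str, end: str) -> dict:
    """Model the documented min-PA and recency defaults for a run.

    Hypothetical helper. `end` is the --end value in YYYYMMDD format.
    """
    end_date = datetime.strptime(end, "%Y%m%d").date()
    is_live = description == "Live"
    return {
        "min_pa_vl": 20 if is_live else 1,
        "min_pa_vr": 40 if is_live else 1,
        # Assumption: enabled only when --end falls strictly after May 30
        "last_twoweeks_ratio": 0.2 if end_date > date(end_date.year, 5, 30) else 0.0,
    }


print(resolve_defaults("Live", "20050815"))
# {'min_pa_vl': 20, 'min_pa_vr': 40, 'last_twoweeks_ratio': 0.2}
```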
### Example: 2005 Live Series Update (Mid-August)
```bash
pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728 --dry-run
pd-cards retrosheet process 2005 -c 27 -d Live --start 20050403 --end 20050815 --season-pct 0.728
pd-cards retrosheet validate 27
pd-cards upload check -c "2005 Live"
# Run groundball_b validation (step 5)
pd-cards upload s3 -c "2005 Live"
pd-cards scouting all
pd-cards scouting upload
```
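
The same sequence can be scripted so that nothing reaches S3 unless the step 5 check passes. Below is a minimal sketch that makes these assumptions:

* `pd-cards` is on `PATH`.
* The caller supplies the groundball_b check as a function that returns a list of error strings, for example a wrapper around the validation script in the Bug Prevention section.

The step 1 dry-run is left out because it exists for human review. The same applies to the DH count from `retrosheet validate`: the sketch only stops on a non-zero exit code, so that output still needs a person to read it. `build_steps` and `run_pipeline` are hypothetical names.

```python
import subprocess
from typing import Callable, List


def build_steps(year: int, cardset_id: int, description: str, cardset_name: str,
                start: str, end: str, season_pct: float) -> dict:
    """Return pd-cards argv lists: steps 2-4 under "generate", 6-8 under "publish"."""
    process = ["pd-cards", "retrosheet", "process", str(year),
               "-c", str(cardset_id), "-d", description,
               "--start", start, "--end", end, "--season-pct", str(season_pct)]
    return {
        "generate": [
            process,
            ["pd-cards", "retrosheet", "validate", str(cardset_id)],
            ["pd-cards", "upload", "check", "-c", cardset_name],
        ],
        "publish": [
            ["pd-cards", "upload", "s3", "-c", cardset_name],
            ["pd-cards", "scouting", "all"],
            ["pd-cards", "scouting", "upload"],
        ],
    }


def run_pipeline(steps: dict, check_groundball_b: Callable[[], List[str]]) -> None:
    """Run the generate steps, stop if the check reports errors, then publish."""
    for argv in steps["generate"]:
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit
    errors = check_groundball_b()
    if errors:
        raise SystemExit("groundball_b validation failed:\n" + "\n".join(errors))
    for argv in steps["publish"]:
        subprocess.run(argv, check=True)
```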
---
## Bug Prevention: The Double-Run Pattern
Card image generation (step 4) can create **negative groundball_b values** that crash game simulation. The prevention strategy:
1. **Step 4**: Run `upload check` (no S3 upload) — triggers image rendering and caches images
2. **Step 5**: Query database for negative groundball_b — **STOP if any found**
3. **Step 6**: Run `upload s3` to upload the already-cached (validated) images. This step is fast because the images were cached in step 4.

**Never skip step 5.** Broken cards uploaded to S3 affect all players immediately.
### Step 5 Validation Script
There is no CLI command for this validation yet. Run this Python script via `uv run python -c`:
```bash
uv run python -c "
import asyncio

from db_calls import db_get

CARDSET_ID = 27  # replace with the cardset being validated


async def check_cards():
    result = await db_get('battingcards', params=[('cardset', CARDSET_ID)])
    # The API returns {'count': N, 'cards': [...]}
    cards = (result or {}).get('cards', [])
    errors = []
    for card in cards:
        player = card.get('player') or {}
        pid = player.get('player_id', card.get('id'))
        # Negative groundball_b crashes game simulation
        gb = card.get('groundball_b')
        if gb is not None and gb < 0:
            errors.append(f'Player {pid}: groundball_b = {gb}')
        # Also flag gb_b / fb_b / ld_b values outside 0-100
        for field in ['gb_b', 'fb_b', 'ld_b']:
            val = card.get(field)
            if val is not None and (val < 0 or val > 100):
                errors.append(f'Player {pid}: {field} = {val}')
    if errors:
        print('ERRORS FOUND:')
        print('\n'.join(errors))
        print('\nDO NOT PROCEED: fix data and re-run step 2')
        return False
    print(f'Validation passed: {len(cards)} batting cards checked, no issues')
    return True


# Exit non-zero on failure so a calling script can stop before step 6
if not asyncio.run(check_cards()):
    raise SystemExit(1)
"
```
**Note:** Replace `CARDSET_ID` with the actual cardset ID (e.g., 27). The API returns `{'count': N, 'cards': [...]}` — always use `result.get('cards', [])` to extract the card list.
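
The check itself is plain dictionary logic, so it can be separated from the API call and tested against a hand-built response in the `{'count': N, 'cards': [...]}` shape. The sketch below shows that split. `find_card_errors` is a hypothetical name. The field names are the ones the validation script already checks.

```python
def find_card_errors(result: dict) -> list:
    """Return error strings for cards with out-of-range batted-ball fields.

    `result` is a battingcards response shaped like {'count': N, 'cards': [...]}.
    """
    errors = []
    for card in (result or {}).get("cards", []):
        player = card.get("player") or {}
        pid = player.get("player_id", card.get("id"))
        gb = card.get("groundball_b")
        if gb is not None and gb < 0:
            errors.append(f"Player {pid}: groundball_b = {gb}")
        for field in ("gb_b", "fb_b", "ld_b"):
            val = card.get(field)
            if val is not None and not 0 <= val <= 100:
                errors.append(f"Player {pid}: {field} = {val}")
    return errors


sample = {
    "count": 2,
    "cards": [
        {"id": 1, "player": {"player_id": 101}, "groundball_b": -3, "gb_b": 45},
        {"id": 2, "player": {"player_id": 102}, "groundball_b": 12, "fb_b": 104},
    ],
}
print(find_card_errors(sample))
# ['Player 101: groundball_b = -3', 'Player 102: fb_b = 104']
```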

---
## Architecture
- `retrosheet_data.py` processes Retrosheet play-by-play data, calculates ratings, POSTs to API
- API stores cards in production database; cards are rendered on-demand via URL
- nginx caches rendered card images by date parameter (`?d=YYYY-MM-DD`)
- All operations are idempotent and safe to re-run
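
nginx keys the cached image on the `d` query parameter. A card URL requested with a new date should therefore get a fresh render instead of the cached one, which is useful when checking cards by hand. The sketch below sets that parameter on an existing card URL using only the standard library. The example URL is a placeholder, not the real API path. `with_render_date` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_render_date(card_url: str, render_date: date) -> str:
    """Return card_url with its d= query parameter set to render_date (YYYY-MM-DD)."""
    parts = urlsplit(card_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query["d"] = [render_date.isoformat()]  # replaces any existing d= value
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Placeholder URL: substitute a real card URL from the API
print(with_render_date("https://example.com/cards/123?d=2026-02-01", date(2026, 2, 15)))
# https://example.com/cards/123?d=2026-02-15
```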

**Data sources**: Retrosheet events CSV, Baseball Reference defense CSVs (`data-input/`), FanGraphs splits (if needed)

**Required input files**:
- `data-input/retrosheet/retrosheets_events_*.csv`
- `data-input/<cardset name>/defense_*.csv` (defense_c.csv, defense_1b.csv, etc.)
- `data-input/<cardset name>/pitching.csv`, `running.csv`
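
A missing defense CSV only shows up later, as a high DH count (see Common Issues). Confirming the required inputs before step 1 avoids that. The sketch below uses `pathlib` and is meant to run from the card-creation directory. It checks only the file patterns listed above. `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path


def missing_inputs(cardset_name: str, root: Path = Path("data-input")) -> list:
    """Return the required input patterns that have no matching file."""
    missing = []
    # glob() on a directory that does not exist simply yields nothing
    if not any((root / "retrosheet").glob("retrosheets_events_*.csv")):
        missing.append("retrosheet/retrosheets_events_*.csv")
    cardset_dir = root / cardset_name
    if not any(cardset_dir.glob("defense_*.csv")):
        missing.append(f"{cardset_name}/defense_*.csv")
    for name in ("pitching.csv", "running.csv"):
        if not (cardset_dir / name).is_file():
            missing.append(f"{cardset_name}/{name}")
    return missing


print(missing_inputs("2005 Live") or "All required inputs present")
```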

**Scouting output**: four CSVs in `scouting/` (`batting-basic.csv`, `batting-ratings.csv`, `pitching-basic.csv`, `pitching-ratings.csv`)

---
## Common Issues
**"No players found" after successful run**: Wrong database environment, wrong `CARDSET_ID`, or `DATE` mismatch. Check `alt_database` in `db_calls.py`. For promos, ensure `PROMO_INCLUSION_RETRO_IDS` is populated.

**High DH count (50+ players)**: Defense calculation failed. Check that the defense CSVs exist and that column names match (`tz_runs_total`, not `tz_runs_outfield`). Re-run step 2 after fixing.

**S3 upload fails**: Check `~/.aws/credentials`, verify cards render at API URL manually, re-run (idempotent).

**"surplus of X.XX chances" / "Adding X.XX results"**: Normal rounding adjustments in card generation; informational, not errors.
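
For the high DH count case, the column-name mismatch can be checked directly. The sketch below reads only the header row of each `defense_*.csv` with the standard `csv` module and reports files that lack `tz_runs_total`. It assumes every defense file for the cardset should carry that column. The entry above implies this but does not state it for every position. `defense_files_missing_column` is hypothetical.

```python
import csv
from pathlib import Path


def defense_files_missing_column(cardset_dir: Path, column: str = "tz_runs_total") -> dict:
    """Map each defense_*.csv that lacks `column` to its header row."""
    problems = {}
    for path in sorted(cardset_dir.glob("defense_*.csv")):
        # utf-8-sig strips a byte-order mark that would otherwise corrupt the first column name
        with path.open(newline="", encoding="utf-8-sig") as fh:
            header = next(csv.reader(fh), [])
        if column not in header:
            problems[path.name] = header
    return problems


for name, header in defense_files_missing_column(Path("data-input/2005 Live")).items():
    print(f"{name}: no tz_runs_total (columns: {', '.join(header)})")
```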

---
## Players of the Month (PotM) Variant
PotM cards use the same retrosheet pipeline but with a narrower date range, a promo cardset, and a curated player list.
### Key Differences from Full Cardset
| Setting | Full Cardset | PotM |
|---------|-------------|------|
| `--description` | `Live` | `<Month> PotM` (e.g., `April PotM`) |
| `--cardset-id` | Live cardset (e.g., 27) | Promo cardset (e.g., 28) |
| `--start` / `--end` | Full season range | Single month (e.g., `20050401` - `20050430`) |
| `--min-pa-vl` / `--min-pa-vr` | 20 / 40 (auto) | 1 / 1 (auto when description != "Live") |
| Player filtering | All qualifying players | Only `PROMO_INCLUSION_RETRO_IDS` |
| Position updates | Yes | Skipped (promo players keep existing positions) |
### PotM Pre-Flight Checklist
1. **Choose players** — Typically 2 IF, 2 OF, 1 SP, 1 RP per league (AL/NL)
2. **Get Retro IDs** — Look up each player's `key_retro` (e.g., `rodra001` for A-Rod)
3. **Determine date range** — First and last day of the month in `YYYYMMDD` format
4. **Confirm promo cardset ID** — Usually a separate cardset from the live one
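
The date range in checklist step 3 can be computed rather than typed by hand. The sketch below uses the standard `calendar` module. `month_range` is a hypothetical helper that returns the `--start` / `--end` pair for a month.

```python
import calendar


def month_range(year: int, month: int) -> tuple:
    """Return (start, end) for the given month as YYYYMMDD strings."""
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return f"{year}{month:02d}01", f"{year}{month:02d}{last_day:02d}"


print(month_range(2005, 5))  # ('20050501', '20050531')
```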
### PotM Steps
```bash
# 1. Dry-run to verify config
pd-cards retrosheet process <year> -c <promo_cardset_id> \
-d "<Month> PotM" \
--start <YYYYMMDD> --end <YYYYMMDD> \
--dry-run
# 2. Generate promo cards
pd-cards retrosheet process <year> -c <promo_cardset_id> \
-d "<Month> PotM" \
--start <YYYYMMDD> --end <YYYYMMDD>
# 3. Validate (expect higher DH count — promo players may lack defense data for short windows)
pd-cards retrosheet validate <promo_cardset_id>
# 4-6. Image validation (same as full cardset: check, validate groundball_b, then upload)
pd-cards upload check -c "<promo cardset name>"
# Run groundball_b validation (step 5 from main workflow)
pd-cards upload s3 -c "<promo cardset name>"
# 7-8. Scouting reports: ALWAYS regenerate for ALL cardsets (no --cardset-id filter)
pd-cards scouting all
pd-cards scouting upload
```
### PotM-Specific Gotchas
- **`PROMO_INCLUSION_RETRO_IDS` must be populated** — If description is not "Live", retrosheet_data.py filters to only these IDs. Empty list = 0 players generated.
- **Don't mix Live and PotM** — If `PROMO_INCLUSION_RETRO_IDS` has entries but description is "Live", the script warns and exits.
- **Description protection** — Once a player has a PotM description (e.g., "April PotM"), it is never overwritten by subsequent live series runs. Promo cardset descriptions are also protected: existing cards keep their original month.
- **Scouting must cover ALL cardsets** — PotM players appear in scouting alongside live players. Always run `pd-cards scouting all` without `--cardset-id` to avoid overwriting the unified scouting data with partial results.
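
The first two gotchas amount to a consistency rule between the description and `PROMO_INCLUSION_RETRO_IDS`. The sketch below expresses that rule as a pre-run check. It mirrors the behavior described above and is not the code in `retrosheet_data.py`. `check_promo_config` is a hypothetical name.

```python
def check_promo_config(description: str, promo_retro_ids: list) -> None:
    """Raise ValueError when the description and promo IDs are inconsistent."""
    if description == "Live" and promo_retro_ids:
        # Documented: the script warns and exits when Live and PotM are mixed
        raise ValueError("PROMO_INCLUSION_RETRO_IDS is set but description is 'Live'")
    if description != "Live" and not promo_retro_ids:
        # Documented: an empty list means 0 players are generated
        raise ValueError(f"description {description!r} needs PROMO_INCLUSION_RETRO_IDS; "
                         "an empty list generates 0 players")


check_promo_config("May PotM", ["rodra001"])  # promo run with IDs: passes
check_promo_config("Live", [])                # live run without IDs: passes
```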
### Example: May 2005 PotM
```bash
# Players: A-Rod (IF), Delgado (IF), Mench (OF), Abreu (OF), Colon (SP), Ryan (RP), Harang (SP), Hoffman (RP)
# Retro IDs configured in retrosheet_data.py PROMO_INCLUSION_RETRO_IDS
pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531 --dry-run
pd-cards retrosheet process 2005 -c 28 -d "May PotM" --start 20050501 --end 20050531
pd-cards retrosheet validate 28
pd-cards upload check -c "2005 Promos"
# Run groundball_b validation
pd-cards upload s3 -c "2005 Promos"
pd-cards scouting all
pd-cards scouting upload
```
---
**Last Updated**: 2026-02-15

**Version**: 3.2 (Fixed scouting commands to use CLI, fixed groundball_b validation script, added CLI parameter reference and example)