Phase 0 — Render Pipeline Optimization: Project Plan
Version: 1.1
Date: 2026-03-13
PRD Reference: docs/prd-evolution/02-architecture.md § Card Render Pipeline Optimization, 13-implementation.md § Phase 0
Status: Implemented — deployed to dev, PR #94 open for production
Overview
Phase 0 is independent of Card Evolution and benefits all existing card workflows immediately. The goal is to cut per-card render time and full-cardset upload time by eliminating browser spawn overhead, CDN dependencies, and sequential processing.
Bottlenecks addressed:
- New Chromium process spawned per render request (~1.0-1.5s overhead)
- Google Fonts CDN fetched over network on every render (~0.3-0.5s) — no persistent cache since browser is destroyed after each render
- Upload pipeline is fully sequential — one card at a time, blocking S3 upload via synchronous boto3
Results:
| Metric | Before | Target | Actual |
|---|---|---|---|
| Per-card render (fresh) | ~2.0s (benchmark avg) | <1.0s | ~0.98s avg (range 0.63-1.44s, ~51% reduction) |
| Per-card render (cached) | N/A | — | ~0.1s |
| External dependencies during render | Google Fonts CDN | None | None |
| Chromium processes per 800-card run | 800 | 1 | 1 |
| 800-card upload (sequential, estimated) | ~27 min | ~8-13 min | ~13 min (estimated at 0.98s/card) |
| 800-card upload (concurrent 8x, estimated) | N/A | ~2-4 min | ~2-3 min (estimated) |
Benchmark details (7 fresh renders on dev, 2026-03-13):
| Player | Type | Time |
|---|---|---|
| Michael Young (12726) | Batting | 0.96s |
| Darin Erstad (12729) | Batting | 0.78s |
| Wilson Valdez (12746) | Batting | 1.44s |
| Player 12750 | Batting | 0.76s |
| Jarrod Washburn (12880) | Pitching | 0.63s |
| Ryan Drese (12879) | Pitching | 1.25s |
| Player 12890 | Pitching | 1.07s |
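Average: 0.98s, which meets the <1s target. Occasional spikes to ~1.4s from Chromium GC pressure. Pitching cards are expected to render slightly faster due to less template data, although in this small sample the batting and pitching averages are nearly identical (~0.985s vs ~0.983s).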
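The summary figures can be re-derived from the seven rows in the benchmark table. The sketch below is only a sanity check of that arithmetic; the `summarize` helper and the `renders` list are illustrative names, and the 2.0s baseline is taken from the Results table.

```python
from statistics import mean

# (player id, card type, seconds), copied from the benchmark table
renders = [
    (12726, "Batting", 0.96),
    (12729, "Batting", 0.78),
    (12746, "Batting", 1.44),
    (12750, "Batting", 0.76),
    (12880, "Pitching", 0.63),
    (12879, "Pitching", 1.25),
    (12890, "Pitching", 1.07),
]


def summarize(rows):
    """Return the mean render time overall and per card type."""
    by_type = {}
    for _player_id, card_type, seconds in rows:
        by_type.setdefault(card_type, []).append(seconds)
    summary = {"All": mean(seconds for _, _, seconds in rows)}
    summary.update({card_type: mean(times) for card_type, times in by_type.items()})
    return summary


baseline_s = 2.0  # "Before" figure from the Results table
summary = summarize(renders)
reduction = 1 - summary["All"] / baseline_s  # fraction of render time removed

for label, value in summary.items():
    print(f"{label}: {value:.3f}s")
print(f"Reduction vs baseline: {reduction:.0%}")
```

With only seven samples, the per-type means are too close to support a strong conclusion about batting versus pitching cards.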
Optimization breakdown:
- Persistent browser (WP-02): eliminated ~1.0s spawn overhead
- Variable font deduplication (WP-01 fix): eliminated ~163KB redundant base64 parsing, saved ~0.4s
- Remaining ~0.98s is Playwright page creation, HTML parsing, and PNG screenshot — not reducible without GPU acceleration or a different rendering approach
Work Packages (6 WPs)
WP-00: Baseline Benchmarks
Repo: database + card-creation
Complexity: XS
Dependencies: None
Capture before-metrics so we can measure improvement.
Tasks
- Time 10 sequential card renders via the API (curl with timing)
- Time a small batch S3 upload (e.g., 20 cards) via `pd-cards upload`
- Record results in a benchmark log
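One way to keep the methodology repeatable is a small timing harness instead of ad hoc curl runs. This is a minimal sketch, not the project's benchmark script: `time_sequential` and `render_once` are hypothetical names, and `render_once` stands for whatever call fetches one rendered card from the API.

```python
import statistics
import time
from typing import Callable, Iterable


def time_sequential(render_once: Callable[[int], None], player_ids: Iterable[int]) -> dict:
    """Time one render per player id, strictly one after another."""
    durations = []
    for player_id in player_ids:
        start = time.perf_counter()
        render_once(player_id)  # assumed to block until the card is rendered
        durations.append(time.perf_counter() - start)
    return {
        "count": len(durations),
        "avg_s": statistics.mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }
```

Passing the same list of player ids before and after the optimization keeps the two runs comparable.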
Tests
- Benchmark script or documented curl commands exist and are repeatable
Acceptance Criteria
- Baseline numbers recorded for per-card render time
- Baseline numbers recorded for batch upload time
- Methodology is repeatable for post-optimization comparison
WP-01: Self-Hosted Fonts
Repo: database
Complexity: S
Dependencies: None (can run in parallel with WP-02)
Replace Google Fonts CDN with locally embedded WOFF2 fonts. Eliminates ~0.3-0.5s network round-trip per render and removes external dependency.
Current State
- `storage/templates/player_card.html` lines 5-7: `<link>` tags to `fonts.googleapis.com`
- `storage/templates/style.html`: references `"Open Sans"` and `"Source Sans 3"` font-families
- Two fonts used: Open Sans (300, 400, 700) and Source Sans 3 (400, 700)
Implementation
- Download WOFF2 files for both fonts (5 files total: Open Sans 300/400/700, Source Sans 3 400/700)
- Base64-encode each WOFF2 file
- Add `@font-face` declarations with base64 data URIs to `style.html`
- Remove the three `<link>` tags from `player_card.html`
- Visual diff: render the same card before/after and verify identical output
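The encode-and-embed steps above can be sketched as follows. This is an illustration under assumptions rather than the code used in the repo: `font_face_css` is a hypothetical helper, and the file path in the usage note is made up.

```python
import base64
from pathlib import Path


def font_face_css(family: str, weight: int, woff2_path: Path) -> str:
    """Build one @font-face rule with the WOFF2 file inlined as a data URI."""
    encoded = base64.b64encode(woff2_path.read_bytes()).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f"  font-weight: {weight};\n"
        "  font-style: normal;\n"
        f'  src: url("data:font/woff2;base64,{encoded}") format("woff2");\n'
        "}"
    )
```

A call such as `font_face_css("Open Sans", 400, Path("storage/fonts/OpenSans-400.woff2"))` (hypothetical file name) would yield one rule. Repeating it for the five weight/family pairs gives the block to paste into `style.html`.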
Files
- Create: `database/storage/fonts/` directory with raw WOFF2 files (source archive, not deployed)
- Modify: `database/storage/templates/style.html` (add `@font-face` declarations)
- Modify: `database/storage/templates/player_card.html` (remove `<link>` tags, lines 5-7)
Tests
- Unit: `style.html` contains no `fonts.googleapis.com` references
- Unit: `player_card.html` contains no `<link>` to external font CDNs
- Unit: `@font-face` declarations present for all 5 font variants
- Visual: rendered card is pixel-identical to pre-change output (manual check)
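The three unit checks could be built on two small text helpers like the ones below. This is a sketch with hypothetical function names. Including `fonts.gstatic.com` (the host Google Fonts serves font files from) is an addition not mentioned in this plan.

```python
import re

# Hosts that should no longer appear in any template.
# fonts.gstatic.com is an extra assumption beyond the plan's fonts.googleapis.com.
EXTERNAL_FONT_HOSTS = ("fonts.googleapis.com", "fonts.gstatic.com")


def external_font_hosts(template_text: str) -> list:
    """Return the external font hosts still referenced in the template text."""
    return [host for host in EXTERNAL_FONT_HOSTS if host in template_text]


def font_face_count(template_text: str) -> int:
    """Count @font-face rules in the template text."""
    return len(re.findall(r"@font-face\s*\{", template_text))
```

A test would read `style.html` and `player_card.html`, then assert that `external_font_hosts(...)` is empty and that `font_face_count(...)` on `style.html` equals 5.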
Acceptance Criteria
- No external network requests during card render
- All 5 font weights render correctly
- Card appearance unchanged
WP-02: Persistent Browser Instance
Repo: database
Complexity: M
Dependencies: None (can run in parallel with WP-01)
Replace per-request Chromium launch/teardown with a persistent browser that lives for the lifetime of the API process. Eliminates ~1.0-1.5s spawn overhead per render.
Current State
- `app/routers_v2/players.py` lines 801-826: `async with async_playwright() as p:` block creates and destroys a browser per request
- No browser reuse, no connection pooling
Implementation
- Add module-level `_browser` and `_playwright` globals to `players.py`
- Implement `get_browser()`: lazy-init with `is_connected()` auto-reconnect
- Implement `shutdown_browser()`: clean teardown for API shutdown
- Replace the `async with async_playwright()` block with page-per-request pattern:

  ```python
  browser = await get_browser()
  page = await browser.new_page(viewport={"width": 1280, "height": 720})
  try:
      await page.set_content(html_string)
      await page.screenshot(path=file_path, type="png", clip={...})
  finally:
      await page.close()
  ```
- Ensure page is always closed in `finally` block to prevent memory leaks
Files
- Modify: `database/app/routers_v2/players.py` (persistent browser, page-per-request)
Tests
- Unit: `get_browser()` returns a connected browser
- Unit: `get_browser()` returns same instance on second call
- Unit: `get_browser()` relaunches if browser disconnected
- Integration: render 10 cards sequentially, no browser leaks (page count returns to 0 between renders)
- Integration: concurrent renders (4 simultaneous requests) complete without errors
- Integration: `shutdown_browser()` cleanly closes browser and playwright
Acceptance Criteria
- Only 1 Chromium process running regardless of render count
- Page count returns to 0 between renders (no leaks)
- Auto-reconnect works if browser crashes
- Per-card render time drops to ~1.0-1.5s. Actual: ~0.98s avg fresh render (from ~2.0s baseline), target met
WP-03: FastAPI Lifespan Hooks
Repo: database
Complexity: S
Dependencies: WP-02
Wire get_browser() and shutdown_browser() into FastAPI's lifespan so the browser warms up on startup and cleans up on shutdown.
Current State
- `app/main.py` line 54: plain `FastAPI(...)` constructor with no lifespan
- Only middleware is the DB session handler (lines 97-105)
Implementation
- Add `@asynccontextmanager` lifespan function that calls `get_browser()` on startup and `shutdown_browser()` on shutdown
- Pass `lifespan=lifespan` to `FastAPI()` constructor
- Verify existing middleware is unaffected
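The shape of the lifespan function can be sketched without FastAPI installed, since FastAPI's `lifespan` argument accepts an async context manager that receives the app. The two browser functions below are stand-ins that only record calls; in the real code they would be the WP-02 functions imported from `players.py`.

```python
from contextlib import asynccontextmanager

events = []  # records the call order so it is visible in this sketch


async def get_browser():
    """Stand-in for the WP-02 function; the real one launches Chromium."""
    events.append("browser started")


async def shutdown_browser():
    """Stand-in for the WP-02 function; the real one closes Chromium."""
    events.append("browser stopped")


@asynccontextmanager
async def lifespan(app):
    await get_browser()  # warm up before the first request is served
    try:
        yield  # the API handles requests while suspended here
    finally:
        await shutdown_browser()  # runs on API shutdown, even after errors


# In app/main.py this would be wired in as: app = FastAPI(lifespan=lifespan)
```

Putting the teardown in `finally` means the browser is closed even if the application exits with an error, which is what the "no orphan processes" test checks.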
Files
- Modify: `database/app/main.py` (add lifespan hook, pass to FastAPI constructor)
- Modify: `database/app/routers_v2/players.py` (export `get_browser`/`shutdown_browser`, if not already importable)
Tests
- Integration: browser is connected immediately after API startup (before any render request)
- Integration: browser is closed after API shutdown (no orphan processes)
- Integration: existing DB middleware still functions correctly
- Integration: API health endpoint still responds
Acceptance Criteria
- Browser pre-warmed on startup — first render request has no cold-start penalty
- Clean shutdown — no orphan Chromium processes after API stop
- No regression in existing API behavior
WP-04: Concurrent Upload Pipeline
Repo: card-creation
Complexity: M
Dependencies: WP-02 (persistent browser must be deployed for concurrent renders to work)
Replace the sequential upload loop with semaphore-bounded asyncio.gather for parallel card fetching, rendering, and S3 upload.
Current State
- `pd_cards/core/upload.py` `upload_cards_to_s3()` (lines 109-333): sequential `for x in all_players:` loop
- `fetch_card_image` timeout hardcoded to 6s (line 28)
- `upload_card_to_s3()` uses synchronous `boto3.put_object`, which blocks the event loop
- Single `aiohttp.ClientSession` is reused (good)
Implementation
- Wrap per-card processing in an `async def process_card(player)` coroutine
- Add `asyncio.Semaphore(concurrency)` guard (default concurrency=8)
- Replace sequential loop with `asyncio.gather(*[process_card(p) for p in all_players], return_exceptions=True)`
- Offload synchronous `upload_card_to_s3()` to thread pool via `asyncio.get_event_loop().run_in_executor(None, upload_card_to_s3, ...)`
- Increase `fetch_card_image` timeout from 6s to 10s
- Add error handling: individual card failures logged but don't abort the batch
- Add progress reporting: log completion count every N cards (not every start)
- Add `--concurrency` CLI argument to `pd-cards upload` command
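The core of the pattern described above (a semaphore, `gather` with `return_exceptions=True`, and a thread-pool offload for the blocking upload) can be sketched with the standard library alone. `process_all`, `fetch_image`, and `upload_sync` are hypothetical names standing in for the real functions in `upload.py`. The sketch uses `asyncio.get_running_loop()`, which is the usual choice inside a coroutine.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def process_all(
    players: Iterable[Any],
    fetch_image: Callable[[Any], Awaitable[bytes]],
    upload_sync: Callable[[Any, bytes], str],
    concurrency: int = 8,
) -> List[Any]:
    """Fetch and upload every card with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    loop = asyncio.get_running_loop()

    async def process_card(player: Any) -> str:
        async with semaphore:  # blocks here once `concurrency` cards are active
            image = await fetch_image(player)
            # The upload is synchronous, so run it in the default thread pool
            # instead of blocking the event loop.
            return await loop.run_in_executor(None, upload_sync, player, image)

    # return_exceptions=True keeps one failed card from aborting the batch;
    # failures come back as exception objects in the result list.
    return await asyncio.gather(
        *(process_card(player) for player in players), return_exceptions=True
    )
```

Callers can then split the results into successes and failures with `isinstance(result, Exception)` and log each failure individually.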
Files
- Modify: `pd_cards/core/upload.py` (concurrent pipeline, timeout increase)
- Modify: `pd_cards/cli/upload.py` (or wherever CLI args are defined): add `--concurrency` flag
Tests
- Unit: semaphore limits concurrent tasks to specified count
- Unit: individual card failure doesn't abort batch (return_exceptions=True)
- Unit: progress logging fires at correct intervals
- Integration: 20-card concurrent upload completes successfully
- Integration: S3 URLs are correct after concurrent upload
- Integration: `--concurrency 1` behaves like sequential (regression safety)
Acceptance Criteria
- Default concurrency of 8 parallel card processes
- Individual failures logged, don't abort batch
- `fetch_card_image` timeout is 10s
- 800-card upload estimated at ~3-4 minutes with 8x concurrency (with WP-01 + WP-02 deployed)
- `--concurrency` flag available on CLI
WP-05: Legacy Upload Script Update
Repo: card-creation
Complexity: S
Dependencies: WP-04
Apply the same concurrency pattern to check_cards_and_upload.py for users who still use the legacy script.
Current State
- `check_cards_and_upload.py` lines 150-293: identical sequential pattern to `pd_cards/core/upload.py`
- Module-level boto3 client (line 27)
Implementation
- Refactor the sequential loop to use `asyncio.gather` + `Semaphore` (same pattern as WP-04)
- Offload synchronous S3 calls to thread pool
- Increase fetch timeout to 10s
- Add progress reporting
Files
- Modify: `check_cards_and_upload.py`
Tests
- Integration: legacy script uploads 10 cards concurrently without errors
- Integration: S3 URLs match expected format
Acceptance Criteria
- Same concurrency behavior as WP-04
- No regression in existing functionality
WP Summary
| WP | Title | Repo | Size | Dependencies | Tests |
|---|---|---|---|---|---|
| WP-00 | Baseline Benchmarks | both | XS | — | 1 |
| WP-01 | Self-Hosted Fonts | database | S | — | 4 |
| WP-02 | Persistent Browser Instance | database | M | — | 6 |
| WP-03 | FastAPI Lifespan Hooks | database | S | WP-02 | 4 |
| WP-04 | Concurrent Upload Pipeline | card-creation | M | WP-02 | 6 |
| WP-05 | Legacy Upload Script Update | card-creation | S | WP-04 | 2 |
Total: 6 WPs, 23 tests
Dependency Graph
```
WP-00 (benchmarks)
    |
    v
WP-01 (fonts) ──────┐
                    ├──> WP-03 (lifespan) ──> Deploy to dev ──> WP-04 (concurrent upload)
WP-02 (browser) ────┘                                                    |
                                                                         v
                                                                WP-05 (legacy script)
                                                                         |
                                                                         v
                                                                Re-run benchmarks
```
Parallelization:
- WP-00, WP-01, WP-02 can all start immediately in parallel
- WP-03 needs WP-02
- WP-04 needs WP-02 deployed (persistent browser must be running server-side for concurrent fetches to work)
- WP-05 needs WP-04 (reuse the pattern)
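The parallelization notes can be cross-checked by feeding the Dependencies column of the WP Summary table into a topological sort. This sketch uses `graphlib` from the standard library (Python 3.9+); `parallel_waves` is a hypothetical helper, and the graph covers code dependencies only, not the "deploy to dev" step.

```python
from graphlib import TopologicalSorter

# Each WP mapped to the WPs it depends on, from the WP Summary table.
dependencies = {
    "WP-00": set(),
    "WP-01": set(),
    "WP-02": set(),
    "WP-03": {"WP-02"},
    "WP-04": {"WP-02"},
    "WP-05": {"WP-04"},
}


def parallel_waves(graph):
    """Group nodes into waves; everything in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
# Wave 1: WP-00, WP-01, WP-02
# Wave 2: WP-03, WP-04
# Wave 3: WP-05
```

The table alone puts WP-04 in the same wave as WP-03. In practice WP-04 also waits for the persistent browser to be deployed, as noted above.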
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Base64-embedded fonts bloat template HTML | Medium | Low | WOFF2 files are small (~20-40KB each). Total ~150KB base64 added to template. Acceptable since template is loaded once into Playwright, not transmitted to clients. |
| Persistent browser memory leak | Medium | Medium | Always close pages in finally block. Monitor RSS after sustained renders. Add is_connected() check for crash recovery. |
| Concurrent renders overload API server | Low | High | Semaphore bounds concurrency. Start at 8, tune based on server RAM (~100MB per page). 8 pages = ~800MB, well within 16GB. |
| Synchronous boto3 blocks event loop under concurrency | Medium | Medium | Use run_in_executor to offload to thread pool. Consider aioboto3 if thread pool proves insufficient. |
| Visual regression from font change | Low | High | Visual diff test before/after. Render same card with both approaches and compare pixel output. |
Open Questions
None — Phase 0 is straightforward infrastructure optimization with no design decisions pending.
Follow-On: Local High-Concurrency Rendering (2026-03-14)
After Phase 0 was deployed, a follow-on improvement was implemented: configurable API URL for card rendering. This enables running the Paper Dynasty API server locally on the workstation and pointing upload scripts at localhost for dramatically higher concurrency.
Changes
- `pd_cards/core/upload.py`: `upload_cards_to_s3()`, `refresh_card_images()`, `check_card_images()` accept `api_url` parameter (defaults to production)
- `pd_cards/commands/upload.py`: `--api-url` CLI option on `upload s3` command
- `check_cards_and_upload.py`: `PD_API_URL` env var override (legacy script)
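The override-with-fallback behaviour can be sketched as a single resolver. Several details here are assumptions rather than facts from the code: `resolve_api_url` is a hypothetical helper, the production URL is a placeholder because the real one is not given in this plan, and the precedence order (explicit argument, then `PD_API_URL`, then production) is one reasonable choice.

```python
import os
from typing import Mapping, Optional

# Placeholder only: the real production URL is not stated in this plan.
DEFAULT_API_URL = "https://api.example.invalid/api"


def resolve_api_url(
    cli_value: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the API base URL: explicit value, then PD_API_URL, then production."""
    env = os.environ if environ is None else environ
    chosen = cli_value or env.get("PD_API_URL") or DEFAULT_API_URL
    # Drop a trailing slash so callers can append paths consistently.
    return chosen.rstrip("/")
```

For example, `resolve_api_url(environ={"PD_API_URL": "http://localhost:8000/api"})` returns the local URL, while an empty environment falls back to the production default.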
Expected Performance
| Scenario | Per-card | 800 cards |
|---|---|---|
| Remote server, 8x concurrency (current) | ~0.98s render + network | ~2-3 min |
| Local server, 32x concurrency | ~0.98s render, 32 parallel | ~30-45 sec |
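A rough lower bound for these figures follows from treating the batch as waves of `concurrency` cards, each taking one render time. The sketch below assumes the server sustains ~0.98s per render under load and ignores network and S3 time, so real runs should come out slower. `ideal_batch_seconds` is a hypothetical helper.

```python
import math


def ideal_batch_seconds(cards: int, render_s: float, concurrency: int) -> float:
    """Lower-bound wall-clock time: number of waves times per-card render time."""
    waves = math.ceil(cards / concurrency)
    return waves * render_s


for concurrency in (1, 8, 32):
    seconds = ideal_batch_seconds(800, 0.98, concurrency)
    print(f"{concurrency:>2}x: {seconds:6.1f}s ({seconds / 60:.1f} min)")
```

For 800 cards this gives about 784s sequentially, 98s at 8x, and 24.5s at 32x. The estimates in the table (~2-3 min and ~30-45 sec) sit above these bounds, presumably because they allow for network, upload, and contention overhead.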
Usage
pd-cards upload s3 --cardset "2005 Live" --api-url http://localhost:8000/api --concurrency 32
Notes
- Phase 0 is a prerequisite for Phase 4 (Animated Cosmetics) which needs the persistent browser for efficient multi-frame APNG capture
- The persistent browser also benefits Phase 2/3 variant rendering
- GPU acceleration was evaluated and rejected (see PRD `02-architecture.md` § Optimization 4)
- Consider `aioboto3` as a future enhancement if `run_in_executor` thread pool becomes a bottleneck