paper-dynasty-card-creation/docs/prd-evolution/PHASE0_PROJECT_PLAN.md
Cal Corum 81622cceb3 docs: update Phase 0 status with PR references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 10:13:18 -05:00


Phase 0 — Render Pipeline Optimization: Project Plan

Version: 1.1
Date: 2026-03-13
PRD Reference: docs/prd-evolution/02-architecture.md § Card Render Pipeline Optimization, 13-implementation.md § Phase 0
Status: Complete. Deployed to dev (PR #94); client-side concurrent uploads merged via PR #28 (card-creation repo)


Overview

Phase 0 is independent of Card Evolution and benefits all existing card workflows immediately. The goal is to cut per-card render time and full-cardset upload time by eliminating browser spawn overhead, CDN dependencies, and sequential processing.

Bottlenecks addressed:

  1. New Chromium process spawned per render request (~1.0-1.5s overhead)
  2. Google Fonts CDN fetched over network on every render (~0.3-0.5s) — no persistent cache since browser is destroyed after each render
  3. Upload pipeline is fully sequential — one card at a time, blocking S3 upload via synchronous boto3

Results:

| Metric | Before | Target | Actual |
|---|---|---|---|
| Per-card render (fresh) | ~2.0s (benchmark avg) | <1.0s | ~0.98s avg (range 0.63-1.44s, ~51% reduction) |
| Per-card render (cached) | N/A | ~0.1s | |
| External dependencies during render | Google Fonts CDN | None | None |
| Chromium processes per 800-card run | 800 | 1 | 1 |
| 800-card upload (sequential, estimated) | ~27 min | ~8-13 min | ~13 min (estimated at 0.98s/card) |
| 800-card upload (concurrent 8x, estimated) | N/A | ~2-4 min | ~2-3 min (estimated) |

Benchmark details (7 fresh renders on dev, 2026-03-13):

| Player | Type | Time |
|---|---|---|
| Michael Young (12726) | Batting | 0.96s |
| Darin Erstad (12729) | Batting | 0.78s |
| Wilson Valdez (12746) | Batting | 1.44s |
| Player 12750 | Batting | 0.76s |
| Jarrod Washburn (12880) | Pitching | 0.63s |
| Ryan Drese (12879) | Pitching | 1.25s |
| Player 12890 | Pitching | 1.07s |

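Average: 0.98s, which meets the <1s target. Occasional spikes to ~1.4s come from Chromium GC pressure. Pitching cards were expected to render slightly faster because they carry less template data, but in this seven-card sample the two types are effectively tied (pitching 0.983s avg, batting 0.985s avg).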
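The headline figures can be re-derived from the seven timings in the benchmark table. The short script below is only a sanity check on the arithmetic; the timings and the ~2.0s baseline are copied from this document, and the variable names are illustrative.

```python
from statistics import mean

# Fresh-render timings in seconds, copied from the benchmark table.
timings = {
    "batting": [0.96, 0.78, 1.44, 0.76],
    "pitching": [0.63, 1.25, 1.07],
}
BASELINE_S = 2.0  # pre-optimization benchmark average from the Results table
CARDS = 800       # size of the reference cardset run

all_times = [t for group in timings.values() for t in group]
avg = mean(all_times)
reduction = 1 - avg / BASELINE_S       # fraction of render time removed
sequential_min = CARDS * avg / 60      # one card at a time, in minutes

print(f"average render: {avg:.2f}s")
print(f"reduction vs baseline: {reduction:.0%}")
print(f"{CARDS} cards, sequential: {sequential_min:.1f} min")
for kind, values in timings.items():
    print(f"{kind} average: {mean(values):.3f}s")
```

The concurrent estimates in the Results table are not reproduced here because they also depend on network and S3 overhead, which this document does not quantify.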

Optimization breakdown:

  • Persistent browser (WP-02): eliminated ~1.0s spawn overhead
  • Variable font deduplication (WP-01 fix): eliminated ~163KB redundant base64 parsing, saved ~0.4s
  • Remaining ~0.98s is Playwright page creation, HTML parsing, and PNG screenshot — not reducible without GPU acceleration or a different rendering approach

Work Packages (6 WPs)

WP-00: Baseline Benchmarks

Repo: database + card-creation
Complexity: XS
Dependencies: None

Capture before-metrics so we can measure improvement.

Tasks

  1. Time 10 sequential card renders via the API (curl with timing)
  2. Time a small batch S3 upload (e.g., 20 cards) via pd-cards upload
  3. Record results in a benchmark log
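One way to make the timing step repeatable is a small harness that times any callable sequentially. This is a sketch, not the benchmark script actually used: `time_calls` and `summarize` are hypothetical names, and the stand-in workload would need to be replaced with a real request against the render endpoint, whose URL is not given in this plan.

```python
import time
from statistics import mean
from typing import Callable, Dict, List


def time_calls(render: Callable[[], object], runs: int = 10) -> List[float]:
    """Call `render` sequentially `runs` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        render()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Reduce raw durations to the numbers worth recording in a benchmark log."""
    return {
        "runs": len(durations),
        "avg_s": mean(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


if __name__ == "__main__":
    # Stand-in workload (assumption): swap in an HTTP GET for one card render.
    print(summarize(time_calls(lambda: time.sleep(0.01), runs=10)))
```

Running the same harness before and after the optimization keeps the methodology identical, which is what acceptance criterion 3 asks for.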

Tests

  • Benchmark script or documented curl commands exist and are repeatable

Acceptance Criteria

  1. Baseline numbers recorded for per-card render time
  2. Baseline numbers recorded for batch upload time
  3. Methodology is repeatable for post-optimization comparison

WP-01: Self-Hosted Fonts

Repo: database
Complexity: S
Dependencies: None (can run in parallel with WP-02)

Replace Google Fonts CDN with locally embedded WOFF2 fonts. Eliminates ~0.3-0.5s network round-trip per render and removes external dependency.

Current State

  • storage/templates/player_card.html lines 5-7: <link> tags to fonts.googleapis.com
  • storage/templates/style.html: References "Open Sans" and "Source Sans 3" font-families
  • Two fonts used: Open Sans (300, 400, 700) and Source Sans 3 (400, 700)

Implementation

  1. Download WOFF2 files for both fonts (5 files total: Open Sans 300/400/700, Source Sans 3 400/700)
  2. Base64-encode each WOFF2 file
  3. Add @font-face declarations with base64 data URIs to style.html
  4. Remove the three <link> tags from player_card.html
  5. Visual diff: render the same card before/after and verify identical output
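Steps 2 and 3 can be sketched as a small generator that turns WOFF2 bytes into `@font-face` rules with base64 data URIs. The function names and the font file names below are hypothetical; only the two families and five weights come from this plan.

```python
import base64
from pathlib import Path


def font_face_css(family: str, weight: int, woff2_bytes: bytes) -> str:
    """Return one @font-face rule that embeds the font as a base64 data URI."""
    encoded = base64.b64encode(woff2_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        "  font-style: normal;\n"
        f"  font-weight: {weight};\n"
        f"  src: url(data:font/woff2;base64,{encoded}) format('woff2');\n"
        "}"
    )


# The five variants named in the plan. File names are assumptions.
VARIANTS = [
    ("Open Sans", 300, "open-sans-300.woff2"),
    ("Open Sans", 400, "open-sans-400.woff2"),
    ("Open Sans", 700, "open-sans-700.woff2"),
    ("Source Sans 3", 400, "source-sans-3-400.woff2"),
    ("Source Sans 3", 700, "source-sans-3-700.woff2"),
]


def build_font_css(font_dir: Path) -> str:
    """Read every variant from `font_dir` and join the rules into one CSS string."""
    return "\n\n".join(
        font_face_css(family, weight, (font_dir / name).read_bytes())
        for family, weight, name in VARIANTS
    )
```

The resulting string would be pasted into (or templated into) the style block of style.html, after which the `<link>` tags are no longer needed.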

Files

  • Create: database/storage/fonts/ directory with raw WOFF2 files (source archive, not deployed)
  • Modify: database/storage/templates/style.html — add @font-face declarations
  • Modify: database/storage/templates/player_card.html — remove <link> tags (lines 5-7)

Tests

  • Unit: style.html contains no fonts.googleapis.com references
  • Unit: player_card.html contains no <link> to external font CDNs
  • Unit: @font-face declarations present for all 5 font variants
  • Visual: rendered card is pixel-identical to pre-change output (manual check)

Acceptance Criteria

  1. No external network requests during card render
  2. All 5 font weights render correctly
  3. Card appearance unchanged

WP-02: Persistent Browser Instance

Repo: database
Complexity: M
Dependencies: None (can run in parallel with WP-01)

Replace per-request Chromium launch/teardown with a persistent browser that lives for the lifetime of the API process. Eliminates ~1.0-1.5s spawn overhead per render.

Current State

  • app/routers_v2/players.py lines 801-826: async with async_playwright() as p: block creates and destroys a browser per request
  • No browser reuse, no connection pooling

Implementation

  1. Add module-level _browser and _playwright globals to players.py
  2. Implement get_browser() — lazy-init with is_connected() auto-reconnect
  3. Implement shutdown_browser() — clean teardown for API shutdown
  4. Replace the async with async_playwright() block with page-per-request pattern:
    browser = await get_browser()
    page = await browser.new_page(viewport={"width": 1280, "height": 720})
    try:
        await page.set_content(html_string)
        await page.screenshot(path=file_path, type="png", clip={...})
    finally:
        await page.close()
    
  5. Ensure page is always closed in finally block to prevent memory leaks
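A minimal sketch of `get_browser()` and `shutdown_browser()` as described in steps 1 through 3 follows. It is not the code in players.py. Two things here are assumptions rather than part of the plan: the `asyncio.Lock`, added so that simultaneous first calls do not launch two browsers, and the injectable `launch` parameter, added so the logic can be exercised without Playwright installed.

```python
import asyncio
from typing import Any, Optional

_playwright: Optional[Any] = None
_browser: Optional[Any] = None
_lock = asyncio.Lock()  # assumption: guards against concurrent launches


async def _launch():
    """Start Playwright and launch Chromium (imported lazily)."""
    from playwright.async_api import async_playwright

    pw = await async_playwright().start()
    browser = await pw.chromium.launch()
    return pw, browser


async def get_browser(launch=_launch):
    """Return the shared browser, launching or relaunching it when needed."""
    global _playwright, _browser
    async with _lock:
        if _browser is not None and _browser.is_connected():
            return _browser
        # First call, or the previous browser crashed: start from scratch.
        if _playwright is not None:
            await _playwright.stop()
        _playwright, _browser = await launch()
        return _browser


async def shutdown_browser():
    """Close the browser and stop the Playwright driver, if they exist."""
    global _playwright, _browser
    async with _lock:
        if _browser is not None and _browser.is_connected():
            await _browser.close()
        if _playwright is not None:
            await _playwright.stop()
        _playwright = _browser = None
```

Each render then opens its own page on the shared browser and closes it in a `finally` block, as in step 4.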

Files

  • Modify: database/app/routers_v2/players.py — persistent browser, page-per-request

Tests

  • Unit: get_browser() returns a connected browser
  • Unit: get_browser() returns same instance on second call
  • Unit: get_browser() relaunches if browser disconnected
  • Integration: render 10 cards sequentially, no browser leaks (page count returns to 0 between renders)
  • Integration: concurrent renders (4 simultaneous requests) complete without errors
  • Integration: shutdown_browser() cleanly closes browser and playwright

Acceptance Criteria

  1. Only 1 Chromium process running regardless of render count
  2. Page count returns to 0 between renders (no leaks)
  3. Auto-reconnect works if browser crashes
  4. Per-card render time drops to ~1.0-1.5s. Actual: ~0.98s avg fresh render (from ~2.0s baseline); target met.

WP-03: FastAPI Lifespan Hooks

Repo: database
Complexity: S
Dependencies: WP-02

Wire get_browser() and shutdown_browser() into FastAPI's lifespan so the browser warms up on startup and cleans up on shutdown.

Current State

  • app/main.py line 54: plain FastAPI(...) constructor with no lifespan
  • Only middleware is the DB session handler (lines 97-105)

Implementation

  1. Add @asynccontextmanager lifespan function that calls get_browser() on startup and shutdown_browser() on shutdown
  2. Pass lifespan=lifespan to FastAPI() constructor
  3. Verify existing middleware is unaffected
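The lifespan wiring in steps 1 and 2 might look like the sketch below. `make_lifespan` is a hypothetical helper that keeps the example free of FastAPI imports; in main.py the same body could equally be written inline as a single `lifespan(app)` function.

```python
from contextlib import asynccontextmanager
from typing import Awaitable, Callable


def make_lifespan(
    startup: Callable[[], Awaitable[object]],
    shutdown: Callable[[], Awaitable[object]],
):
    """Build a lifespan context manager that runs `startup` before serving and `shutdown` after."""

    @asynccontextmanager
    async def lifespan(app):
        await startup()  # e.g. get_browser(): launch Chromium before the first request
        try:
            yield  # the application serves requests while suspended here
        finally:
            await shutdown()  # e.g. shutdown_browser(): runs even if serving ends with an error

    return lifespan


# Wiring in app/main.py would then look roughly like this (not executed here;
# the import path is an assumption based on the file locations in this plan):
#   from fastapi import FastAPI
#   from app.routers_v2.players import get_browser, shutdown_browser
#   app = FastAPI(lifespan=make_lifespan(get_browser, shutdown_browser))
```

Because the lifespan wraps the whole application rather than individual requests, the existing DB session middleware should be unaffected, which step 3 still needs to confirm.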

Files

  • Modify: database/app/main.py — add lifespan hook, pass to FastAPI constructor
  • Modify: database/app/routers_v2/players.py — export get_browser/shutdown_browser (if not already importable)

Tests

  • Integration: browser is connected immediately after API startup (before any render request)
  • Integration: browser is closed after API shutdown (no orphan processes)
  • Integration: existing DB middleware still functions correctly
  • Integration: API health endpoint still responds

Acceptance Criteria

  1. Browser pre-warmed on startup — first render request has no cold-start penalty
  2. Clean shutdown — no orphan Chromium processes after API stop
  3. No regression in existing API behavior

WP-04: Concurrent Upload Pipeline

Repo: card-creation
Complexity: M
Dependencies: WP-02 (persistent browser must be deployed for concurrent renders to work)

Replace the sequential upload loop with semaphore-bounded asyncio.gather for parallel card fetching, rendering, and S3 upload.

Current State

  • pd_cards/core/upload.py upload_cards_to_s3() (lines 109-333): sequential for x in all_players: loop
  • fetch_card_image timeout hardcoded to 6s (line 28)
  • upload_card_to_s3() uses synchronous boto3.put_object — blocks the event loop
  • Single aiohttp.ClientSession is reused (good)

Implementation

  1. Wrap per-card processing in an async def process_card(player) coroutine
  2. Add asyncio.Semaphore(concurrency) guard (default concurrency=8)
  3. Replace sequential loop with asyncio.gather(*[process_card(p) for p in all_players], return_exceptions=True)
  4. Offload synchronous upload_card_to_s3() to thread pool via asyncio.get_event_loop().run_in_executor(None, upload_card_to_s3, ...)
  5. Increase fetch_card_image timeout from 6s to 10s
  6. Add error handling: individual card failures logged but don't abort the batch
  7. Add progress reporting: log completion count every N cards (not every start)
  8. Add --concurrency CLI argument to pd-cards upload command
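Steps 1 through 4, 6, and 7 combine into one pattern: a semaphore-bounded coroutine per card, gathered with `return_exceptions=True`, with the blocking S3 call pushed to a thread. The sketch below is illustrative only. `process_all`, `fetch`, and `upload` are hypothetical stand-ins for the real functions in upload.py, and it uses `asyncio.get_running_loop()`, the form the asyncio documentation recommends inside coroutines.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def process_all(players, fetch, upload, concurrency=8, log_every=20):
    """Process every player with at most `concurrency` cards in flight.

    fetch:  async callable, player -> image bytes (stand-in for fetch_card_image)
    upload: sync callable, (player, bytes) -> URL (stand-in for upload_card_to_s3)
    """
    players = list(players)
    semaphore = asyncio.Semaphore(concurrency)
    loop = asyncio.get_running_loop()
    finished = 0

    async def process_card(player):
        nonlocal finished
        try:
            async with semaphore:
                image = await fetch(player)
                # Blocking boto3-style call runs in the default thread pool.
                return await loop.run_in_executor(None, upload, player, image)
        finally:
            finished += 1  # counts failures too, so progress always reaches the total
            if finished % log_every == 0 or finished == len(players):
                logger.info("finished %d/%d cards", finished, len(players))

    results = await asyncio.gather(
        *(process_card(p) for p in players), return_exceptions=True
    )
    for player, result in zip(players, results):
        if isinstance(result, Exception):
            logger.error("card %r failed: %s", player, result)
    return results
```

With `concurrency=1` the semaphore admits one card at a time, which is how the `--concurrency 1` regression test can stand in for the old sequential loop.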

Files

  • Modify: pd_cards/core/upload.py — concurrent pipeline, timeout increase
  • Modify: pd_cards/cli/upload.py (or wherever CLI args are defined) — add --concurrency flag

Tests

  • Unit: semaphore limits concurrent tasks to specified count
  • Unit: individual card failure doesn't abort batch (return_exceptions=True)
  • Unit: progress logging fires at correct intervals
  • Integration: 20-card concurrent upload completes successfully
  • Integration: S3 URLs are correct after concurrent upload
  • Integration: --concurrency 1 behaves like sequential (regression safety)

Acceptance Criteria

  1. Default concurrency of 8 parallel card processes
  2. Individual failures logged, don't abort batch
  3. fetch_card_image timeout is 10s
  4. 800-card upload estimated at ~3-4 minutes with 8x concurrency (with WP-01 + WP-02 deployed)
  5. --concurrency flag available on CLI

WP-05: Legacy Upload Script Update

Repo: card-creation
Complexity: S
Dependencies: WP-04

Apply the same concurrency pattern to check_cards_and_upload.py for users who still use the legacy script.

Current State

  • check_cards_and_upload.py lines 150-293: identical sequential pattern to pd_cards/core/upload.py
  • Module-level boto3 client (line 27)

Implementation

  1. Refactor the sequential loop to use asyncio.gather + Semaphore (same pattern as WP-04)
  2. Offload synchronous S3 calls to thread pool
  3. Increase fetch timeout to 10s
  4. Add progress reporting

Files

  • Modify: check_cards_and_upload.py

Tests

  • Integration: legacy script uploads 10 cards concurrently without errors
  • Integration: S3 URLs match expected format

Acceptance Criteria

  1. Same concurrency behavior as WP-04
  2. No regression in existing functionality

WP Summary

| WP | Title | Repo | Size | Dependencies | Tests |
|---|---|---|---|---|---|
| WP-00 | Baseline Benchmarks | both | XS | None | 1 |
| WP-01 | Self-Hosted Fonts | database | S | None | 4 |
| WP-02 | Persistent Browser Instance | database | M | None | 6 |
| WP-03 | FastAPI Lifespan Hooks | database | S | WP-02 | 4 |
| WP-04 | Concurrent Upload Pipeline | card-creation | M | WP-02 | 6 |
| WP-05 | Legacy Upload Script Update | card-creation | S | WP-04 | 2 |

Total: 6 WPs, ~23 tests


Dependency Graph

WP-00 (benchmarks)
  |
  v
WP-01 (fonts) ──────┐
                     ├──> WP-03 (lifespan) ──> Deploy to dev ──> WP-04 (concurrent upload)
WP-02 (browser) ────┘                                              |
                                                                    v
                                                              WP-05 (legacy script)
                                                                    |
                                                                    v
                                                              Re-run benchmarks

Parallelization:

  • WP-00, WP-01, WP-02 can all start immediately in parallel
  • WP-03 needs WP-02
  • WP-04 needs WP-02 deployed (persistent browser must be running server-side for concurrent fetches to work)
  • WP-05 needs WP-04 (reuse the pattern)

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Base64-embedded fonts bloat template HTML | Medium | Low | WOFF2 files are small (~20-40KB each). Total ~150KB base64 added to template. Acceptable since template is loaded once into Playwright, not transmitted to clients. |
| Persistent browser memory leak | Medium | Medium | Always close pages in finally block. Monitor RSS after sustained renders. Add is_connected() check for crash recovery. |
| Concurrent renders overload API server | Low | High | Semaphore bounds concurrency. Start at 8, tune based on server RAM (~100MB per page). 8 pages = ~800MB, well within 16GB. |
| Synchronous boto3 blocks event loop under concurrency | Medium | Medium | Use run_in_executor to offload to thread pool. Consider aioboto3 if thread pool proves insufficient. |
| Visual regression from font change | Low | High | Visual diff test before/after. Render same card with both approaches and compare pixel output. |

Open Questions

None — Phase 0 is straightforward infrastructure optimization with no design decisions pending.


Follow-On: Local High-Concurrency Rendering (2026-03-14)

After Phase 0 was deployed, a follow-on improvement was implemented: a configurable API URL for card rendering. This makes it possible to run the Paper Dynasty API server locally on the workstation and point the upload scripts at localhost, which allows much higher concurrency (32x in the usage example below, versus 8x against the remote server).

Changes

  • pd_cards/core/upload.py: upload_cards_to_s3(), refresh_card_images(), check_card_images() accept api_url parameter (defaults to production)
  • pd_cards/commands/upload.py: --api-url CLI option on upload s3 command
  • check_cards_and_upload.py: PD_API_URL env var override (legacy script)
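The override logic can be pictured as a single resolver. This is a sketch under assumptions, not the shipped code: `resolve_api_url` is a hypothetical helper, the default URL is a placeholder because the production URL is not stated in this plan, and combining the CLI value and the `PD_API_URL` variable in one precedence chain is an illustration (in the actual change they belong to different entry points).

```python
import os
from typing import Mapping, Optional

# Placeholder only: the real production URL is not given in this document.
DEFAULT_API_URL = "https://example.invalid/api"


def resolve_api_url(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the API base URL: explicit value, then PD_API_URL, then the default."""
    url = cli_value or env.get("PD_API_URL") or DEFAULT_API_URL
    return url.rstrip("/")  # normalize so callers can append paths consistently
```

For example, `resolve_api_url("http://localhost:8000/api")` returns the localhost URL unchanged, while calling it with no argument and no environment variable falls back to the default.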

Expected Performance

| Scenario | Per-card | 800 cards |
|---|---|---|
| Remote server, 8x concurrency (current) | ~0.98s render + network | ~2-3 min |
| Local server, 32x concurrency | ~0.98s render, 32 parallel | ~30-45 sec |

Usage

pd-cards upload s3 --cardset "2005 Live" --api-url http://localhost:8000/api --concurrency 32

Notes

  • Phase 0 is a prerequisite for Phase 4 (Animated Cosmetics), which needs the persistent browser for efficient multi-frame APNG capture
  • The persistent browser also benefits Phase 2/3 variant rendering
  • GPU acceleration was evaluated and rejected — see PRD 02-architecture.md § Optimization 4
  • Consider aioboto3 as a future enhancement if run_in_executor thread pool becomes a bottleneck