WP-04: Concurrent Upload Pipeline #91

Closed
opened 2026-03-13 04:37:20 +00:00 by cal · 2 comments
Owner

Description

Replace the sequential upload loop with semaphore-bounded asyncio.gather for parallel card fetching, rendering, and S3 upload.

Repo: card-creation
Phase: 0 (Render Pipeline Optimization)
Dependencies: WP-02 (persistent browser must be deployed for concurrent renders to work)
Complexity: M

Current State

  • pd_cards/core/upload.py upload_cards_to_s3() (lines 109-333): sequential for x in all_players: loop
  • fetch_card_image timeout hardcoded to 6s (line 28)
  • upload_card_to_s3() uses synchronous boto3.put_object — blocks the event loop
  • Single aiohttp.ClientSession is reused (good)

Implementation

  1. Wrap per-card processing in an async def process_card(player) coroutine
  2. Add asyncio.Semaphore(concurrency) guard (default concurrency=8)
  3. Replace sequential loop with asyncio.gather(*[process_card(p) for p in all_players], return_exceptions=True)
  4. Offload synchronous upload_card_to_s3() to thread pool via asyncio.get_event_loop().run_in_executor(None, upload_card_to_s3, ...)
  5. Increase fetch_card_image timeout from 6s to 10s
  6. Add error handling: individual card failures logged but don't abort the batch
  7. Add progress reporting: log completion count every N cards (not every start)
  8. Add --concurrency CLI argument to pd-cards upload command
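
A minimal sketch of steps 1-4, 6 and 7, under stated assumptions: the body of `upload_card_to_s3` here is a stand-in (the real function calls boto3), the `example-bucket` URL and the `upload_all` and `progress_every` names are hypothetical, and fetching and rendering are omitted. It uses `asyncio.get_running_loop()`, the preferred call inside a coroutine.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def upload_card_to_s3(player: str) -> str:
    """Stand-in for the synchronous boto3 upload (hypothetical body)."""
    if player == "bad":
        raise RuntimeError("simulated upload failure")
    return f"s3://example-bucket/cards/{player}.png"  # hypothetical URL


async def upload_all(all_players, concurrency: int = 8, progress_every: int = 20):
    """Process every card with at most `concurrency` cards in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    loop = asyncio.get_running_loop()
    completed = 0

    async def process_card(player):
        nonlocal completed
        async with semaphore:
            # Fetching and rendering would go here; omitted in this sketch.
            # The blocking upload runs in the default thread pool so the
            # event loop stays free for other cards.
            url = await loop.run_in_executor(None, upload_card_to_s3, player)
        completed += 1  # counts successful completions, not starts
        if completed % progress_every == 0:
            logger.info("Completed %d/%d cards", completed, len(all_players))
        return url

    # return_exceptions=True turns a failure into a result value, so one
    # bad card cannot cancel the rest of the batch.
    results = await asyncio.gather(
        *[process_card(p) for p in all_players], return_exceptions=True
    )

    successes, failures = [], []
    for player, result in zip(all_players, results):
        if isinstance(result, Exception):
            logger.error("Card failed for %s: %s", player, result)
            failures.append(player)
        else:
            successes.append(result)
    return successes, failures
```

`asyncio.gather` returns results in input order, which is what lets the loop above pair each result with its player.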

Files

  • Modify: pd_cards/core/upload.py — concurrent pipeline, timeout increase
  • Modify: pd_cards/cli/upload.py (or wherever CLI args are defined) — add --concurrency flag
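
The flag could be wired up roughly as follows. This sketch assumes `argparse`; the issue does not say which CLI library `pd-cards` uses, and `positive_int` and `build_parser` are hypothetical names.

```python
import argparse


def positive_int(value: str) -> int:
    """Reject zero or negative values, which would stall the semaphore."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError("concurrency must be >= 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser layout: a `pd-cards` program with an `upload` command.
    parser = argparse.ArgumentParser(prog="pd-cards")
    subparsers = parser.add_subparsers(dest="command", required=True)
    upload = subparsers.add_parser("upload")
    upload.add_argument(
        "--concurrency",
        type=positive_int,
        default=8,
        help="maximum number of cards processed in parallel",
    )
    return parser
```

Under this layout, `pd-cards upload --concurrency 1` would request the sequential behavior used for regression checks.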

Tests

  • Unit: semaphore limits concurrent tasks to specified count
  • Unit: individual card failure doesn't abort batch (return_exceptions=True)
  • Unit: progress logging fires at correct intervals
  • Integration: 20-card concurrent upload completes successfully
  • Integration: S3 URLs are correct after concurrent upload
  • Integration: --concurrency 1 behaves like sequential (regression safety)
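
One possible shape for the semaphore unit test, as a sketch: `run_bounded` is a hypothetical helper that replaces card processing with a short sleep and records the peak number of jobs inside the semaphore at once.

```python
import asyncio


async def run_bounded(n_tasks: int, concurrency: int) -> int:
    """Run n_tasks dummy jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(concurrency)
    active = 0
    peak = 0

    async def job():
        nonlocal active, peak
        async with semaphore:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # yield so other jobs can try to enter
            active -= 1

    await asyncio.gather(*[job() for _ in range(n_tasks)])
    return peak


def test_semaphore_limits_concurrency():
    # More jobs than slots: the peak should reach the limit and never pass it.
    assert asyncio.run(run_bounded(n_tasks=20, concurrency=8)) == 8


def test_concurrency_one_is_sequential():
    assert asyncio.run(run_bounded(n_tasks=5, concurrency=1)) == 1
```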

Acceptance Criteria

  1. Default concurrency of 8 parallel card processes
  2. Individual failures logged, don't abort batch
  3. fetch_card_image timeout is 10s
  4. 800-card upload completes in <5 minutes (with WP-01 + WP-02 deployed)
  5. --concurrency flag available on CLI

Plan reference: docs/prd-evolution/PHASE0_PROJECT_PLAN.md WP-04

cal added this to the Card Evolution Phase 0 — Render Pipeline Optimization milestone 2026-03-13 04:37:37 +00:00
cal added the evolution, phase-0 labels 2026-03-13 04:37:42 +00:00
Claude added the ai-working label 2026-03-13 11:01:51 +00:00
Collaborator

PR opened in paper-dynasty-card-creation: cal/paper-dynasty-card-creation#25

The implementation was already complete on feature/render-pipeline-optimization. The PR surfaces it as a reviewable branch targeting main.

What was implemented:

  • pd_cards/core/upload.py: process_single_card() coroutine + asyncio.Semaphore(concurrency) + asyncio.gather(*tasks, return_exceptions=True) + loop.run_in_executor for boto3 + timeout 6s → 10s + progress every 20 cards
  • pd_cards/commands/upload.py: --concurrency / -j flag (default: 8)
  • check_cards_and_upload.py: same pattern applied (WP-05)
  • scripts/benchmark_render.sh + baseline results (WP-00)
Claude added ai-pr-opened and removed ai-working labels 2026-03-13 11:05:13 +00:00
Author
Owner

Completed. Concurrent upload pipeline shipped in card-creation repo:

  • 979f308 feat: concurrent upload pipeline and benchmarks (Phase 0)
  • ed1daa2 fix: use get_running_loop() instead of deprecated get_event_loop()
  • 8bddf31 feat: configurable API URL for local high-concurrency card rendering

All acceptance criteria met:

  • Default concurrency of 8 parallel card processes
  • Individual failures logged, don't abort batch
  • fetch_card_image timeout is 10s
  • --concurrency flag available on CLI
  • Follow-on: --api-url flag for local rendering at 32x concurrency
cal closed this issue 2026-03-16 16:03:47 +00:00
Reference: cal/paper-dynasty-database#91