feat: concurrent upload pipeline for legacy script (WP-05) (#92) #26

Closed
Claude wants to merge 1 commits from ai/paper-dynasty-database#92 into main
Collaborator

Closes cal/paper-dynasty-database#92

Summary

Refactors check_cards_and_upload.py to use the same concurrent upload pattern as pd_cards/core/upload.py (WP-04).

Changes:

  • Add CONCURRENCY = 8 constant
  • Add import functools
  • Increase fetch_card_image default timeout from 6s → 10s
  • Replace sequential for x in all_players loop with asyncio.gather + asyncio.Semaphore(CONCURRENCY)
  • Offload synchronous boto3 S3 PUT calls to thread pool via loop.run_in_executor + functools.partial
  • Add report_progress() helper: prints/logs every 20 completions
  • Add results_lock / progress_lock for concurrent state mutation safety
  • Build filtered_players list before gathering (replaces inline filter logic in the loop)
  • Print concurrency count in startup banner

Files changed

  • check_cards_and_upload.py

Test plan

  • Integration: legacy script uploads 10 cards concurrently without errors
  • Integration: S3 URLs match expected format https://{bucket}.s3.{region}.amazonaws.com/cards/cardset-{NNN}/player-{id}/{type}card.png?d={date}

🤖 Generated with Claude Code

Closes cal/paper-dynasty-database#92 ## Summary Refactors `check_cards_and_upload.py` to use the same concurrent upload pattern as `pd_cards/core/upload.py` (WP-04). **Changes:** - Add `CONCURRENCY = 8` constant - Add `import functools` - Increase `fetch_card_image` default timeout from 6s → 10s - Replace sequential `for x in all_players` loop with `asyncio.gather` + `asyncio.Semaphore(CONCURRENCY)` - Offload synchronous `boto3` S3 PUT calls to thread pool via `loop.run_in_executor` + `functools.partial` - Add `report_progress()` helper: prints/logs every 20 completions - Add `results_lock` / `progress_lock` for concurrent state mutation safety - Build `filtered_players` list before gathering (replaces inline filter logic in the loop) - Print concurrency count in startup banner ## Files changed - `check_cards_and_upload.py` ## Test plan - [ ] Integration: legacy script uploads 10 cards concurrently without errors - [ ] Integration: S3 URLs match expected format `https://{bucket}.s3.{region}.amazonaws.com/cards/cardset-{NNN}/player-{id}/{type}card.png?d={date}` 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Claude added 1 commit 2026-03-13 13:05:41 +00:00
Refactor check_cards_and_upload.py sequential loop to use asyncio.gather
+ Semaphore (CONCURRENCY=8), offload boto3 S3 calls to thread pool via
loop.run_in_executor, increase fetch timeout to 10s, and add progress
reporting every 20 completions.

Closes #92

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

Superseded by #27 which includes WP-00, WP-04, and WP-05 in a single commit with the same implementation. Closing as duplicate.

Superseded by #27 which includes WP-00, WP-04, and WP-05 in a single commit with the same implementation. Closing as duplicate.
cal closed this pull request 2026-03-13 13:49:03 +00:00
Owner

AI Code Review

Files Reviewed

  • check_cards_and_upload.py (modified)

Critical Context: Duplicate of Existing Work

This PR implements changes that are already present on the feature/render-pipeline-optimization branch (commit 979f308, "feat: concurrent upload pipeline and benchmarks (Phase 0)"). A byte-for-byte diff of check_cards_and_upload.py between this PR's head (0d603cc) and feature/render-pipeline-optimization produces no output — the files are identical.

The manual branch also covers WP-00 (benchmark script scripts/benchmark_render.sh) and WP-04 (concurrent pd_cards/core/upload.py + pd_cards/commands/upload.py), which this PR does not include. The PR was likely closed because the work was already present on feature/render-pipeline-optimization. Review notes are posted here for record.


Findings

Correctness

asyncio.get_event_loop() used inside a running coroutine (line 173)

loop = asyncio.get_event_loop()

Called from within async def main(...), which is itself invoked via asyncio.run(main(...)). In Python 3.10+, asyncio.get_event_loop() is deprecated when a running event loop already exists and emits a DeprecationWarning. The correct call inside a coroutine is asyncio.get_running_loop(), which is guaranteed to return the active loop without any deprecation path. This applies to both this PR and the equivalent code already on feature/render-pipeline-optimization.

TEST_COUNT = 9999 as implicit "unlimited" sentinel (line 147)

max_count = TEST_COUNT if TEST_COUNT is not None else 9999

The module-level config already sets TEST_COUNT = 9999, so the else 9999 branch is dead code. Using 9999 as "unlimited" is fragile — a cardset with more than 9999 players would be silently truncated. The correct approach is TEST_COUNT = None as default (representing "no limit"), with float("inf") as fallback.

success accounting gap for two-way players with a bad image2

When a two-way player has "sombaseball" in image2, the function appends to errors and returns early. The primary card was already fetched and uploaded to S3 and recorded in uploads, but the player is never added to successes. This is a silent divergence from the single-card player path.

Security

No issues found. No hardcoded credentials; boto3 uses the standard credential chain. The comment on line 30 describes the S3 client as "thread-safe for concurrent reads" — it is also thread-safe for writes, so the comment is slightly misleading but harmless.

Style & Conventions

  • functools.partial + run_in_executor for the synchronous boto3 S3 PUT is the correct pattern and matches pd_cards/core/upload.py.
  • results_lock / progress_lock split is appropriate and cleaner than a single shared lock.
  • return_exceptions=True on asyncio.gather is correct for a batch pipeline. Be aware that any unhandled exception inside process_single_card (e.g., KeyError on missing player dict field) will be silently absorbed rather than surfacing as a traceback.

Suggestions (non-blocking)

  • Replace asyncio.get_event_loop() with asyncio.get_running_loop() at line 173.
  • Default TEST_COUNT = None at module level and remove the magic 9999 fallback.
  • The process_single_card closure captures session as a free variable before the async with aiohttp.ClientSession() block opens — this works at runtime but a brief comment would help readability.

Verdict: REQUEST_CHANGES (informational — PR already closed)

The implementation is functionally correct and the concurrency pattern is sound. The PR was closed because the same changes were already implemented on feature/render-pipeline-optimization. Two issues should be addressed when that branch is merged to main:

  1. Fix asyncio.get_event_loop()asyncio.get_running_loop() at line 173 (Python 3.10+ deprecation).
  2. Fix TEST_COUNT sentinel — use None as the "unlimited" default instead of 9999.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `check_cards_and_upload.py` (modified) --- ### Critical Context: Duplicate of Existing Work This PR implements changes that are **already present** on the `feature/render-pipeline-optimization` branch (commit `979f308`, "feat: concurrent upload pipeline and benchmarks (Phase 0)"). A byte-for-byte diff of `check_cards_and_upload.py` between this PR's head (`0d603cc`) and `feature/render-pipeline-optimization` produces no output — the files are identical. The manual branch also covers WP-00 (benchmark script `scripts/benchmark_render.sh`) and WP-04 (concurrent `pd_cards/core/upload.py` + `pd_cards/commands/upload.py`), which this PR does not include. The PR was likely closed because the work was already present on `feature/render-pipeline-optimization`. Review notes are posted here for record. --- ### Findings #### Correctness **`asyncio.get_event_loop()` used inside a running coroutine (line 173)** ```python loop = asyncio.get_event_loop() ``` Called from within `async def main(...)`, which is itself invoked via `asyncio.run(main(...))`. In Python 3.10+, `asyncio.get_event_loop()` is deprecated when a running event loop already exists and emits a `DeprecationWarning`. The correct call inside a coroutine is `asyncio.get_running_loop()`, which is guaranteed to return the active loop without any deprecation path. This applies to both this PR and the equivalent code already on `feature/render-pipeline-optimization`. **`TEST_COUNT = 9999` as implicit "unlimited" sentinel (line 147)** ```python max_count = TEST_COUNT if TEST_COUNT is not None else 9999 ``` The module-level config already sets `TEST_COUNT = 9999`, so the `else 9999` branch is dead code. Using `9999` as "unlimited" is fragile — a cardset with more than 9999 players would be silently truncated. The correct approach is `TEST_COUNT = None` as default (representing "no limit"), with `float("inf")` as fallback. **`success` accounting gap for two-way players with a bad `image2`** When a two-way player has `"sombaseball"` in `image2`, the function appends to `errors` and returns early. The primary card was already fetched and uploaded to S3 and recorded in `uploads`, but the player is never added to `successes`. This is a silent divergence from the single-card player path. #### Security No issues found. No hardcoded credentials; boto3 uses the standard credential chain. The comment on line 30 describes the S3 client as "thread-safe for concurrent reads" — it is also thread-safe for writes, so the comment is slightly misleading but harmless. #### Style & Conventions - `functools.partial` + `run_in_executor` for the synchronous boto3 S3 PUT is the correct pattern and matches `pd_cards/core/upload.py`. - `results_lock` / `progress_lock` split is appropriate and cleaner than a single shared lock. - `return_exceptions=True` on `asyncio.gather` is correct for a batch pipeline. Be aware that any unhandled exception inside `process_single_card` (e.g., `KeyError` on missing player dict field) will be silently absorbed rather than surfacing as a traceback. #### Suggestions (non-blocking) - Replace `asyncio.get_event_loop()` with `asyncio.get_running_loop()` at line 173. - Default `TEST_COUNT = None` at module level and remove the magic `9999` fallback. - The `process_single_card` closure captures `session` as a free variable before the `async with aiohttp.ClientSession()` block opens — this works at runtime but a brief comment would help readability. --- ### Verdict: REQUEST_CHANGES (informational — PR already closed) The implementation is functionally correct and the concurrency pattern is sound. The PR was closed because the same changes were already implemented on `feature/render-pipeline-optimization`. Two issues should be addressed when that branch is merged to `main`: 1. **Fix `asyncio.get_event_loop()`** → `asyncio.get_running_loop()` at line 173 (Python 3.10+ deprecation). 2. **Fix `TEST_COUNT` sentinel** — use `None` as the "unlimited" default instead of `9999`. --- *Automated review by Claude PR Reviewer*

Pull request closed

Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cal/paper-dynasty-card-creation#26
No description provided.