feat: concurrent upload pipeline (WP-04) (#91) #25
No reviewers
Labels
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cal/paper-dynasty-card-creation#25
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "ai/paper-dynasty-database#91"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes paper-dynasty-database#91
Summary
Replaces the sequential
for x in all_players:upload loop inpd_cards/core/upload.pywith a semaphore-boundedasyncio.gatherpipeline, satisfying all WP-04 acceptance criteria.Changes
pd_cards/core/upload.pyprocess_single_card(player)coroutine wrapping all per-card work (fetch + S3 upload + URL patch)asyncio.Semaphore(concurrency)limits concurrent tasks (default 8)asyncio.gather(*tasks, return_exceptions=True)replaces the sequential looploop.run_in_executor(None, upload_card_to_s3, ...)offloads synchronous boto3 calls to thread poolfetch_card_imagetimeout increased from 6s → 10slogger.infoevery 20 completionsconcurrencyparameter added toupload_cards_to_s3()pd_cards/commands/upload.py--concurrency / -jCLI option added (default: 8)upload_cards_to_s3()check_cards_and_upload.py(WP-05)scripts/benchmark_render.sh+scripts/benchmark_results.txt(WP-00)Acceptance Criteria Verified
Semaphore(concurrency)default=8)return_exceptions=True+ per-card try/except)fetch_card_imagetimeout is 10s--concurrencyflag available on CLIOther observations
refresh_card_images()still uses sequential loops — it was out of scope for WP-04 but could be a follow-upasyncio.Lockobjects (progress_lock,results_lock) guard shared mutable state; these could be replaced with atomic counters if overhead becomes a concern at high concurrencySuperseded by #27 which includes WP-00, WP-04, and WP-05 in a single commit with the same implementation. Closing as duplicate.
AI Code Review
Files Reviewed
pd_cards/core/upload.py(modified — concurrent pipeline WP-04)pd_cards/commands/upload.py(modified —--concurrency / -jCLI flag)check_cards_and_upload.py(modified — WP-05 legacy script)scripts/benchmark_render.sh(added — WP-00)scripts/benchmark_results.txt(added — WP-00 baseline)IMPORTANT: Duplicate Work Notice
This PR's head commit (
979f308) is identical to the tip of thefeature/render-pipeline-optimizationbranch. Both branches contain the same single commit on top ofmain(336014b). The PR title says "auto-generated by the issue worker" but the commit author iscalwith timestamp2026-03-12 23:53:56. This is not duplicate AI work that diverged from a manual implementation — it IS the manual implementation, pushed onto theai/paper-dynasty-database#91branch. Merging this PR would apply the same changes as mergingfeature/render-pipeline-optimization. No code conflicts to resolve, but you only need to merge one of the two — or close this PR and merge viafeature/render-pipeline-optimizationdirectly.Findings
Correctness
asyncio.get_event_loop()inside a coroutine — bothpd_cards/core/upload.py(line 211) andcheck_cards_and_upload.py(line 173) callasyncio.get_event_loop()from inside anasyncfunction. When called from a running coroutine this works correctly (it returns the running loop), but the idiomatic and forward-compatible form isasyncio.get_running_loop(), which was added in Python 3.7 precisely for this use-case.get_event_loop()is deprecated-in-context on Python 3.10+ when there is no current loop (though that is not the situation here). This is a minor style/future-proofing issue, not a runtime bug.functools.partialasymmetry —check_cards_and_upload.pyimportsfunctoolsand usesfunctools.partialto pass arguments torun_in_executorbecauseupload_card_to_s3in that file uses module-level globals for the client and bucket.pd_cards/core/upload.pypasses positional arguments directly torun_in_executor(nofunctools.partialneeded sinceupload_card_to_s3there accepts all arguments explicitly). This is correct and intentional given the design difference between the two files.sessioncaptured via closure —process_single_cardis defined as an inner function inside theasync with aiohttp.ClientSession() as session:block in both files, and tasks are spawned within that same block. Thesessionis properly in scope for all concurrent tasks throughoutasyncio.gather. No use-after-close risk.asyncio.gather(..., return_exceptions=True)with inner try/except — return values fromprocess_single_cardareNone(or exceptions swallowed byreturn_exceptions=True). Actual errors are routed into theerrorslist inside the coroutine. The outerreturn_exceptions=Trueis a safety net against any exception that slips past the inner handlers. This is correct.Progress counter in
report_progress—completedis mutated underprogress_lockand thenonlocaldeclaration is correct. No race condition.image2secondary-card success tracking incore/upload.py— when a two-way player hasimage2, thesuccesses.append(x)call is placed only in theelsebranch of theimage2block. If image2 processing raises an exception,xis added toerrorsbut notsuccesses— which is the correct behavior. The primary-only path for non-two-way players also correctly appends tosuccessesin theelse: successes.append(x)at the bottom.No correctness bugs found.
Security
No new security surface introduced. S3 credentials flow through boto3's standard credential chain (environment /
~/.aws/credentials), consistent with the existing pattern. No hardcoded secrets. The semaphore prevents accidental unbounded parallelism that could exhaust file descriptors or overwhelm the API.Style & Conventions
benchmark_results.txtcommitted to repository — the results file is a point-in-time snapshot from a single dev-API run (2026-03-12). Committing benchmark output alongside the benchmark script is reasonable for a Phase 0 baseline, but consider whether this file should be.gitignored for future runs so stale results do not accumulate. Currently not a blocker.check_cards_and_upload.py:print()vslogger— the legacy script mixesprint()for progress andloggerfor structured output. The new code adds moreprint()calls for the summary header. This is consistent with the existing style of that file (it predates the logger module) and is not a regression.Two-lock design (
progress_lock,results_lock) — the PR description acknowledges these could be replaced with atomic counters. Atconcurrency=8the lock contention is negligible. No action needed.Suggestions
asyncio.get_event_loop()withasyncio.get_running_loop()in bothpd_cards/core/upload.py(line 211) andcheck_cards_and_upload.py(line 173).get_running_loop()raisesRuntimeErrorif called outside a running loop (a helpful safeguard) whereasget_event_loop()would silently create a new loop in that scenario.scripts/benchmark_results.txtto.gitignoreor renaming it tobenchmark_results_baseline.txtto signal it is a one-time reference rather than an accumulating log.Verdict: COMMENT
The implementation is correct and satisfies all WP-04 acceptance criteria. The concurrency model (semaphore-bounded
asyncio.gather, boto3 offloaded to thread pool, individual failure isolation) is sound. The two suggestions above are minor and do not block merging.Primary action required: determine which branch to merge. This PR (
ai/paper-dynasty-database#91) andfeature/render-pipeline-optimizationboth point to the same commit. Close one and merge the other, or simply merge this PR and delete the duplicate branch.Automated review by Claude PR Reviewer
Pull request closed