feat: Phase 0 baseline benchmark script and log (WP-00) (#87) #95
No reviewers
Labels
No Label
ai-changes-requested
ai-failed
ai-merged
ai-pr-opened
ai-reviewed
ai-reviewing
ai-reviewing
ai-working
bug
enhancement
evolution
performance
phase-0
phase-1a
phase-1b
phase-1c
phase-1d
security
tech-debt
todo
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cal/paper-dynasty-database#95
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "ai/paper-dynasty-database#87"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #87
Summary
benchmarks/benchmark_renders.sh— shell script that times 10 sequential card renders viacurlagainst any API environment (configurable viaAPI_BASEandCARD_TYPEenv vars). Auto-fetches player IDs from the live cardset or accepts explicit IDs as arguments.benchmarks/BASELINE.md— documents the methodology for both render-time and batch-upload benchmarks, provides result tables for pre/post optimization comparison, and tracks expected improvement targets across WP-01 through WP-04.Files Changed
benchmarks/benchmark_renders.sh(new, executable)benchmarks/BASELINE.md(new)Notes
pd-cards uploadin thecard-creationrepo) is documented inBASELINE.mdwith instructions, but requires running in that separate repo. Thedatabaserepo's work item is the render-timing script only.BASELINE.mdare marked_TBD_— they need to be populated by running the script againstpddev.manticorum.com:816before WP-01/WP-02 land.benchmarks/render_timings.txt(gitignored-by-default — add to.gitignoreif desired, or commit the results file once populated).Test Results
No automated test suite. Verified by reading back the created files.
AI Code Review
Files Reviewed
benchmarks/benchmark_renders.sh(new, executable)benchmarks/BASELINE.md(new)Findings
Correctness
BUG — curl
-woverride breaks timing collection (benchmark_renders.sh, lines 57–61)--write-outand-ware the same curl option (long/short forms). Per curl docs, when the same option is specified twice, the last value wins. SoHTTP_CODEwill contain only the timing value (e.g.," 2.345123") — the HTTP status code is discarded.Downstream consequences:
STATUS=$(echo "$HTTP_CODE" | awk '{print $1}')→ captures the timing, not the HTTP codeTIMING=$(echo "$HTTP_CODE" | awk '{print $2}')→ always emptyif [ "$STATUS" = "200" ]→ never true$COUNTstays 0, script always outputs "No successful renders — check API_BASE and player IDs"The script cannot collect any baseline timings in its current form.
Fix: combine into a single format string:
Missing
?nocache=1parameter (benchmark_renders.sh, lines 53–55)The inline comment says:
But
?nocache=1is not actually appended to$URL. If the endpoint has file-based caching (the players router generates PNG files to disk), repeated runs after the first will measure cache-hit latency rather than full render time. The "baseline" would then not represent the optimization problem being solved.Security
No issues. No credentials, no injection surfaces, all variable expansions are quoted.
Style & Conventions
No significant issues.
set -euo pipefailis good defensive practice throughout.Suggestions
Duplicate row in BASELINE.md Section 2 (Batch Upload table): Both data rows are identical (
| dev | 20 | _TBD_ | _TBD_ |). Likely a copy-paste error — presumably meant to cover two different conditions (e.g., batting vs pitching, or two separate runs for variance).render_timings.txtnot gitignored: The PR notes describe it as "gitignored-by-default" but.txtfiles are not covered by.gitignore. The output file will appear as an untracked file and could be accidentally committed. Consider addingbenchmarks/render_timings.txtto.gitignorein this PR.Verdict: REQUEST_CHANGES
The curl
-woverride bug (finding #1) means the script cannot collect any successful render timings — every run exits with "No successful renders." Since WP-00's sole purpose is capturing baseline numbers before the optimization work lands, this must be fixed before merging. The missing?nocache=1should also be addressed so the baselines accurately reflect cold-render time.Automated review by Claude PR Reviewer
- Fix curl -w override bug: consolidate --write-out/"-w" into single -w "%{http_code} %{time_total}" so STATUS and TIMING are both captured - Add ?nocache=1 to render URL so baseline measures cold render time - Fix duplicate BASELINE.md Section 2 rows (batting vs pitching) - Add benchmarks/render_timings.txt to .gitignore Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>Review Feedback Addressed
All four findings from the REQUEST_CHANGES review have been fixed in commit
2ab6e71.Changes made
1. Fixed curl
-woverride bug (critical)Consolidated
--write-out "%{http_code}"+-w " %{time_total}"into a single-w "%{http_code} %{time_total}". The two flags were aliases and the last one won, discarding the HTTP status code.STATUSandTIMINGnow parse correctly from the combined output.2. Added
?nocache=1to render URLThe URL now actually appends
?nocache=1as the comment described. Ensures baseline measures cold Playwright render time, not cache-hit latency.3. Fixed duplicate BASELINE.md Section 2 rows
Differentiated the two identical
| dev | 20 | ...rows asdev (batting)anddev (pitching), matching the structure of Section 1.4. Added
benchmarks/render_timings.txtto.gitignorePrevents the benchmark output file from appearing as untracked and being accidentally committed.
Automated fix by Claude issue-worker
AI Code Review
Files Reviewed
benchmarks/benchmark_renders.sh(added, executable)benchmarks/BASELINE.md(added).gitignore(modified)Findings
Correctness
1.
stderrredirect swallows curl errors into timing variables (benchmark_renders.sh, line 52)The
2>&1redirect causes curl error messages (e.g., "curl: (7) Failed to connect") to be captured in$HTTP_CODE. Sinceset -euo pipefailis active, if curl exits non-zero the script will abort at this line — which is arguably fine — but the redirect is misleading: curl's-sflag already silences progress output, so2>&1only serves to absorb actual error messages that would help diagnose failures. The redirect should be removed. If curl fails,set -ewill exit, and the user will see nothing. Better to let stderr through so the error is visible:Alternatively, use
|| truewith explicit error handling if you want to continue on a failed render.2.
?nocache=1likely does not bypass the nginx cache (benchmark_renders.sh, line 49)The script appends
?nocache=1to the card URL with the intent of bypassing cached PNG files. However, the nginx caching layer for this API uses?d={year}-{month}-{day}-{timestamp}as the cache-busting mechanism (per CLAUDE.md in the card-creation repo). The parameternocache=1is almost certainly not in the nginxproxy_cache_keyconfig and will be silently ignored — meaning the script may be measuring cache-hit latency (~milliseconds) rather than full Playwright render times (~2-3s). This undermines the entire purpose of the benchmark.To actually force a fresh render, use a unique timestamp per request:
This matches the established cache-busting convention and is guaranteed to produce a cache miss.
3. Incorrect
pd-cards uploadcommand in BASELINE.md (BASELINE.md, line 53)Per the card-creation repo's CLAUDE.md, the correct syntax for the upload command is:
uploadis a subcommand group requiring thes3subcommand, and--cardsettakes the cardset name string (e.g.,"2025 Live"), not a numeric ID. The command as written will fail with a CLI error. TheLIVE_CARDSET_IDfrom CLAUDE.md is24, so whoever runs this will be confused when the command rejects the integer argument.Security
No issues. The script accepts only player IDs from the API or explicit CLI arguments. No user-controlled input reaches a shell eval path. API credentials are not handled.
Style & Conventions
.gitignoreentry is clean.OUTFILEpath usingdirname "$0"is correct — it writes relative to the script file, not the caller's CWD.PLAYER_IDS=($(echo "$RAW" | jq ...))word-split pattern is acceptable for numeric IDs; no real risk here.jqselector chain.players[].id // .[]?.id // .[]is pragmatic but fragile — if the API response shape changes, it fails silently (caught only by the length check). A comment explaining the expected response shape would help.Suggestions
?nocache=1caveat, so future readers know why the URL format changed.BASELINE.mdprogress table is clean and well-structured. The TBD entries are intentional and clearly called out. No changes needed there.API_BASEexample — right now the default islocalhost:8000which is the container-internal port.Verdict: REQUEST_CHANGES
Two issues need fixing before merge:
2>&1on the curl timing line (line 52 of the script) — remove the redirect so errors surface to the user instead of being swallowed into the HTTP code variable.?nocache=1does not bypass the nginx cache (line 49) — replace with the established?d=...timestamp pattern or the benchmark will silently measure cache hits instead of full renders, producing misleading baseline numbers.The
pd-cards uploadcommand in BASELINE.md also needs correcting (adds3subcommand and use cardset name instead of ID), though that is a documentation-only fix.Automated review by Claude PR Reviewer
@ -0,0 +50,4 @@**What is measured:** Wall-clock time to render and upload N card images to S3using the `pd-cards upload` CLI (in the `card-creation` repo).### MethodThis command is incorrect. The
pd-cards uploadsubcommand requires thes3sub-subcommand, and--cardsettakes a name string, not a numeric ID. Correct form:Running the command as written will produce a CLI error.
@ -0,0 +46,4 @@# --- Run renders ---TOTAL=0COUNT=0?nocache=1is almost certainly not in the nginxproxy_cache_keyand will be silently ignored, meaning this measures cache-hit latency (~ms) not full Playwright render time (~2-3s). Use the established cache-busting format instead:This matches the
?d={year}-{month}-{day}-{timestamp}convention from the card-creation codebase and guarantees a cache miss per request.@ -0,0 +49,4 @@for player_id in "${PLAYER_IDS[@]}"; do# Bypass cached PNG files; remove ?nocache=1 after baseline is captured to test cache-hit performance.URL="$API_BASE/api/v2/players/$player_id/${CARD_TYPE}card?nocache=1"The
2>&1redirect here causes curl error messages to be captured in$HTTP_CODErather than appearing on stderr. Sincecurl -salready suppresses progress output, the only thing2>&1captures is actual error text (e.g. connection refused). Withset -eactive, a curl failure will abort the script — but silently, because the error was swallowed. Remove2>&1so errors surface to the terminal.Approved per PO triage.