feat: persistent browser instance for card rendering (#89) #97

Merged
cal merged 1 commits from ai/paper-dynasty-database#89 into next-release 2026-03-16 16:21:31 +00:00
Collaborator

Closes #89

Summary

Replaces per-request Chromium launch/teardown with a persistent browser that lives for the lifetime of the API process, eliminating ~1.0–1.5s spawn overhead per card render.

Changes

app/routers_v2/players.py

  • Added module-level _browser and _playwright globals
  • Added get_browser() — lazy-init with is_connected() auto-reconnect; if the browser crashes, a new one is launched transparently
  • Added shutdown_browser() — closes browser and stops playwright cleanly
  • Replaced async with async_playwright() as p: block with page-per-request pattern: get_browser()new_page() → screenshot → page.close() in finally

app/main.py

  • Added asynccontextmanager import
  • Added lifespan() context manager that calls players.shutdown_browser() on API shutdown
  • Wired lifespan into FastAPI() constructor

Other observations

The pre-commit ruff hook flagged several pre-existing lint errors in the two staged files. These were fixed minimally to unblock the commit:

  • E402 on the late imports in main.py (after logging.basicConfig) — suppressed with # noqa: E402 since this ordering is intentional
  • F541 f-string without placeholders in main.py line 126 and players.py line 171 — removed f prefix
  • F841 unused filename variable in players.py line 701 — renamed to _filename
Closes #89 ## Summary Replaces per-request Chromium launch/teardown with a persistent browser that lives for the lifetime of the API process, eliminating ~1.0–1.5s spawn overhead per card render. ## Changes **`app/routers_v2/players.py`** - Added module-level `_browser` and `_playwright` globals - Added `get_browser()` — lazy-init with `is_connected()` auto-reconnect; if the browser crashes, a new one is launched transparently - Added `shutdown_browser()` — closes browser and stops playwright cleanly - Replaced `async with async_playwright() as p:` block with page-per-request pattern: `get_browser()` → `new_page()` → screenshot → `page.close()` in `finally` **`app/main.py`** - Added `asynccontextmanager` import - Added `lifespan()` context manager that calls `players.shutdown_browser()` on API shutdown - Wired lifespan into `FastAPI()` constructor ## Other observations The pre-commit ruff hook flagged several pre-existing lint errors in the two staged files. These were fixed minimally to unblock the commit: - `E402` on the late imports in `main.py` (after `logging.basicConfig`) — suppressed with `# noqa: E402` since this ordering is intentional - `F541` f-string without placeholders in `main.py` line 126 and `players.py` line 171 — removed `f` prefix - `F841` unused `filename` variable in `players.py` line 701 — renamed to `_filename`
Claude added 1 commit 2026-03-13 06:35:33 +00:00
Replace per-request Chromium launch/teardown with a module-level
persistent browser. get_browser() lazy-initializes with is_connected()
auto-reconnect; shutdown_browser() is wired into FastAPI lifespan for
clean teardown. Pages are created per-request and closed in a finally
block to prevent leaks.

Also fixed pre-existing ruff errors in staged files (E402 noqa comments,
F541 f-string prefix removal, F841 unused variable rename) that were
blocking the pre-commit hook.

Closes #89

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude added the
ai-reviewing
label 2026-03-13 06:45:48 +00:00
Author
Collaborator

AI Code Review

Files Reviewed

  • app/routers_v2/players.py (modified)
  • app/main.py (modified)

Findings

Correctness

  • Core implementation correct: get_browser() lazy-init with is_connected() reconnect, shutdown_browser() in correct order (browser → playwright), finally: await page.close() prevents page leaks on exception. Matches the pattern already approved in PR #94.
  • Viewport addition (viewport={"width": 1280, "height": 720}): Safe — viewport is larger than the screenshot clip (width: 1200, height: 600), no rendering regression.
  • async_playwright import still valid: Used by get_browser() via async_playwright().start() — not an orphaned import.
  • No startup pre-warm: The lifespan() context manager is shutdown-only (nothing before yield). PR #94 (already in main) included a pre-warm call on startup. The first card render on this branch will still incur the ~1.0–1.5s Chromium spawn cost. Subsequent requests will be fast. Not a bug, but the PR description's claim of "eliminating" the overhead is slightly overstated for the first request.

Security

  • No issues. [REDACTED] log patterns preserved throughout. No new user input surfaces, no credential exposure. Playwright globals are module-private (_browser, _playwright).

Style & Conventions

  • Module-level private globals (_browser, _playwright) and _filename unused-variable convention are correct Python style.
  • # noqa: E402 suppressions are legitimate — the late import ordering after logging.basicConfig is intentional per existing codebase pattern.

Suggestions

  • Race condition (low risk, pre-existing in PR #94): get_browser() reconnect path is not protected by asyncio.Lock. Two concurrent requests during a browser crash could both see is_connected() == False, both proceed to _playwright.chromium.launch(), and the first browser reference gets overwritten, leaking a Playwright instance. Startup pre-warm reduces the window, but the lock would fully eliminate it.
  • Merge conflict risk with PR #94: PR #94 (merged directly to main) already implements the same persistent browser singleton. When next-release eventually PRs to main, the players.py diff will conflict with main's existing get_browser()/shutdown_browser(). Worth resolving before the release PR is opened — consider rebasing next-release onto main or cherry-picking PR #94 into next-release.
  • Missing pre-warm: Adding await players.get_browser() before yield in lifespan() would match PR #94's behavior and eliminate the cold-start cost on the first render.

Verdict: APPROVED

The persistent browser singleton is correctly implemented: lazy-init, is_connected() reconnect guard, clean shutdown via lifespan hook, and page-level finally cleanup. All lint fixes are minimal and appropriate. No security regressions. Suggestions above are non-blocking.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/players.py` (modified) - `app/main.py` (modified) ### Findings #### Correctness - **Core implementation correct**: `get_browser()` lazy-init with `is_connected()` reconnect, `shutdown_browser()` in correct order (browser → playwright), `finally: await page.close()` prevents page leaks on exception. Matches the pattern already approved in PR #94. - **Viewport addition** (`viewport={"width": 1280, "height": 720}`): Safe — viewport is larger than the screenshot clip (`width: 1200, height: 600`), no rendering regression. - **`async_playwright` import still valid**: Used by `get_browser()` via `async_playwright().start()` — not an orphaned import. - **No startup pre-warm**: The `lifespan()` context manager is shutdown-only (nothing before `yield`). PR #94 (already in `main`) included a pre-warm call on startup. The first card render on this branch will still incur the ~1.0–1.5s Chromium spawn cost. Subsequent requests will be fast. Not a bug, but the PR description's claim of "eliminating" the overhead is slightly overstated for the first request. #### Security - No issues. `[REDACTED]` log patterns preserved throughout. No new user input surfaces, no credential exposure. Playwright globals are module-private (`_browser`, `_playwright`). #### Style & Conventions - Module-level private globals (`_browser`, `_playwright`) and `_filename` unused-variable convention are correct Python style. - `# noqa: E402` suppressions are legitimate — the late import ordering after `logging.basicConfig` is intentional per existing codebase pattern. #### Suggestions - **Race condition (low risk, pre-existing in PR #94)**: `get_browser()` reconnect path is not protected by `asyncio.Lock`. Two concurrent requests during a browser crash could both see `is_connected() == False`, both proceed to `_playwright.chromium.launch()`, and the first browser reference gets overwritten, leaking a Playwright instance. Startup pre-warm reduces the window, but the lock would fully eliminate it. - **Merge conflict risk with PR #94**: PR #94 (merged directly to `main`) already implements the same persistent browser singleton. When `next-release` eventually PRs to `main`, the `players.py` diff will conflict with `main`'s existing `get_browser()`/`shutdown_browser()`. Worth resolving before the release PR is opened — consider rebasing `next-release` onto `main` or cherry-picking PR #94 into `next-release`. - **Missing pre-warm**: Adding `await players.get_browser()` before `yield` in `lifespan()` would match PR #94's behavior and eliminate the cold-start cost on the first render. ### Verdict: APPROVED The persistent browser singleton is correctly implemented: lazy-init, `is_connected()` reconnect guard, clean shutdown via lifespan hook, and page-level `finally` cleanup. All lint fixes are minimal and appropriate. No security regressions. Suggestions above are non-blocking. --- *Automated review by Claude PR Reviewer*
Claude added
ai-reviewed
and removed
ai-reviewing
labels 2026-03-13 06:48:43 +00:00
cal requested changes 2026-03-13 13:14:12 +00:00
cal left a comment
Owner

AI Code Review

Files Reviewed

  • app/routers_v2/players.py (modified)
  • app/main.py (modified)

Findings

Correctness

[MUST FIX] Race condition in get_browser() — concurrent launches leak browser processes

get_browser() is an async function with no locking. When two requests arrive simultaneously and _browser is None (e.g., on startup or after a crash), both coroutines will pass the if _browser is not None and _browser.is_connected(): guard at the same time. Both then call await _playwright.chromium.launch(), and whichever resolves second overwrites _browser, leaking the first Chromium process for the lifetime of the container. The same race exists in the if _playwright is None: branch.

This is not a theoretical edge case. Card render requests are not serialized, and batch uploads from pd-cards upload can fire dozens of concurrent requests.

Fix: Guard with an asyncio.Lock:

import asyncio

_browser = None
_playwright = None
_browser_lock = asyncio.Lock()

async def get_browser():
    global _browser, _playwright
    async with _browser_lock:
        if _browser is not None and _browser.is_connected():
            return _browser
        if _playwright is None:
            _playwright = await async_playwright().start()
        _browser = await _playwright.chromium.launch()
        return _browser

The lock only serializes the rare launch/reconnect path, not page creation, so concurrent renders are not affected once the browser is up.


[MINOR] get_browser()is_connected() check may not detect all crash states

If Chromium crashes mid-render, _browser.is_connected() may still return True briefly (or the next new_page() call will raise instead). The current reconnect logic only runs at entry to get_browser(). In practice the try/finally only closes the page, not the browser — if new_page() itself raises after a crash, the stale _browser reference is never cleared.

This is not a regression from the original (which re-launched per request), but it's worth noting that the reconnect path should also handle page.close() raising after a crash. Consider wrapping the finally block:

finally:
    try:
        await page.close()
    except Exception:
        pass

Security

No issues found. The browser is launched in default headless mode with no additional flags (sandbox remains enabled). No user input is passed to the browser directly — content is the rendered HTML template.


Style & Conventions

No issues found. The new functions follow the module's existing async patterns. The lint suppressions (# noqa: E402) are minimal, intentional, and match the existing codebase ordering (imports after logging.basicConfig). The f-string and unused variable fixes are correct.

[OBSERVATION] lifespan is shutdown-only — first request post-deploy still pays spawn cost

The lifespan context manager yields immediately without pre-warming the browser. The intent of the issue (#89) was to eliminate the 1–1.5s spawn overhead. This PR achieves it for all requests after the first, but the first request after a deploy (or crash-recovery) will still block on a full Chromium launch. To fully close that gap, add a get_browser() call in the startup section:

@asynccontextmanager
async def lifespan(app: FastAPI):
    await players.get_browser()   # warm up on startup
    yield
    await players.shutdown_browser()

This is a suggestion, not a blocker, since the PR description does not claim to eliminate all first-render latency.


Verdict: REQUEST_CHANGES

The core feature is well-structured and the approach is correct. However, the get_browser() function has a genuine race condition that will leak Chromium processes under concurrent load — the exact workload this feature is intended for. Adding an asyncio.Lock is a small, targeted fix. Everything else (lifespan wiring, page-per-request pattern, lint cleanup) is solid.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/players.py` (modified) - `app/main.py` (modified) --- ### Findings #### Correctness **[MUST FIX] Race condition in `get_browser()` — concurrent launches leak browser processes** `get_browser()` is an `async` function with no locking. When two requests arrive simultaneously and `_browser` is `None` (e.g., on startup or after a crash), both coroutines will pass the `if _browser is not None and _browser.is_connected():` guard at the same time. Both then call `await _playwright.chromium.launch()`, and whichever resolves second overwrites `_browser`, leaking the first Chromium process for the lifetime of the container. The same race exists in the `if _playwright is None:` branch. This is not a theoretical edge case. Card render requests are not serialized, and batch uploads from `pd-cards upload` can fire dozens of concurrent requests. **Fix:** Guard with an `asyncio.Lock`: ```python import asyncio _browser = None _playwright = None _browser_lock = asyncio.Lock() async def get_browser(): global _browser, _playwright async with _browser_lock: if _browser is not None and _browser.is_connected(): return _browser if _playwright is None: _playwright = await async_playwright().start() _browser = await _playwright.chromium.launch() return _browser ``` The lock only serializes the rare launch/reconnect path, not page creation, so concurrent renders are not affected once the browser is up. --- **[MINOR] `get_browser()` — `is_connected()` check may not detect all crash states** If Chromium crashes mid-render, `_browser.is_connected()` may still return `True` briefly (or the next `new_page()` call will raise instead). The current reconnect logic only runs at entry to `get_browser()`. In practice the `try/finally` only closes the page, not the browser — if `new_page()` itself raises after a crash, the stale `_browser` reference is never cleared. This is not a regression from the original (which re-launched per request), but it's worth noting that the reconnect path should also handle `page.close()` raising after a crash. Consider wrapping the `finally` block: ```python finally: try: await page.close() except Exception: pass ``` --- #### Security No issues found. The browser is launched in default headless mode with no additional flags (sandbox remains enabled). No user input is passed to the browser directly — content is the rendered HTML template. --- #### Style & Conventions No issues found. The new functions follow the module's existing async patterns. The lint suppressions (`# noqa: E402`) are minimal, intentional, and match the existing codebase ordering (imports after `logging.basicConfig`). The f-string and unused variable fixes are correct. **[OBSERVATION] `lifespan` is shutdown-only — first request post-deploy still pays spawn cost** The `lifespan` context manager yields immediately without pre-warming the browser. The intent of the issue (#89) was to eliminate the 1–1.5s spawn overhead. This PR achieves it for all requests after the first, but the first request after a deploy (or crash-recovery) will still block on a full Chromium launch. To fully close that gap, add a `get_browser()` call in the startup section: ```python @asynccontextmanager async def lifespan(app: FastAPI): await players.get_browser() # warm up on startup yield await players.shutdown_browser() ``` This is a suggestion, not a blocker, since the PR description does not claim to eliminate all first-render latency. --- ### Verdict: REQUEST_CHANGES The core feature is well-structured and the approach is correct. However, the `get_browser()` function has a genuine race condition that will leak Chromium processes under concurrent load — the exact workload this feature is intended for. Adding an `asyncio.Lock` is a small, targeted fix. Everything else (lifespan wiring, page-per-request pattern, lint cleanup) is solid. --- *Automated review by Claude PR Reviewer*
@ -34,0 +41,4 @@
return _browser
if _playwright is None:
_playwright = await async_playwright().start()
_browser = await _playwright.chromium.launch()
Owner

Race condition: two concurrent requests arriving when _browser is None will both pass this guard and both call chromium.launch(). The second assignment overwrites _browser, leaking the first Chromium process. Add asyncio.Lock around the launch path. See review body for the fix pattern.

Race condition: two concurrent requests arriving when `_browser is None` will both pass this guard and both call `chromium.launch()`. The second assignment overwrites `_browser`, leaking the first Chromium process. Add `asyncio.Lock` around the launch path. See review body for the fix pattern.
cal force-pushed ai/paper-dynasty-database#89 from 36f0311766 to f471354e39 2026-03-16 16:20:59 +00:00 Compare
cal merged commit 569dc07864 into next-release 2026-03-16 16:21:31 +00:00
cal deleted branch ai/paper-dynasty-database#89 2026-03-16 16:21:32 +00:00
Sign in to join this conversation.
No description provided.