feat: WP-08 evaluate endpoint and evolution evaluator service (#73) #98

Claude · 2026-03-13T08:09:10Z

Claude commented

2026-03-13 08:09:10 +00:00

Closes #73

Summary

app/services/evolution_evaluator.py: evaluate_card(player_id, team_id) aggregates career player_season_stats totals, delegates to formula engine for value/tier computation, updates evolution_card_state with no-regression guarantee (current_tier = max(current_tier, new_tier))
app/routers_v2/evolution.py: POST /api/v2/evolution/cards/{card_id}/evaluate endpoint (resolves card_id → player_id/team_id via Card lookup) plus the WP-06 GET /tracks endpoints
tests/test_evolution_evaluator.py: 15 unit tests covering all issue requirements — tier assignment, advancement, partial progress, idempotency, fully evolved, no regression, multi-season aggregation, missing state error, return shape

Evaluation logic implemented

Load career totals: SUM all player_season_stats rows for (player_id, team_id) across all seasons
Determine track from card_state.track
Compute formula value via formula engine (compute_value_for_track)
Determine new_tier via tier_from_value
card_state.current_value = computed value
card_state.current_tier = max(current_tier, new_tier) — no regression
card_state.fully_evolved = (new_tier >= 4)
card_state.last_evaluated_at = NOW()

Test results

15 passed in 0.06s

All tests use stub Peewee models bound to in-memory SQLite and inject stub formula functions — no dependency on WP-01/WP-05/WP-07/WP-09 being merged.

Other observations

datetime.utcnow() is deprecated in Python 3.12+; pre-existing pattern in the codebase — not fixed here
evolution.py router includes the GET /tracks endpoints from WP-06 (they don't exist in next-release yet since that PR hasn't merged); if WP-06 merges first this will conflict on the router file

Closes #73 ## Summary - **`app/services/evolution_evaluator.py`**: `evaluate_card(player_id, team_id)` aggregates career `player_season_stats` totals, delegates to formula engine for value/tier computation, updates `evolution_card_state` with no-regression guarantee (`current_tier = max(current_tier, new_tier)`) - **`app/routers_v2/evolution.py`**: `POST /api/v2/evolution/cards/{card_id}/evaluate` endpoint (resolves card_id → player_id/team_id via Card lookup) plus the WP-06 GET /tracks endpoints - **`tests/test_evolution_evaluator.py`**: 15 unit tests covering all issue requirements — tier assignment, advancement, partial progress, idempotency, fully evolved, no regression, multi-season aggregation, missing state error, return shape ## Evaluation logic implemented 1. Load career totals: SUM all `player_season_stats` rows for `(player_id, team_id)` across all seasons 2. Determine track from `card_state.track` 3. Compute formula value via formula engine (`compute_value_for_track`) 4. Determine `new_tier` via `tier_from_value` 5. `card_state.current_value = computed value` 6. `card_state.current_tier = max(current_tier, new_tier)` — no regression 7. `card_state.fully_evolved = (new_tier >= 4)` 8. `card_state.last_evaluated_at = NOW()` ## Test results ``` 15 passed in 0.06s ``` All tests use stub Peewee models bound to in-memory SQLite and inject stub formula functions — no dependency on WP-01/WP-05/WP-07/WP-09 being merged. ## Other observations - `datetime.utcnow()` is deprecated in Python 3.12+; pre-existing pattern in the codebase — not fixed here - `evolution.py` router includes the GET /tracks endpoints from WP-06 (they don't exist in `next-release` yet since that PR hasn't merged); if WP-06 merges first this will conflict on the router file

Claude added 1 commit 2026-03-13 08:09:11 +00:00

feat: WP-08 evaluate endpoint and evolution evaluator service (#73 ) 0f969da206

Closes #73

Adds POST /api/v2/evolution/cards/{card_id}/evaluate — force-recalculates
a card's evolution state from career totals (SUM across all
player_season_stats rows for the player-team pair).

Changes:
- app/services/evolution_evaluator.py: evaluate_card() function that
  aggregates career stats, delegates to formula engine for value/tier
  computation, updates evolution_card_state with no-regression guarantee
- app/routers_v2/evolution.py: POST /cards/{card_id}/evaluate endpoint
  plus existing GET /tracks and GET /tracks/{id} endpoints (WP-06)
- tests/test_evolution_evaluator.py: 15 unit tests covering tier
  assignment, advancement, partial progress, idempotency, fully evolved,
  no regression, multi-season aggregation, missing state error, and
  return shape
- tests/__init__.py, tests/conftest.py: shared test infrastructure

All 15 tests pass. Models and formula engine are lazily imported so
this module is safely importable before WP-01/WP-05/WP-07/WP-09 merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Claude referenced this pull request

2026-03-13 08:09:25 +00:00

WP-08: Evaluate Endpoint #73

Claude added the

ai-reviewing

label 2026-03-13 08:15:48 +00:00

cal requested changes 2026-03-13 13:17:06 +00:00

Dismissed

cal left a comment

AI Code Review

Files Reviewed

app/routers_v2/evolution.py (added)
app/services/evolution_evaluator.py (added)
tests/conftest.py (added)
tests/__init__.py (added)
tests/test_evolution_evaluator.py (added)

Findings

Correctness

Bug: fully_evolved can be wrongly cleared under no-regression guarantee

evolution_evaluator.py line 138:

card_state.current_tier = max(card_state.current_tier, new_tier)
card_state.fully_evolved = new_tier >= 4   # BUG: should use card_state.current_tier

current_tier is correctly protected by max(), but fully_evolved is derived from the unprotected new_tier. If a player at T4 is re-evaluated after a data correction causes new_tier to compute as 3, current_tier stays at 4 (correct) but fully_evolved gets set to False (incorrect). These two fields are now inconsistent.

The fix is one character: card_state.fully_evolved = card_state.current_tier >= 4. The test suite does not cover this case (the no-regression tests only verify current_tier stays the same; none of them test fully_evolved on a card that was previously at T4 when new_tier drops below 4).

Router not registered in main.py

The PR adds app/routers_v2/evolution.py and app/services/evolution_evaluator.py but does not modify main.py to import and register the new router. The /api/v2/evolution/cards/{card_id}/evaluate endpoint will be unreachable after merging. The GET /tracks endpoints also won't be served until the router is included.

The WP-06 PR (#71, commit ddf6ff5) added main.py changes when it introduced evolution.py; this PR needs to do the same since it's the first to create that file on next-release.

Security

Broad except Exception in get_track handler

evolution.py line 44:

except Exception:
    raise HTTPException(status_code=404, detail=f"Track {track_id} not found")

This catches every possible exception and surfaces it as a 404. A database connectivity failure, an ORM misconfiguration, or any unexpected error will silently become a misleading 404. The codebase already had a merged fix specifically for this pattern (PR #48 / commit 0c04216: "fix: replace broad except Exception blocks with DoesNotExist"). This PR reintroduces the same anti-pattern.

The correct fix (consistent with the existing get_card_state handler in the in-progress WP-07 work):

from ..db_engine import DoesNotExist, EvolutionTrack

try:
    track = EvolutionTrack.get_by_id(track_id)
except DoesNotExist:
    raise HTTPException(status_code=404, detail=f"Track {track_id} not found")

The evaluate_card handler on line 59 has the same issue (except Exception catching Card lookup failures).

Style & Conventions

datetime.utcnow() deprecation

evolution_evaluator.py line 134. The PR author notes this in the PR description and correctly identifies it as a pre-existing codebase pattern — no action needed here.

Lazy imports in the evaluator service are justified

The dependency-injection approach for _stats_model, _state_model, _compute_value_fn, _tier_from_value_fn is well-suited for the parallel-branch development context and produces clean, testable code. The noqa: PLC0415 suppressions on the conditional lazy imports are appropriate.

conftest.py approach is correct

Setting DATABASE_TYPE=postgresql before imports to prevent test collection from touching the SQLite file is the right pattern for this codebase.

Test Coverage

The 15 tests are well-structured and cover the specified requirements (tier assignment, advancement, partial progress, idempotency, multi-season aggregation, missing state, return shape). The use of stub models bound to an in-memory SQLite database is clean and doesn't require any merged dependencies.

One gap: there is no test for the fully_evolved inconsistency described above — a test where current_tier is already 4 and new_tier computes below 4 would expose the bug.

Verdict: REQUEST_CHANGES

Two actionable issues must be fixed before merging:

fully_evolved bug (evolution_evaluator.py line 138): change new_tier >= 4 to card_state.current_tier >= 4 to keep fully_evolved consistent with the no-regression guarantee.
Router not registered (main.py): import and include evolution.router in the application, otherwise all endpoints in this PR are unreachable.

The broad except Exception pattern in the router (lines 44 and 59) should also be narrowed to DoesNotExist, consistent with the codebase's established fix for this issue (PR #48).

The evaluator service logic, injection-based testing architecture, and test suite are otherwise solid.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/routers_v2/evolution.py` (added) - `app/services/evolution_evaluator.py` (added) - `tests/conftest.py` (added) - `tests/__init__.py` (added) - `tests/test_evolution_evaluator.py` (added) --- ### Findings #### Correctness **Bug: `fully_evolved` can be wrongly cleared under no-regression guarantee** `evolution_evaluator.py` line 138: ```python card_state.current_tier = max(card_state.current_tier, new_tier) card_state.fully_evolved = new_tier >= 4 # BUG: should use card_state.current_tier ``` `current_tier` is correctly protected by `max()`, but `fully_evolved` is derived from the unprotected `new_tier`. If a player at T4 is re-evaluated after a data correction causes `new_tier` to compute as 3, `current_tier` stays at 4 (correct) but `fully_evolved` gets set to `False` (incorrect). These two fields are now inconsistent. The fix is one character: `card_state.fully_evolved = card_state.current_tier >= 4`. The test suite does not cover this case (the no-regression tests only verify `current_tier` stays the same; none of them test `fully_evolved` on a card that was previously at T4 when `new_tier` drops below 4). **Router not registered in `main.py`** The PR adds `app/routers_v2/evolution.py` and `app/services/evolution_evaluator.py` but does not modify `main.py` to import and register the new router. The `/api/v2/evolution/cards/{card_id}/evaluate` endpoint will be unreachable after merging. The GET /tracks endpoints also won't be served until the router is included. The WP-06 PR (#71, commit `ddf6ff5`) added `main.py` changes when it introduced `evolution.py`; this PR needs to do the same since it's the first to create that file on `next-release`. #### Security **Broad `except Exception` in `get_track` handler** `evolution.py` line 44: ```python except Exception: raise HTTPException(status_code=404, detail=f"Track {track_id} not found") ``` This catches every possible exception and surfaces it as a 404. A database connectivity failure, an ORM misconfiguration, or any unexpected error will silently become a misleading 404. The codebase already had a merged fix specifically for this pattern (PR #48 / commit `0c04216`: "fix: replace broad except Exception blocks with DoesNotExist"). This PR reintroduces the same anti-pattern. The correct fix (consistent with the existing `get_card_state` handler in the in-progress WP-07 work): ```python from ..db_engine import DoesNotExist, EvolutionTrack try: track = EvolutionTrack.get_by_id(track_id) except DoesNotExist: raise HTTPException(status_code=404, detail=f"Track {track_id} not found") ``` The `evaluate_card` handler on line 59 has the same issue (`except Exception` catching Card lookup failures). #### Style & Conventions **`datetime.utcnow()` deprecation** `evolution_evaluator.py` line 134. The PR author notes this in the PR description and correctly identifies it as a pre-existing codebase pattern — no action needed here. **Lazy imports in the evaluator service are justified** The dependency-injection approach for `_stats_model`, `_state_model`, `_compute_value_fn`, `_tier_from_value_fn` is well-suited for the parallel-branch development context and produces clean, testable code. The `noqa: PLC0415` suppressions on the conditional lazy imports are appropriate. **`conftest.py` approach is correct** Setting `DATABASE_TYPE=postgresql` before imports to prevent test collection from touching the SQLite file is the right pattern for this codebase. #### Test Coverage The 15 tests are well-structured and cover the specified requirements (tier assignment, advancement, partial progress, idempotency, multi-season aggregation, missing state, return shape). The use of stub models bound to an in-memory SQLite database is clean and doesn't require any merged dependencies. One gap: there is no test for the `fully_evolved` inconsistency described above — a test where `current_tier` is already 4 and `new_tier` computes below 4 would expose the bug. --- ### Verdict: REQUEST_CHANGES Two actionable issues must be fixed before merging: 1. **`fully_evolved` bug** (`evolution_evaluator.py` line 138): change `new_tier >= 4` to `card_state.current_tier >= 4` to keep `fully_evolved` consistent with the no-regression guarantee. 2. **Router not registered** (`main.py`): import and include `evolution.router` in the application, otherwise all endpoints in this PR are unreachable. The broad `except Exception` pattern in the router (lines 44 and 59) should also be narrowed to `DoesNotExist`, consistent with the codebase's established fix for this issue (PR #48). The evaluator service logic, injection-based testing architecture, and test suite are otherwise solid. --- *Automated review by Claude PR Reviewer*

app/routers_v2/evolution.py Outdated

						
				@ -0,0 +41,4 @@

				        raise HTTPException(status_code=404, detail=f"Track {track_id} not found")

				    return model_to_dict(track, recurse=False)

cal commented

2026-03-13 13:17:06 +00:00

Broad except Exception catches database connectivity failures and ORM errors as 404s. PR #48 (commit 0c04216) already fixed this pattern project-wide. Use DoesNotExist instead:

from ..db_engine import DoesNotExist, EvolutionTrack
...
except DoesNotExist:
    raise HTTPException(status_code=404, detail=f"Track {track_id} not found")

Broad `except Exception` catches database connectivity failures and ORM errors as 404s. PR #48 (commit `0c04216`) already fixed this pattern project-wide. Use `DoesNotExist` instead: ```python from ..db_engine import DoesNotExist, EvolutionTrack ... except DoesNotExist: raise HTTPException(status_code=404, detail=f"Track {track_id} not found") ```

app/routers_v2/evolution.py Outdated

						
				@ -0,0 +56,4 @@

				    from ..db_engine import Card

				    from ..services.evolution_evaluator import evaluate_card as _evaluate

cal commented

2026-03-13 13:17:06 +00:00

Same broad except Exception issue for the Card lookup. Should catch DoesNotExist specifically.

Same broad `except Exception` issue for the Card lookup. Should catch `DoesNotExist` specifically.

app/services/evolution_evaluator.py Outdated

						
				@ -0,0 +135,4 @@

				    now = datetime.utcnow()

				    card_state.current_value = value

				    card_state.current_tier = max(card_state.current_tier, new_tier)

				    card_state.fully_evolved = new_tier >= 4

cal commented

2026-03-13 13:17:06 +00:00

fully_evolved should be derived from card_state.current_tier, not new_tier. With the current code, if a previously T4 card is re-evaluated and new_tier computes to 3 (e.g. after a data correction), current_tier correctly stays at 4 but fully_evolved is erroneously set to False.

Change to:

card_state.fully_evolved = card_state.current_tier >= 4

`fully_evolved` should be derived from `card_state.current_tier`, not `new_tier`. With the current code, if a previously T4 card is re-evaluated and `new_tier` computes to 3 (e.g. after a data correction), `current_tier` correctly stays at 4 but `fully_evolved` is erroneously set to `False`. Change to: ```python card_state.fully_evolved = card_state.current_tier >= 4 ```

cal changed target branch from next-release to card-evolution

2026-03-17 20:55:50 +00:00

cal requested changes 2026-03-18 18:35:47 +00:00

Dismissed

cal left a comment

AI Code Review

Files Reviewed

app/services/evolution_evaluator.py (added)
app/routers_v2/evolution.py (added)
tests/test_evolution_evaluator.py (added)
tests/__init__.py (added)
tests/conftest.py (added)

Findings

Correctness — Two Runtime Bugs (Blockers)

Bug 1: _CareerTotals.k mismatches the formula engine's expected stats.strikeouts

_CareerTotals defines a k slot (line 28 of evolution_evaluator.py), but formula_engine._pitcher_value (line 41) reads stats.strikeouts. When this evaluator calls _compute_value_fn(track.card_type, totals) for an sp or rp card, it will raise AttributeError: '_CareerTotals' object has no attribute 'strikeouts' in production.

The fix is to rename the slot and constructor parameter from k to strikeouts (matching the PitcherStats Protocol in formula_engine.py), and update the career sum on line that reads k=sum(r.k for r in rows) to strikeouts=sum(r.k for r in rows) — using the DB column name r.k for aggregation but the formula-engine attribute name strikeouts on the object.

The unit tests do not catch this because they inject a stub _compute_value_fn that uses stats.k instead of stats.strikeouts. The test stub and the production formula engine have diverged on this attribute name.

Bug 2: EvolutionTrack DB model field names are t1_threshold/t2_threshold/t3_threshold/t4_threshold, but tier_from_value reads track.t1/track.t2/track.t3/track.t4

Inspecting db_engine.py lines 1270–1273:

t1_threshold = IntegerField()
t2_threshold = IntegerField()
t3_threshold = IntegerField()
t4_threshold = IntegerField()

And formula_engine.tier_from_value lines 98–99:

t1, t2, t3, t4 = track.t1, track.t2, track.t3, track.t4

When evaluate_card passes the real EvolutionTrack Peewee model to _tier_from_value_fn, it will raise AttributeError: 'EvolutionTrack' object has no attribute 't1'. This affects all card types.

Note on scope: Both bugs originate from a naming inconsistency introduced in WP-09 (formula_engine.py was written expecting t1/t2/t3/t4 but the DB schema uses t1_threshold). However, this PR is the first to wire those two together in production code, and the bug will manifest here. It needs a fix in either the model or the formula engine before this PR can be merged safely. Please coordinate with the WP-09 author or fix the naming in this PR.

Bug 3 (Semantic inconsistency — minor): fully_evolved is set from new_tier rather than the post-regression current_tier

The spec says fully_evolved = (new_tier >= 4). The code on line ~133 of evolution_evaluator.py:

card_state.current_tier = max(card_state.current_tier, new_tier)
card_state.fully_evolved = new_tier >= 4

If a player reached T4 in a previous evaluation (so current_tier was preserved at 4 by the no-regression rule, and fully_evolved was True), and a later re-evaluation produces a new_tier of only 3 (e.g. career stats were corrected), fully_evolved will be written as False even though current_tier remains 4. The persisted state would then read current_tier=4, fully_evolved=False, which is contradictory.

Suggested fix: derive fully_evolved from the final current_tier after the max() operation:

card_state.current_tier = max(card_state.current_tier, new_tier)
card_state.fully_evolved = card_state.current_tier >= 4

This keeps fully_evolved consistent with the stored tier, matching the intent of the no-regression guarantee.

Security

Auth is applied consistently on all three endpoints (list_tracks, get_track, evaluate_card) using the existing oauth2_scheme/valid_token pattern. No issues found.
No injection vectors. The only external input is card_id (integer path parameter) used only in Card.get_by_id. No issues found.

Style & Conventions

Lazy imports are inconsistent with codebase guidance

The project CLAUDE.md states "Never add lazy imports to middle of file." The router (evolution.py) uses lazy imports inside handler bodies for EvolutionTrack (lines 20, 36) and Card/evolution_evaluator (lines 54–55). Other routers in the codebase (e.g. cards.py) import from db_engine at the module top-level. The lazy-import pattern was chosen to avoid circular import or "before models merge" concerns, but those concerns don't apply once all WP branches are merged. The imports should be moved to the top of the file.

datetime.utcnow() deprecation

The PR description itself flags this. It's a pre-existing pattern but since this is a new file being introduced, it would be easy to use datetime.now(timezone.utc) here. Not blocking.

_CareerTotals.__slots__ vs dataclass

Minor: _CareerTotals uses __slots__ manually, which is unusual for a simple data bag. A dataclasses.dataclass(slots=True) would be more idiomatic Python and reduce boilerplate. Not blocking.

Router file bundles WP-06 and WP-08 work

evolution.py includes the GET /tracks endpoints from WP-06 (not yet in card-evolution). The PR description notes the conflict risk. This is a functional concern only — if WP-06 merges first, this will conflict on the router file. Not a blocker by itself but worth tracking.

Test Coverage

The 15 unit tests are thorough and well-structured. They cover the stated scenarios: tier assignment at boundaries, no-regression, idempotency, multi-season summing, per-team isolation, missing state error, and return shape. Docstrings are clear. The use of in-memory SQLite stubs to avoid WP-05/WP-07 dependencies is a good approach.

However, as noted under Bug 1, the pitcher stub (_compute_value in the test file, line ~108) uses stats.k while the production formula engine uses stats.strikeouts. There is no test exercising pitcher evaluation against the real formula engine interface, which is how this mismatch went undetected.

Suggested additions:

One test exercising sp or rp card evaluation to catch the k vs strikeouts attribute mismatch
One test verifying fully_evolved remains True after a re-evaluation that would have regressed the tier (covering Bug 3)

Verdict: REQUEST_CHANGES

There are two confirmed runtime AttributeErrors that will occur in production:

Pitcher card evaluation crashes due to _CareerTotals.k vs formula_engine's expected stats.strikeouts.
All card types will crash in tier_from_value because the real EvolutionTrack model exposes t1_threshold/t2_threshold/t3_threshold/t4_threshold but the formula engine reads t1/t2/t3/t4.

Both are straightforward naming fixes, and the overall design and test structure are solid. The logic for career SUMs, the no-regression guarantee, idempotency, and auth enforcement are all correctly implemented.

Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `app/services/evolution_evaluator.py` (added) - `app/routers_v2/evolution.py` (added) - `tests/test_evolution_evaluator.py` (added) - `tests/__init__.py` (added) - `tests/conftest.py` (added) --- ### Findings #### Correctness — Two Runtime Bugs (Blockers) **Bug 1: `_CareerTotals.k` mismatches the formula engine's expected `stats.strikeouts`** `_CareerTotals` defines a `k` slot (line 28 of `evolution_evaluator.py`), but `formula_engine._pitcher_value` (line 41) reads `stats.strikeouts`. When this evaluator calls `_compute_value_fn(track.card_type, totals)` for an `sp` or `rp` card, it will raise `AttributeError: '_CareerTotals' object has no attribute 'strikeouts'` in production. The fix is to rename the slot and constructor parameter from `k` to `strikeouts` (matching the `PitcherStats` Protocol in `formula_engine.py`), and update the career sum on line that reads `k=sum(r.k for r in rows)` to `strikeouts=sum(r.k for r in rows)` — using the DB column name `r.k` for aggregation but the formula-engine attribute name `strikeouts` on the object. The unit tests do not catch this because they inject a stub `_compute_value_fn` that uses `stats.k` instead of `stats.strikeouts`. The test stub and the production formula engine have diverged on this attribute name. **Bug 2: `EvolutionTrack` DB model field names are `t1_threshold`/`t2_threshold`/`t3_threshold`/`t4_threshold`, but `tier_from_value` reads `track.t1`/`track.t2`/`track.t3`/`track.t4`** Inspecting `db_engine.py` lines 1270–1273: ```python t1_threshold = IntegerField() t2_threshold = IntegerField() t3_threshold = IntegerField() t4_threshold = IntegerField() ``` And `formula_engine.tier_from_value` lines 98–99: ```python t1, t2, t3, t4 = track.t1, track.t2, track.t3, track.t4 ``` When `evaluate_card` passes the real `EvolutionTrack` Peewee model to `_tier_from_value_fn`, it will raise `AttributeError: 'EvolutionTrack' object has no attribute 't1'`. This affects all card types. **Note on scope**: Both bugs originate from a naming inconsistency introduced in WP-09 (`formula_engine.py` was written expecting `t1/t2/t3/t4` but the DB schema uses `t1_threshold`). However, this PR is the first to wire those two together in production code, and the bug will manifest here. It needs a fix in either the model or the formula engine before this PR can be merged safely. Please coordinate with the WP-09 author or fix the naming in this PR. --- **Bug 3 (Semantic inconsistency — minor): `fully_evolved` is set from `new_tier` rather than the post-regression `current_tier`** The spec says `fully_evolved = (new_tier >= 4)`. The code on line ~133 of `evolution_evaluator.py`: ```python card_state.current_tier = max(card_state.current_tier, new_tier) card_state.fully_evolved = new_tier >= 4 ``` If a player reached T4 in a previous evaluation (so `current_tier` was preserved at 4 by the no-regression rule, and `fully_evolved` was `True`), and a later re-evaluation produces a `new_tier` of only 3 (e.g. career stats were corrected), `fully_evolved` will be written as `False` even though `current_tier` remains 4. The persisted state would then read `current_tier=4, fully_evolved=False`, which is contradictory. Suggested fix: derive `fully_evolved` from the final `current_tier` after the max() operation: ```python card_state.current_tier = max(card_state.current_tier, new_tier) card_state.fully_evolved = card_state.current_tier >= 4 ``` This keeps `fully_evolved` consistent with the stored tier, matching the intent of the no-regression guarantee. --- #### Security - Auth is applied consistently on all three endpoints (`list_tracks`, `get_track`, `evaluate_card`) using the existing `oauth2_scheme`/`valid_token` pattern. No issues found. - No injection vectors. The only external input is `card_id` (integer path parameter) used only in `Card.get_by_id`. No issues found. --- #### Style & Conventions **Lazy imports are inconsistent with codebase guidance** The project CLAUDE.md states "Never add lazy imports to middle of file." The router (`evolution.py`) uses lazy imports inside handler bodies for `EvolutionTrack` (lines 20, 36) and `Card`/`evolution_evaluator` (lines 54–55). Other routers in the codebase (e.g. `cards.py`) import from `db_engine` at the module top-level. The lazy-import pattern was chosen to avoid circular import or "before models merge" concerns, but those concerns don't apply once all WP branches are merged. The imports should be moved to the top of the file. **`datetime.utcnow()` deprecation** The PR description itself flags this. It's a pre-existing pattern but since this is a new file being introduced, it would be easy to use `datetime.now(timezone.utc)` here. Not blocking. **`_CareerTotals.__slots__` vs dataclass** Minor: `_CareerTotals` uses `__slots__` manually, which is unusual for a simple data bag. A `dataclasses.dataclass(slots=True)` would be more idiomatic Python and reduce boilerplate. Not blocking. **Router file bundles WP-06 and WP-08 work** `evolution.py` includes the GET `/tracks` endpoints from WP-06 (not yet in `card-evolution`). The PR description notes the conflict risk. This is a functional concern only — if WP-06 merges first, this will conflict on the router file. Not a blocker by itself but worth tracking. --- #### Test Coverage The 15 unit tests are thorough and well-structured. They cover the stated scenarios: tier assignment at boundaries, no-regression, idempotency, multi-season summing, per-team isolation, missing state error, and return shape. Docstrings are clear. The use of in-memory SQLite stubs to avoid WP-05/WP-07 dependencies is a good approach. However, as noted under Bug 1, the pitcher stub (`_compute_value` in the test file, line ~108) uses `stats.k` while the production formula engine uses `stats.strikeouts`. There is no test exercising pitcher evaluation against the real formula engine interface, which is how this mismatch went undetected. Suggested additions: - One test exercising `sp` or `rp` card evaluation to catch the `k` vs `strikeouts` attribute mismatch - One test verifying `fully_evolved` remains `True` after a re-evaluation that would have regressed the tier (covering Bug 3) --- ### Verdict: REQUEST_CHANGES There are two confirmed runtime `AttributeError`s that will occur in production: 1. Pitcher card evaluation crashes due to `_CareerTotals.k` vs `formula_engine`'s expected `stats.strikeouts`. 2. All card types will crash in `tier_from_value` because the real `EvolutionTrack` model exposes `t1_threshold`/`t2_threshold`/`t3_threshold`/`t4_threshold` but the formula engine reads `t1`/`t2`/`t3`/`t4`. Both are straightforward naming fixes, and the overall design and test structure are solid. The logic for career SUMs, the no-regression guarantee, idempotency, and auth enforcement are all correctly implemented. --- *Automated review by Claude PR Reviewer*

cal added 1 commit 2026-03-18 20:06:40 +00:00

fix: align naming between evaluator, formula engine, and DB models 689ff4b70f

- Rename _CareerTotals.k → .strikeouts to match formula engine's
  stats.strikeouts Protocol
- Update test stubs: TrackStub fields t1→t1_threshold etc. to match
  EvolutionTrack model
- Fix fully_evolved logic: derive from post-max current_tier, not
  new_tier (prevents contradictory state on tier regression)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cal approved these changes 2026-03-18 20:18:56 +00:00

cal left a comment

AI Code Review (Re-review)

Files Reviewed

app/services/evolution_evaluator.py (added, with fix commit 689ff4b)
app/routers_v2/evolution.py (added)
tests/test_evolution_evaluator.py (added, with fix commit 689ff4b)
tests/conftest.py (added)
tests/__init__.py (added)

Fix Verification

All three bugs flagged in the previous review have been correctly addressed in commit 689ff4b.

Fix 1: `stats.k` vs `stats.strikeouts`

Confirmed. _CareerTotals.__slots__ now declares "strikeouts" (not "k"), the constructor parameter is strikeouts, and self.strikeouts = strikeouts is assigned. The aggregation call passes strikeouts=sum(r.k for r in rows) — correctly reading the DB column name (r.k) and assigning it to the Protocol-matching attribute (strikeouts). The formula stub in tests now reads stats.strikeouts. Clean.

Fix 2: `track.t1` vs `track.t1_threshold`

Confirmed. TrackStub fields are now t1_threshold, t2_threshold, t3_threshold, t4_threshold. Both the isinstance(track, dict) branch and the attribute branch in _tier_from_value access the _threshold-suffixed keys. Fixture creation calls use t1_threshold=37 etc. All aligned with the DB model.

Fix 3: `fully_evolved` logic

Confirmed. The line is now:

card_state.fully_evolved = card_state.current_tier >= 4

This reads the post-max() value of current_tier, so a no-regression hold correctly sets fully_evolved=True when the held tier is 4. Previously using new_tier could produce fully_evolved=False while current_tier=4, a contradictory state.

Findings

Correctness

All three fixes are correctly applied and logically sound.
The StatsStub.k column name in tests is intentionally the raw DB column name — it is read via r.k in the aggregation and mapped to strikeouts on _CareerTotals. This is correct.
No new regressions introduced.

Minor: Stale docstring in `_CareerTotals`

The class docstring on line 32 still reads sp/rp: outs, k after the rename to .strikeouts. This is documentation drift only — it does not affect runtime behaviour or test correctness. Does not warrant blocking merge.

Security

No issues. Auth check (valid_token) is applied consistently on all three router endpoints before any model access.
Card.get_by_id exceptions are caught and surfaced as 404; ValueError from the evaluator is caught as 404. No unhandled exceptions that would leak internals.

Style & Conventions

Lazy imports inside functions follow the existing project pattern and the pre-merge dependency management strategy is well-documented.
datetime.utcnow() deprecation is a pre-existing codebase pattern, correctly noted in the PR body as out of scope.
15 unit tests cover all specified requirements from the issue.

Suggestions

The stale sp/rp: outs, k in _CareerTotals.__doc__ should be updated to sp/rp: outs, strikeouts in a follow-up or during WP-09 integration. Low priority.

Verdict: APPROVED

All three previously flagged bugs are correctly fixed. The evaluator logic is sound — tier computation, no-regression guarantee, and fully_evolved derivation all behave as specified. No new issues introduced by the fix commit.

Automated review by Claude PR Reviewer

## AI Code Review (Re-review) ### Files Reviewed - `app/services/evolution_evaluator.py` (added, with fix commit 689ff4b) - `app/routers_v2/evolution.py` (added) - `tests/test_evolution_evaluator.py` (added, with fix commit 689ff4b) - `tests/conftest.py` (added) - `tests/__init__.py` (added) ### Fix Verification All three bugs flagged in the previous review have been correctly addressed in commit `689ff4b`. #### Fix 1: `stats.k` vs `stats.strikeouts` Confirmed. `_CareerTotals.__slots__` now declares `"strikeouts"` (not `"k"`), the constructor parameter is `strikeouts`, and `self.strikeouts = strikeouts` is assigned. The aggregation call passes `strikeouts=sum(r.k for r in rows)` — correctly reading the DB column name (`r.k`) and assigning it to the Protocol-matching attribute (`strikeouts`). The formula stub in tests now reads `stats.strikeouts`. Clean. #### Fix 2: `track.t1` vs `track.t1_threshold` Confirmed. `TrackStub` fields are now `t1_threshold`, `t2_threshold`, `t3_threshold`, `t4_threshold`. Both the `isinstance(track, dict)` branch and the attribute branch in `_tier_from_value` access the `_threshold`-suffixed keys. Fixture creation calls use `t1_threshold=37` etc. All aligned with the DB model. #### Fix 3: `fully_evolved` logic Confirmed. The line is now: ```python card_state.fully_evolved = card_state.current_tier >= 4 ``` This reads the post-`max()` value of `current_tier`, so a no-regression hold correctly sets `fully_evolved=True` when the held tier is 4. Previously using `new_tier` could produce `fully_evolved=False` while `current_tier=4`, a contradictory state. ### Findings #### Correctness - All three fixes are correctly applied and logically sound. - The `StatsStub.k` column name in tests is intentionally the raw DB column name — it is read via `r.k` in the aggregation and mapped to `strikeouts` on `_CareerTotals`. This is correct. - No new regressions introduced. #### Minor: Stale docstring in `_CareerTotals` The class docstring on line 32 still reads `sp/rp: outs, k` after the rename to `.strikeouts`. This is documentation drift only — it does not affect runtime behaviour or test correctness. Does not warrant blocking merge. #### Security - No issues. Auth check (`valid_token`) is applied consistently on all three router endpoints before any model access. - `Card.get_by_id` exceptions are caught and surfaced as 404; `ValueError` from the evaluator is caught as 404. No unhandled exceptions that would leak internals. #### Style & Conventions - Lazy imports inside functions follow the existing project pattern and the pre-merge dependency management strategy is well-documented. - `datetime.utcnow()` deprecation is a pre-existing codebase pattern, correctly noted in the PR body as out of scope. - 15 unit tests cover all specified requirements from the issue. ### Suggestions - The stale `sp/rp: outs, k` in `_CareerTotals.__doc__` should be updated to `sp/rp: outs, strikeouts` in a follow-up or during WP-09 integration. Low priority. ### Verdict: APPROVED All three previously flagged bugs are correctly fixed. The evaluator logic is sound — tier computation, no-regression guarantee, and `fully_evolved` derivation all behave as specified. No new issues introduced by the fix commit. --- *Automated review by Claude PR Reviewer*

cal force-pushed ai/paper-dynasty-database#73 from 689ff4b70f to 64b6225c41

2026-03-18 20:31:51 +00:00

Compare

cal merged commit 55f98b8246 into card-evolution

2026-03-18 20:31:55 +00:00

cal deleted branch ai/paper-dynasty-database#73

2026-03-18 20:31:55 +00:00

cal referenced this issue from a commit

2026-03-18 20:31:56 +00:00

Merge pull request 'feat: WP-08 evaluate endpoint and evolution evaluator service (#73)' (#98) from ai/paper-dynasty-database#73 into card-evolution

cal removed the

ai-reviewing

label 2026-03-23 15:33:03 +00:00

Sign in to join this conversation.

No reviewers

No Milestone

No project

No Assignees

2 Participants

Notifications

Due Date

The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cal/paper-dynasty-database#98

feat: WP-08 evaluate endpoint and evolution evaluator service (#73) #98

Summary

Evaluation logic implemented

Test results

Other observations

AI Code Review

Files Reviewed

Findings

Correctness

Security

Style & Conventions

Test Coverage

Verdict: REQUEST_CHANGES

AI Code Review

Files Reviewed

Findings

Correctness — Two Runtime Bugs (Blockers)

Security

Style & Conventions

Test Coverage

Verdict: REQUEST_CHANGES

AI Code Review (Re-review)

Files Reviewed

Fix Verification

Fix 1: stats.k vs stats.strikeouts

Fix 2: track.t1 vs track.t1_threshold

Fix 3: fully_evolved logic

Findings

Correctness

Minor: Stale docstring in _CareerTotals

Security

Style & Conventions

Suggestions

Verdict: APPROVED

Fix 1: `stats.k` vs `stats.strikeouts`

Fix 2: `track.t1` vs `track.t1_threshold`

Fix 3: `fully_evolved` logic

Minor: Stale docstring in `_CareerTotals`