feat: WP-08 evaluate endpoint and evolution evaluator service (#73) #98
No reviewers
Labels
No Label
ai-changes-requested
ai-failed
ai-merged
ai-pr-opened
ai-reviewed
ai-reviewing
ai-reviewing
ai-working
bug
enhancement
evolution
performance
phase-0
phase-1a
phase-1b
phase-1c
phase-1d
security
tech-debt
todo
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cal/paper-dynasty-database#98
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "ai/paper-dynasty-database#73"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #73
Summary
app/services/evolution_evaluator.py:evaluate_card(player_id, team_id)aggregates careerplayer_season_statstotals, delegates to formula engine for value/tier computation, updatesevolution_card_statewith no-regression guarantee (current_tier = max(current_tier, new_tier))app/routers_v2/evolution.py:POST /api/v2/evolution/cards/{card_id}/evaluateendpoint (resolves card_id → player_id/team_id via Card lookup) plus the WP-06 GET /tracks endpointstests/test_evolution_evaluator.py: 15 unit tests covering all issue requirements — tier assignment, advancement, partial progress, idempotency, fully evolved, no regression, multi-season aggregation, missing state error, return shapeEvaluation logic implemented
player_season_statsrows for(player_id, team_id)across all seasonscard_state.trackcompute_value_for_track)new_tierviatier_from_valuecard_state.current_value = computed valuecard_state.current_tier = max(current_tier, new_tier)— no regressioncard_state.fully_evolved = (new_tier >= 4)card_state.last_evaluated_at = NOW()Test results
All tests use stub Peewee models bound to in-memory SQLite and inject stub formula functions — no dependency on WP-01/WP-05/WP-07/WP-09 being merged.
Other observations
datetime.utcnow()is deprecated in Python 3.12+; pre-existing pattern in the codebase — not fixed hereevolution.pyrouter includes the GET /tracks endpoints from WP-06 (they don't exist innext-releaseyet since that PR hasn't merged); if WP-06 merges first this will conflict on the router fileAI Code Review
Files Reviewed
app/routers_v2/evolution.py(added)app/services/evolution_evaluator.py(added)tests/conftest.py(added)tests/__init__.py(added)tests/test_evolution_evaluator.py(added)Findings
Correctness
Bug:
fully_evolvedcan be wrongly cleared under no-regression guaranteeevolution_evaluator.pyline 138:current_tieris correctly protected bymax(), butfully_evolvedis derived from the unprotectednew_tier. If a player at T4 is re-evaluated after a data correction causesnew_tierto compute as 3,current_tierstays at 4 (correct) butfully_evolvedgets set toFalse(incorrect). These two fields are now inconsistent.The fix is one character:
card_state.fully_evolved = card_state.current_tier >= 4. The test suite does not cover this case (the no-regression tests only verifycurrent_tierstays the same; none of them testfully_evolvedon a card that was previously at T4 whennew_tierdrops below 4).Router not registered in
main.pyThe PR adds
app/routers_v2/evolution.pyandapp/services/evolution_evaluator.pybut does not modifymain.pyto import and register the new router. The/api/v2/evolution/cards/{card_id}/evaluateendpoint will be unreachable after merging. The GET /tracks endpoints also won't be served until the router is included.The WP-06 PR (#71, commit
ddf6ff5) addedmain.pychanges when it introducedevolution.py; this PR needs to do the same since it's the first to create that file onnext-release.Security
Broad
except Exceptioninget_trackhandlerevolution.pyline 44:This catches every possible exception and surfaces it as a 404. A database connectivity failure, an ORM misconfiguration, or any unexpected error will silently become a misleading 404. The codebase already had a merged fix specifically for this pattern (PR #48 / commit
0c04216: "fix: replace broad except Exception blocks with DoesNotExist"). This PR reintroduces the same anti-pattern.The correct fix (consistent with the existing
get_card_statehandler in the in-progress WP-07 work):The
evaluate_cardhandler on line 59 has the same issue (except Exceptioncatching Card lookup failures).Style & Conventions
datetime.utcnow()deprecationevolution_evaluator.pyline 134. The PR author notes this in the PR description and correctly identifies it as a pre-existing codebase pattern — no action needed here.Lazy imports in the evaluator service are justified
The dependency-injection approach for
_stats_model,_state_model,_compute_value_fn,_tier_from_value_fnis well-suited for the parallel-branch development context and produces clean, testable code. Thenoqa: PLC0415suppressions on the conditional lazy imports are appropriate.conftest.pyapproach is correctSetting
DATABASE_TYPE=postgresqlbefore imports to prevent test collection from touching the SQLite file is the right pattern for this codebase.Test Coverage
The 15 tests are well-structured and cover the specified requirements (tier assignment, advancement, partial progress, idempotency, multi-season aggregation, missing state, return shape). The use of stub models bound to an in-memory SQLite database is clean and doesn't require any merged dependencies.
One gap: there is no test for the
fully_evolvedinconsistency described above — a test wherecurrent_tieris already 4 andnew_tiercomputes below 4 would expose the bug.Verdict: REQUEST_CHANGES
Two actionable issues must be fixed before merging:
fully_evolvedbug (evolution_evaluator.pyline 138): changenew_tier >= 4tocard_state.current_tier >= 4to keepfully_evolvedconsistent with the no-regression guarantee.main.py): import and includeevolution.routerin the application, otherwise all endpoints in this PR are unreachable.The broad
except Exceptionpattern in the router (lines 44 and 59) should also be narrowed toDoesNotExist, consistent with the codebase's established fix for this issue (PR #48).The evaluator service logic, injection-based testing architecture, and test suite are otherwise solid.
Automated review by Claude PR Reviewer
@ -0,0 +41,4 @@raise HTTPException(status_code=404, detail=f"Track {track_id} not found")return model_to_dict(track, recurse=False)Broad
except Exceptioncatches database connectivity failures and ORM errors as 404s. PR #48 (commit0c04216) already fixed this pattern project-wide. UseDoesNotExistinstead:@ -0,0 +56,4 @@from ..db_engine import Cardfrom ..services.evolution_evaluator import evaluate_card as _evaluateSame broad
except Exceptionissue for the Card lookup. Should catchDoesNotExistspecifically.@ -0,0 +135,4 @@now = datetime.utcnow()card_state.current_value = valuecard_state.current_tier = max(card_state.current_tier, new_tier)card_state.fully_evolved = new_tier >= 4fully_evolvedshould be derived fromcard_state.current_tier, notnew_tier. With the current code, if a previously T4 card is re-evaluated andnew_tiercomputes to 3 (e.g. after a data correction),current_tiercorrectly stays at 4 butfully_evolvedis erroneously set toFalse.Change to:
AI Code Review
Files Reviewed
app/services/evolution_evaluator.py(added)app/routers_v2/evolution.py(added)tests/test_evolution_evaluator.py(added)tests/__init__.py(added)tests/conftest.py(added)Findings
Correctness — Two Runtime Bugs (Blockers)
Bug 1:
_CareerTotals.kmismatches the formula engine's expectedstats.strikeouts_CareerTotalsdefines akslot (line 28 ofevolution_evaluator.py), butformula_engine._pitcher_value(line 41) readsstats.strikeouts. When this evaluator calls_compute_value_fn(track.card_type, totals)for ansporrpcard, it will raiseAttributeError: '_CareerTotals' object has no attribute 'strikeouts'in production.The fix is to rename the slot and constructor parameter from
ktostrikeouts(matching thePitcherStatsProtocol informula_engine.py), and update the career sum on line that readsk=sum(r.k for r in rows)tostrikeouts=sum(r.k for r in rows)— using the DB column namer.kfor aggregation but the formula-engine attribute namestrikeoutson the object.The unit tests do not catch this because they inject a stub
_compute_value_fnthat usesstats.kinstead ofstats.strikeouts. The test stub and the production formula engine have diverged on this attribute name.Bug 2:
EvolutionTrackDB model field names aret1_threshold/t2_threshold/t3_threshold/t4_threshold, buttier_from_valuereadstrack.t1/track.t2/track.t3/track.t4Inspecting
db_engine.pylines 1270–1273:And
formula_engine.tier_from_valuelines 98–99:When
evaluate_cardpasses the realEvolutionTrackPeewee model to_tier_from_value_fn, it will raiseAttributeError: 'EvolutionTrack' object has no attribute 't1'. This affects all card types.Note on scope: Both bugs originate from a naming inconsistency introduced in WP-09 (
formula_engine.pywas written expectingt1/t2/t3/t4but the DB schema usest1_threshold). However, this PR is the first to wire those two together in production code, and the bug will manifest here. It needs a fix in either the model or the formula engine before this PR can be merged safely. Please coordinate with the WP-09 author or fix the naming in this PR.Bug 3 (Semantic inconsistency — minor):
fully_evolvedis set fromnew_tierrather than the post-regressioncurrent_tierThe spec says
fully_evolved = (new_tier >= 4). The code on line ~133 ofevolution_evaluator.py:If a player reached T4 in a previous evaluation (so
current_tierwas preserved at 4 by the no-regression rule, andfully_evolvedwasTrue), and a later re-evaluation produces anew_tierof only 3 (e.g. career stats were corrected),fully_evolvedwill be written asFalseeven thoughcurrent_tierremains 4. The persisted state would then readcurrent_tier=4, fully_evolved=False, which is contradictory.Suggested fix: derive
fully_evolvedfrom the finalcurrent_tierafter the max() operation:This keeps
fully_evolvedconsistent with the stored tier, matching the intent of the no-regression guarantee.Security
list_tracks,get_track,evaluate_card) using the existingoauth2_scheme/valid_tokenpattern. No issues found.card_id(integer path parameter) used only inCard.get_by_id. No issues found.Style & Conventions
Lazy imports are inconsistent with codebase guidance
The project CLAUDE.md states "Never add lazy imports to middle of file." The router (
evolution.py) uses lazy imports inside handler bodies forEvolutionTrack(lines 20, 36) andCard/evolution_evaluator(lines 54–55). Other routers in the codebase (e.g.cards.py) import fromdb_engineat the module top-level. The lazy-import pattern was chosen to avoid circular import or "before models merge" concerns, but those concerns don't apply once all WP branches are merged. The imports should be moved to the top of the file.datetime.utcnow()deprecationThe PR description itself flags this. It's a pre-existing pattern but since this is a new file being introduced, it would be easy to use
datetime.now(timezone.utc)here. Not blocking._CareerTotals.__slots__vs dataclassMinor:
_CareerTotalsuses__slots__manually, which is unusual for a simple data bag. Adataclasses.dataclass(slots=True)would be more idiomatic Python and reduce boilerplate. Not blocking.Router file bundles WP-06 and WP-08 work
evolution.pyincludes the GET/tracksendpoints from WP-06 (not yet incard-evolution). The PR description notes the conflict risk. This is a functional concern only — if WP-06 merges first, this will conflict on the router file. Not a blocker by itself but worth tracking.Test Coverage
The 15 unit tests are thorough and well-structured. They cover the stated scenarios: tier assignment at boundaries, no-regression, idempotency, multi-season summing, per-team isolation, missing state error, and return shape. Docstrings are clear. The use of in-memory SQLite stubs to avoid WP-05/WP-07 dependencies is a good approach.
However, as noted under Bug 1, the pitcher stub (
_compute_valuein the test file, line ~108) usesstats.kwhile the production formula engine usesstats.strikeouts. There is no test exercising pitcher evaluation against the real formula engine interface, which is how this mismatch went undetected.Suggested additions:
sporrpcard evaluation to catch thekvsstrikeoutsattribute mismatchfully_evolvedremainsTrueafter a re-evaluation that would have regressed the tier (covering Bug 3)Verdict: REQUEST_CHANGES
There are two confirmed runtime
AttributeErrors that will occur in production:_CareerTotals.kvsformula_engine's expectedstats.strikeouts.tier_from_valuebecause the realEvolutionTrackmodel exposest1_threshold/t2_threshold/t3_threshold/t4_thresholdbut the formula engine readst1/t2/t3/t4.Both are straightforward naming fixes, and the overall design and test structure are solid. The logic for career SUMs, the no-regression guarantee, idempotency, and auth enforcement are all correctly implemented.
Automated review by Claude PR Reviewer
AI Code Review (Re-review)
Files Reviewed
app/services/evolution_evaluator.py(added, with fix commit689ff4b)app/routers_v2/evolution.py(added)tests/test_evolution_evaluator.py(added, with fix commit689ff4b)tests/conftest.py(added)tests/__init__.py(added)Fix Verification
All three bugs flagged in the previous review have been correctly addressed in commit
689ff4b.Fix 1:
stats.kvsstats.strikeoutsConfirmed.
_CareerTotals.__slots__now declares"strikeouts"(not"k"), the constructor parameter isstrikeouts, andself.strikeouts = strikeoutsis assigned. The aggregation call passesstrikeouts=sum(r.k for r in rows)— correctly reading the DB column name (r.k) and assigning it to the Protocol-matching attribute (strikeouts). The formula stub in tests now readsstats.strikeouts. Clean.Fix 2:
track.t1vstrack.t1_thresholdConfirmed.
TrackStubfields are nowt1_threshold,t2_threshold,t3_threshold,t4_threshold. Both theisinstance(track, dict)branch and the attribute branch in_tier_from_valueaccess the_threshold-suffixed keys. Fixture creation calls uset1_threshold=37etc. All aligned with the DB model.Fix 3:
fully_evolvedlogicConfirmed. The line is now:
This reads the post-
max()value ofcurrent_tier, so a no-regression hold correctly setsfully_evolved=Truewhen the held tier is 4. Previously usingnew_tiercould producefully_evolved=Falsewhilecurrent_tier=4, a contradictory state.Findings
Correctness
StatsStub.kcolumn name in tests is intentionally the raw DB column name — it is read viar.kin the aggregation and mapped tostrikeoutson_CareerTotals. This is correct.Minor: Stale docstring in
_CareerTotalsThe class docstring on line 32 still reads
sp/rp: outs, kafter the rename to.strikeouts. This is documentation drift only — it does not affect runtime behaviour or test correctness. Does not warrant blocking merge.Security
valid_token) is applied consistently on all three router endpoints before any model access.Card.get_by_idexceptions are caught and surfaced as 404;ValueErrorfrom the evaluator is caught as 404. No unhandled exceptions that would leak internals.Style & Conventions
datetime.utcnow()deprecation is a pre-existing codebase pattern, correctly noted in the PR body as out of scope.Suggestions
sp/rp: outs, kin_CareerTotals.__doc__should be updated tosp/rp: outs, strikeoutsin a follow-up or during WP-09 integration. Low priority.Verdict: APPROVED
All three previously flagged bugs are correctly fixed. The evaluator logic is sound — tier computation, no-regression guarantee, and
fully_evolvedderivation all behave as specified. No new issues introduced by the fix commit.Automated review by Claude PR Reviewer
689ff4b70fto64b6225c41