test: refractor system Tier 3 test coverage #171
No reviewers
Labels
No Label
ai-changes-requested
ai-failed
ai-merged
ai-pr-opened
ai-reviewed
ai-reviewing
ai-reviewing
ai-working
bug
enhancement
evolution
performance
phase-0
phase-1a
phase-1b
phase-1c
phase-1d
security
tech-debt
todo
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: cal/paper-dynasty-database#171
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "test/refractor-tier3"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Test plan
22 passed, 15 skipped— skips are existing PostgreSQL-only tests)Notable findings
fully_evolvedfrom tier — a corrupted flag is silently repaired on next evaluation🤖 Generated with Claude Code
Adds four Tier 3 (medium-priority) test cases to the existing refractor test suite. All tests use SQLite in-memory databases and run without a PostgreSQL connection. T3-1 (test_refractor_track_api.py): Two tests verifying that GET /api/v2/refractor/tracks?card_type= returns 200 with count=0 for both an unrecognised card_type value ('foo') and an empty string, rather than a 4xx/5xx. A full SQLite-backed TestClient is added to the track API test module for these cases. T3-6 (test_refractor_state_api.py): Verifies that GET /api/v2/refractor/cards/{card_id} returns last_evaluated_at: null (not a crash or missing key) when the RefractorCardState was initialised but never evaluated. Adds the SQLite test infrastructure (models, fixtures, helper factories, TestClient) to the state API test module. T3-7 (test_refractor_evaluator.py): Two tests covering fully_evolved/tier mismatch correction. When the database has fully_evolved=True but current_tier=3 (corruption), evaluate_card must re-derive fully_evolved from the freshly-computed tier (False for tier 3, True for tier 4). T3-8 (test_refractor_evaluator.py): Two tests confirming per-team stat isolation. A player with BattingSeasonStats on two different teams must have each team's RefractorCardState reflect only that team's stats — not a combined total. Covers both same-season and multi-season scenarios. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>AI Code Review
Files Reviewed
tests/test_refractor_evaluator.py(modified — T3-7, T3-8 added)tests/test_refractor_state_api.py(modified — T3-6 added)tests/test_refractor_track_api.py(modified — T3-1 added)Findings
Correctness
All 7 tests are correctly mapped to real implementation behaviour. Verified each against the source.
T3-7 (TestFullyEvolvedFlagCorrection): The evaluator at line 176 of
refractor_evaluator.pydoescard_state.fully_evolved = card_state.current_tier >= 4— it always recomputes from the final tier, never trusts the stored flag. Both tests correctly exploit this:test_fully_evolved_flag_corrected_when_tier_below_4: injectsfully_evolved=True, tier=3, suppliespa=500(value=500, T3 range), verifies the return dict AND the persisted row (viaCardStateStub.get_by_id(state.id)) are both corrected toFalse. The DB reload is a meaningful persistence check beyond just the return dict.test_fully_evolved_flag_preserved_when_tier_reaches_4: same corruption setup butpa=900(value=900 >= T4=896) legitimately reaches T4, sofully_evolved=Trueis the correct output. The pair cleanly separates "flag corrected" from "flag legitimately set."T3-8 (TestMultiTeamStatIsolation): The evaluator WHERE clause (evaluator.py line 111) filters
player_id AND team_id. Test values are well-chosen: the same-season test uses pa=80 and pa=120 where combined=200 crosses T2=149, making contamination detectable by both value and tier. The multi-season test uses per-team totals of 160/180 vs combined 340 — both teams land T2 either way, but the value assertions (160.0 vs 180.0) are what catch isolation bugs, and the docstring correctly calls this out.T3-1 (invalid card_type): The
list_tracksrouter (refractor.py lines 69-70) doesquery.where(RefractorTrack.card_type == card_type)whencard_type is not None. An empty string is not None, so it passes through as a literalWHERE card_type = ''— no rows match. The fixture correctly seeds two real tracks (batter, sp) before the test runs, so the result distinguishes "filter excluded them" from "table was empty."T3-6 (null last_evaluated_at): The
_build_card_state_responsehelper (refractor.py lines 49-51) usesstate.last_evaluated_at.isoformat() if state.last_evaluated_at else None. The test directly exercises theelse Nonebranch and asserts the key is present with a null value — the 200-vs-500-crash scenario is the real risk being guarded here.No math errors in threshold boundary calculations. All comparisons use
>=consistently with the stub and production_tier_from_valueimplementations.Security
No concerns. All changes are test-only. No credentials, user input surfaces, or new dependencies introduced.
Style & Conventions
Fully consistent with the established project patterns:
file:name?mode=memory&cache=shared) for API tests, matching the existing pattern._*_model/_*_fnoverrides for evaluator tests — identical to all existing evaluator tests._make_state/_make_statshelpers reused from existing module-level helpers.os.environ.setdefault("API_TOKEN", "test-token")correctly placed before theapp.db_engineimport in both new test sections;# noqa: E402carried where needed.autouse=Falseon the new SQLite fixtures is correct — prevents them from conflicting with the autousesetup_test_dbinconftest.py.Suggestions (non-blocking)
test_fully_evolved_flag_preserved_when_tier_reaches_4does not reload the DB row to confirm persistence offully_evolved=True, unlike its sibling which does. Ifcard_state.save()were omitted in the evaluator, the first test would catch it, but aget_by_idrecheck would make the pair symmetric.test_get_card_state_last_evaluated_at_nullcreates theRefractorCardStateinline rather than using the locally-defined_sa_make_statehelper. Both approaches are correct; the inline version is more explicit about thelast_evaluated_at=Nonefield being the meaningful distinction.Verdict: APPROVED
All 7 tests are correctly specified, correctly implement their stated scenarios, and directly exercise real implementation paths. Threshold math is accurate. The SQLite-backed API test infrastructure follows established project patterns. No correctness, security, or convention issues found. This is safe to merge.
Automated review by Claude PR Reviewer
Approved — 7 tests verified by pr-reviewer agent. Safe to merge.
AI Code Review
Files Reviewed
tests/test_refractor_evaluator.py(modified — +212 lines, 2 new test classes)tests/test_refractor_state_api.py(modified — SQLite boilerplate + T3-6 test)tests/test_refractor_track_api.py(modified — SQLite boilerplate + 2 T3-1 tests)Findings
Correctness
All value calculations verified manually against the stub formula and track thresholds:
pa=500→value=500→ T3 (448≤500<896) →fully_evolved=(3≥4)=False✓pa=900→value=900→ T4 (900≥896) →fully_evolved=(4≥4)=True✓pa=80→T1, team2pa=120→T1; combined 200→T2 — isolation confirmed ✓90+70=160→T2, team2100+80=180→T2; exact value assertions catch bleed ✓last_evaluated_at: nullin API response and that theNonebranch in_build_card_state_responsedoesn't crash ✓The
state_reloaded = CardStateStub.get_by_id(state.id)assertion in T3-7a is a good detail — it confirms the corrected flag was persisted, not just returned in the dict.No logical inconsistency between T2-1 (
fully_evolvedpersists when stats drop from T4) and T3-7 (corruptedfully_evolved=Trueat T3 is corrected) — both follow correctly from the evaluator always derivingfully_evolved = (post-regression tier ≥ 4).Security
No concerns — test code only, no production paths changed.
Style & Conventions
Follows the SQLite shared-cache pattern established in PR #170 (same
SqliteDatabase("file:...?mode=memory&cache=shared", uri=True), same db middleware, same fixture structure). Consistent with project conventions.Suggestions
_sa_make_stateis defined in the T3-6 boilerplate but the actual test (test_get_card_state_last_evaluated_at_null) callsRefractorCardState.create(...)directly instead of using the helper. Minor DRY inconsistency — non-blocking._state_api_db.close()/_track_api_db.close()in fixture teardown (same pattern as the T2 tests on main). Fine in practice with SQLite, consistent.# noqa: E402present on theapp.db_engineimport intest_refractor_track_api.pybut not intest_refractor_state_api.py. Ruff config likely handles it globally; PR reports no violations.Verdict: APPROVED
All 7 test cases are correct, mathematically verified, and well-documented with clear "What / Why" docstrings. Tests are isolated, use the established SQLite in-memory pattern, and cover real behavioral properties (flag corruption repair, multi-team stat isolation, null timestamp serialization). No blockers.
Automated review by Claude PR Reviewer
fa96e79428to906d6e575aNew commits pushed, approval review dismissed automatically according to repository settings
Re-approving after conflict resolution rebase — all 7 tests preserved, T3-6 merged into shared SQLite section. Logic unchanged.