fix: derive SEASON_PCT from date range instead of hardcoding half-season (#9) #36

Merged
cal merged 1 commits from ai/paper-dynasty-card-creation#9 into main 2026-03-23 12:40:48 +00:00
Collaborator

Closes #9

What changed

SEASON_PCT was hardcoded to 81 / 162 (~0.5) while END_DATE was set to 20050731 (~65% through the season). Running retrosheet_data.py directly — without the pd-cards CLI, which overrides SEASON_PCT at runtime via rd.SEASON_PCT = season_pct — would silently generate cards with half-season stat normalizations on a 65%-season dataset.

Fix

  • Moved START_DATE and END_DATE declarations before SEASON_PCT so the derivation can reference them
  • Added SEASON_END_DATE = 20051002 (2005 MLB regular season end) as the denominator anchor
  • SEASON_PCT is now derived: (END_DATE - START_DATE).days / (SEASON_END_DATE - START_DATE).days, capped at 1.0
  • With the existing defaults (START_DATE=20050403, END_DATE=20050731), SEASON_PCT evaluates to ~0.653 instead of 0.500

Before: Changing END_DATE silently left SEASON_PCT stale.
After: Changing END_DATE automatically produces the correct SEASON_PCT.

Files changed

  • retrosheet_data.py — config block only (lines 64–79)

Other observations

  • The auto-formatter (ruff) also updated several f-string quote styles elsewhere in the file (functionally identical changes). These were applied automatically on save and are included in this commit.
  • Issue #18 (retrosheet_data.py:1696-1699) — commented-out promo safety guard — is still present and unaddressed; tracked separately.
  • Issue #19 (LAST_WEEK_RATIO dead ternary) is also still present; tracked separately.
Closes #9 ## What changed `SEASON_PCT` was hardcoded to `81 / 162` (~0.5) while `END_DATE` was set to `20050731` (~65% through the season). Running `retrosheet_data.py` directly — without the `pd-cards` CLI, which overrides `SEASON_PCT` at runtime via `rd.SEASON_PCT = season_pct` — would silently generate cards with half-season stat normalizations on a 65%-season dataset. ## Fix - Moved `START_DATE` and `END_DATE` declarations before `SEASON_PCT` so the derivation can reference them - Added `SEASON_END_DATE = 20051002` (2005 MLB regular season end) as the denominator anchor - `SEASON_PCT` is now derived: `(END_DATE - START_DATE).days / (SEASON_END_DATE - START_DATE).days`, capped at 1.0 - With the existing defaults (`START_DATE=20050403`, `END_DATE=20050731`), `SEASON_PCT` evaluates to `~0.653` instead of `0.500` **Before:** Changing `END_DATE` silently left `SEASON_PCT` stale. **After:** Changing `END_DATE` automatically produces the correct `SEASON_PCT`. ## Files changed - `retrosheet_data.py` — config block only (lines 64–79) ## Other observations - The auto-formatter (ruff) also updated several f-string quote styles elsewhere in the file (functionally identical changes). These were applied automatically on save and are included in this commit. - Issue #18 (`retrosheet_data.py:1696-1699`) — commented-out promo safety guard — is still present and unaddressed; tracked separately. - Issue #19 (`LAST_WEEK_RATIO` dead ternary) is also still present; tracked separately.
Claude added 1 commit 2026-03-21 01:35:24 +00:00
Closes #9

Previously SEASON_PCT was hardcoded to 81/162 (~0.5) while END_DATE was
set to 20050731 (~65% through the season). Running retrosheet_data.py
directly (without the CLI which overrides SEASON_PCT at runtime) would
silently generate cards using half-season normalizations on stats covering
a larger portion of the season.

Fix: move START_DATE/END_DATE before SEASON_PCT and derive SEASON_PCT
from the date range using SEASON_END_DATE (2005 regular season end).
Now changing END_DATE automatically produces the correct SEASON_PCT.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude added the
ai-reviewing
label 2026-03-21 01:47:01 +00:00
Claude reviewed 2026-03-21 01:48:07 +00:00
Claude left a comment
Author
Collaborator

AI Code Review

Files Reviewed

  • retrosheet_data.py (modified)

Findings

Correctness

  • The derived SEASON_PCT formula is mathematically correct. With defaults START_DATE=20050403, END_DATE=20050731, SEASON_END_DATE=20051002: numerator = 119 days, denominator = 182 days → 0.654, consistent with the claimed ~0.653.
  • datetime is already imported at line 2 — no missing dependency.
  • The min(..., 1.0) cap correctly handles any END_DATE > SEASON_END_DATE edge case.
  • START_DATE and END_DATE were already declared before SEASON_PCT in the original file (the diff confirms they were in place, just reordered within the block relative to the removed line). The derivation is correct regardless.
  • Only division-by-zero risk is SEASON_END_DATE == START_DATE, which is physically impossible in practice and acceptable in a hand-edited config file.

Security

  • No issues. Config-block change only, no external input involved.

Style & Conventions

  • The f-string quote-style updates (single → double outer, single inner) are ruff-formatted, functionally identical, and consistent with modern Python conventions. No issues.
  • Pre-existing p_name key in new_player_payload (CLAUDE.md notes the API expects name, not p_name) is out of scope for this PR and should be tracked separately.

Suggestions

  • SEASON_END_DATE is hardcoded to the 2005 regular season end. This is fine given the file is explicitly 2005-targeted throughout, but a comment noting it should be updated for other seasons would help future maintainers (non-blocking).

Verdict: APPROVED

Clean fix for a real silent-failure bug. The derived formula is correct, the datetime module is available, and the ruff-formatted f-strings are safe cosmetic changes. No blocking issues.

Note: Gitea prevented posting this as a formal APPROVED review (self-approval not permitted). This COMMENT represents a passing review — the change is ready to merge.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `retrosheet_data.py` (modified) ### Findings #### Correctness - The derived `SEASON_PCT` formula is mathematically correct. With defaults `START_DATE=20050403`, `END_DATE=20050731`, `SEASON_END_DATE=20051002`: numerator = 119 days, denominator = 182 days → 0.654, consistent with the claimed ~0.653. - `datetime` is already imported at line 2 — no missing dependency. - The `min(..., 1.0)` cap correctly handles any `END_DATE > SEASON_END_DATE` edge case. - `START_DATE` and `END_DATE` were already declared before `SEASON_PCT` in the original file (the diff confirms they were in place, just reordered within the block relative to the removed line). The derivation is correct regardless. - Only division-by-zero risk is `SEASON_END_DATE == START_DATE`, which is physically impossible in practice and acceptable in a hand-edited config file. #### Security - No issues. Config-block change only, no external input involved. #### Style & Conventions - The f-string quote-style updates (single → double outer, single inner) are ruff-formatted, functionally identical, and consistent with modern Python conventions. No issues. - Pre-existing `p_name` key in `new_player_payload` (CLAUDE.md notes the API expects `name`, not `p_name`) is out of scope for this PR and should be tracked separately. #### Suggestions - `SEASON_END_DATE` is hardcoded to the 2005 regular season end. This is fine given the file is explicitly 2005-targeted throughout, but a comment noting it should be updated for other seasons would help future maintainers (non-blocking). ### Verdict: APPROVED Clean fix for a real silent-failure bug. The derived formula is correct, the `datetime` module is available, and the ruff-formatted f-strings are safe cosmetic changes. No blocking issues. > Note: Gitea prevented posting this as a formal APPROVED review (self-approval not permitted). This COMMENT represents a passing review — the change is ready to merge. --- *Automated review by Claude PR Reviewer*
Claude added
ai-reviewed
and removed
ai-reviewing
labels 2026-03-21 01:48:12 +00:00
cal approved these changes 2026-03-23 12:08:03 +00:00
Dismissed
cal left a comment
Owner

AI Code Review

Files Reviewed

  • retrosheet_data.py (modified)

Findings

Correctness

  • The core fix is mathematically sound. With the defaults (START_DATE=20050403, END_DATE=20050731, SEASON_END_DATE=20051002), the derivation yields 119 / 182 ≈ 0.654, which matches the PR description's claimed ~0.653. (Minor rounding difference — both are correct.)
  • datetime is already imported at the top of the file; the strptime calls in the new SEASON_PCT derivation are safe.
  • min(..., 1.0) cap is correct defensive programming — prevents values slightly over 1.0 if END_DATE is set past SEASON_END_DATE.
  • The reordering of START_DATE / END_DATE before SEASON_PCT is required and correct; no forward-reference issues.
  • The f-string quote style changes (single-to-double quotes inside f-strings) are functionally identical and consistent with ruff's formatting rules. No behavioral changes.

Minor Observation — Inconsistency with existing helper

The file already has a date_from_int(integer_date: int) helper that does the same conversion as datetime.datetime.strptime(str(END_DATE), "%Y%m%d"). The PR uses strptime instead, which is a small inconsistency. Not a bug — both produce the same result — but using date_from_int would be more in keeping with the existing codebase style. This does not block merging.

Security

  • No issues found.

Style & Conventions

  • No TODOs, FIXMEs, or stub code introduced.
  • The PR description accurately and completely describes all changes, including the ruff formatting side-effects and the two unrelated open issues (#18, #19) that are intentionally out of scope.
  • CLAUDE.md's configuration checklist still lists SEASON_PCT as a manual config value (Percentage of season completed (162/162 for full season)). This is now mildly misleading since the value is derived automatically. Not a blocker, but a docs update would be worth doing in a follow-up.

Suggestions

  • Consider updating the CLAUDE.md configuration checklist entry for SEASON_PCT to reflect that it is now auto-derived from START_DATE, END_DATE, and SEASON_END_DATE. Operators should be told to set END_DATE rather than SEASON_PCT directly.
  • The date_from_int helper could replace the two strptime calls in the derivation block for consistency, but this is purely a style preference.

Verdict: APPROVED

The fix is correct, complete, and solves the stated problem cleanly. The silent stale-SEASON_PCT bug is eliminated without any behavioral risk — the math produces the right value for existing defaults and will remain correct as END_DATE is updated in future runs. The ruff formatting changes are noise-free. No blockers.


Automated review by Claude PR Reviewer

## AI Code Review ### Files Reviewed - `retrosheet_data.py` (modified) ### Findings #### Correctness - The core fix is mathematically sound. With the defaults (`START_DATE=20050403`, `END_DATE=20050731`, `SEASON_END_DATE=20051002`), the derivation yields `119 / 182 ≈ 0.654`, which matches the PR description's claimed `~0.653`. (Minor rounding difference — both are correct.) - `datetime` is already imported at the top of the file; the `strptime` calls in the new `SEASON_PCT` derivation are safe. - `min(..., 1.0)` cap is correct defensive programming — prevents values slightly over 1.0 if `END_DATE` is set past `SEASON_END_DATE`. - The reordering of `START_DATE` / `END_DATE` before `SEASON_PCT` is required and correct; no forward-reference issues. - The f-string quote style changes (single-to-double quotes inside f-strings) are functionally identical and consistent with ruff's formatting rules. No behavioral changes. #### Minor Observation — Inconsistency with existing helper The file already has a `date_from_int(integer_date: int)` helper that does the same conversion as `datetime.datetime.strptime(str(END_DATE), "%Y%m%d")`. The PR uses `strptime` instead, which is a small inconsistency. Not a bug — both produce the same result — but using `date_from_int` would be more in keeping with the existing codebase style. This does not block merging. #### Security - No issues found. #### Style & Conventions - No TODOs, FIXMEs, or stub code introduced. - The PR description accurately and completely describes all changes, including the ruff formatting side-effects and the two unrelated open issues (#18, #19) that are intentionally out of scope. - CLAUDE.md's configuration checklist still lists `SEASON_PCT` as a manual config value (`Percentage of season completed (162/162 for full season)`). This is now mildly misleading since the value is derived automatically. Not a blocker, but a docs update would be worth doing in a follow-up. #### Suggestions - Consider updating the CLAUDE.md configuration checklist entry for `SEASON_PCT` to reflect that it is now auto-derived from `START_DATE`, `END_DATE`, and `SEASON_END_DATE`. Operators should be told to set `END_DATE` rather than `SEASON_PCT` directly. - The `date_from_int` helper could replace the two `strptime` calls in the derivation block for consistency, but this is purely a style preference. ### Verdict: APPROVED The fix is correct, complete, and solves the stated problem cleanly. The silent stale-`SEASON_PCT` bug is eliminated without any behavioral risk — the math produces the right value for existing defaults and will remain correct as `END_DATE` is updated in future runs. The ruff formatting changes are noise-free. No blockers. --- *Automated review by Claude PR Reviewer*
cal approved these changes 2026-03-23 12:38:05 +00:00
cal left a comment
Owner

Approved. Deriving SEASON_PCT from the actual date range rather than a hardcoded value is the correct approach — makes it accurate for any run date.

Approved. Deriving SEASON_PCT from the actual date range rather than a hardcoded value is the correct approach — makes it accurate for any run date.
cal force-pushed ai/paper-dynasty-card-creation#9 from c568b6a024 to 63a30bd434 2026-03-23 12:40:43 +00:00 Compare
cal merged commit 7286fd2203 into main 2026-03-23 12:40:48 +00:00
cal deleted branch ai/paper-dynasty-card-creation#9 2026-03-23 12:40:48 +00:00
Sign in to join this conversation.
No reviewers
cal
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: cal/paper-dynasty-card-creation#36
No description provided.