{ "meta": { "version": "1.0.0", "created": "2026-02-27", "lastUpdated": "2026-02-27", "planType": "migration", "phase": "Phase 2: API Client + Sync + CSV Import", "description": "Port the data ingestion pipeline \u2014 HTTP API client, sync functions, and CSV card importer \u2014 from Python to Rust. This phase populates the database that Phase 1 created.", "totalEstimatedHours": 16, "totalTasks": 11, "completedTasks": 11 }, "categories": { "critical": "Must complete before sync or import can work", "high": "Required for data pipeline to function end-to-end", "medium": "Import functions needed to populate card data", "low": "Orchestration and polish" }, "tasks": [ { "id": "CRIT-001", "name": "Define API response types (serde structs)", "description": "Create typed response structs for all API endpoints. The league API returns JSON with nested objects and field names that differ from DB column names (e.g. 'sname' \u2192 'short_name', 'wara' \u2192 'swar'). Use serde rename attributes to handle the mismatches at deserialization time.", "category": "critical", "priority": 1, "completed": true, "tested": true, "dependencies": [], "files": [ { "path": "rust/src/api/client.rs", "lines": [ 1 ], "issue": "No response types defined" } ], "suggestedFix": "Create a new file `rust/src/api/types.rs` and add to `api/mod.rs`. Define structs:\n\n1. `TeamsResponse { count: u32, teams: Vec }`\n2. `TeamData` \u2014 flat fields plus nested `manager1: Option`, `manager2: Option`, `division: Option`. Use `#[serde(rename = \"sname\")]` for `short_name`, `#[serde(rename = \"lname\")]` for `long_name`, `#[serde(rename = \"gmid\")]` for `gm_discord_id`, `#[serde(rename = \"gmid2\")]` for `gm2_discord_id`. Manager struct: `{ name: Option }`. Division struct: `{ id: Option, division_name: Option, league_abbrev: Option }`.\n3. `PlayersResponse { count: u32, players: Vec }`\n4. 
`PlayerData` \u2014 use `#[serde(rename = \"wara\")]` for `swar`, `#[serde(rename = \"sbaplayer\")]` for `sbaplayer_id`, `#[serde(rename = \"image\")]` for `card_image`, `#[serde(rename = \"image2\")]` for `card_image_alt`. Nested `team: Option<TeamRef>` where `TeamRef { id: i64 }`.\n5. `TransactionsResponse { count: u32, transactions: Vec<TransactionData> }`\n6. `TransactionData` \u2014 `#[serde(rename = \"moveid\")]` for `move_id`. Nested `player: Option<PlayerRef>` (define `PlayerRef { id: i64 }` analogous to `TeamRef`), `oldteam: Option<TeamRef>`, `newteam: Option<TeamRef>`.\n7. `CurrentResponse { season: i64, week: i64 }` (for get_current endpoint).\n\nAll structs derive `Debug, Deserialize`. Use `Option` liberally \u2014 API responses have many nullable fields. Use `#[serde(default)]` on optional fields to handle missing keys gracefully.", "estimatedHours": 1.5, "notes": "Key gotchas: gmid/gmid2 come as integers from the API but are stored as String in DB \u2014 deserialize as Option<i64> then .map(|id| id.to_string()). The 'team' field on PlayerData is an object {id: N} not a bare integer." }, { "id": "CRIT-002", "name": "Define API error type", "description": "Create a proper error enum for API operations using thiserror. The Python client has special handling for Cloudflare 403 errors and distinguishes HTTP errors from network errors.", "category": "critical", "priority": 2, "completed": true, "tested": true, "dependencies": [], "files": [ { "path": "rust/src/api/client.rs", "lines": [ 1 ], "issue": "Uses anyhow::Result with no typed errors" } ], "suggestedFix": "Add to `api/client.rs` or a new `api/error.rs`:\n\n```rust\n#[derive(Debug, thiserror::Error)]\npub enum ApiError {\n #[error(\"HTTP {status}: {message}\")]\n Http { status: u16, message: String },\n\n #[error(\"Cloudflare blocked the request (403). 
The API may be down or rate-limiting.\")]\n CloudflareBlocked,\n\n #[error(\"Request failed: {0}\")]\n Request(#[from] reqwest::Error),\n\n #[error(\"JSON parse error: {0}\")]\n Parse(#[from] serde_json::Error),\n}\n```\n\nIn the request method: if status == 403 and body contains 'cloudflare' (case-insensitive), return CloudflareBlocked. Otherwise return Http variant.", "estimatedHours": 0.5, "notes": "Keep it simple. The Python error has status_code as an optional field \u2014 in Rust we can use separate variants instead." }, { "id": "CRIT-003", "name": "Implement core _request method and auth", "description": "Add the internal request method to LeagueApiClient that handles authentication (Bearer token), base URL prefixing (/api/v3), response status checking, Cloudflare detection, and JSON deserialization. All public endpoint methods will delegate to this.", "category": "critical", "priority": 3, "completed": true, "tested": true, "dependencies": [ "CRIT-001", "CRIT-002" ], "files": [ { "path": "rust/src/api/client.rs", "lines": [ 11, 23 ], "issue": "Only has new() constructor, no request logic" } ], "suggestedFix": "Add a private async method:\n\n```rust\nasync fn get<T: serde::de::DeserializeOwned>(&self, path: &str, params: &[(&str, String)]) -> Result<T, ApiError>\n```\n\n1. Build URL: `format!(\"{}/api/v3{}\", self.base_url, path)`\n2. Create request with `self.client.get(url).query(params)`\n3. If `self.api_key` is not empty, add `.bearer_auth(&self.api_key)`\n4. Send and get response\n5. If `!response.status().is_success()`: read body text, check for cloudflare 403, else return Http error\n6. Deserialize JSON body to T\n\nNote: reqwest's .query() handles repeated params correctly when given a slice of tuples \u2014 e.g., `&[(\"team_id\", \"1\"), (\"team_id\", \"2\")]` produces `?team_id=1&team_id=2`.", "estimatedHours": 1, "notes": "The Python client uses httpx.AsyncClient as a context manager. In Rust, reqwest::Client is Clone + Send and can be reused freely \u2014 no context manager needed. 
Consider making api_key an Option<String> to properly represent 'no auth' vs 'empty string'." }, { "id": "HIGH-001", "name": "Implement all API endpoint methods", "description": "Port all 10 public API methods from the Python client. Each method builds the correct path and query params, then delegates to the core _request/get method.", "category": "high", "priority": 4, "completed": true, "tested": true, "dependencies": [ "CRIT-003" ], "files": [ { "path": "rust/src/api/client.rs", "lines": [ 11, 23 ], "issue": "No endpoint methods" } ], "suggestedFix": "Implement these methods on LeagueApiClient:\n\n1. `get_teams(season, team_abbrev, active_only, short_output)` \u2192 GET /teams\n2. `get_team(team_id)` \u2192 GET /teams/{id}\n3. `get_team_roster(team_id, which)` \u2192 GET /teams/{id}/roster/{which}\n4. `get_players(season, team_id, pos, name, short_output)` \u2192 GET /players\n5. `get_player(player_id, short_output)` \u2192 GET /players/{id}\n6. `search_players(query, season, limit)` \u2192 GET /players/search\n7. `get_transactions(season, week_start, week_end, team_abbrev, cancelled, frozen, short_output)` \u2192 GET /transactions\n8. `get_current()` \u2192 GET /current\n9. `get_schedule(season, week, team_id)` \u2192 GET /schedules\n10. `get_standings(season)` \u2192 GET /standings\n\nFor optional params, use `Option` in the method signature and only add the param to the query slice when Some. For array params (team_id, team_abbrev, pos), accept `Option<&[T]>` and emit one tuple per element.\n\nReturn the typed response structs from CRIT-001.", "estimatedHours": 2.5, "notes": "get_players and get_transactions have the most params. Consider a builder pattern or param struct if the signatures get unwieldy, but plain method args are fine for now \u2014 matches the Python style." }, { "id": "HIGH-002", "name": "Implement sync_teams", "description": "Port the team sync function. Fetches teams from API and upserts into the database. 
Must handle the JSON field name mismatches (sname\u2192short_name, lname\u2192long_name, gmid\u2192gm_discord_id, nested manager/division objects).", "category": "high", "priority": 5, "completed": true, "tested": true, "dependencies": [ "HIGH-001" ], "files": [ { "path": "rust/src/api/mod.rs", "lines": [ 1 ], "issue": "No sync module" } ], "suggestedFix": "Create `rust/src/api/sync.rs` and add `pub mod sync;` to api/mod.rs.\n\nImplement `pub async fn sync_teams(pool: &SqlitePool, season: i64, client: &LeagueApiClient) -> Result<i64>`:\n\n1. Call `client.get_teams(Some(season), None, true, false)`\n2. Iterate response.teams\n3. For each TeamData: use `INSERT OR REPLACE INTO teams (id, abbrev, short_name, long_name, season, ...) VALUES (?, ?, ?, ...)`. Map: `data.short_name` (already renamed by serde), flatten `data.manager1.and_then(|m| m.name)`, `data.division.and_then(|d| d.id)`, `data.gm_discord_id.map(|id| id.to_string())`, set `synced_at = Utc::now().naive_utc()`.\n4. Call `update_sync_status(pool, \"teams\", count, None)` from db::queries.\n5. Return count.\n\nUse a transaction (pool.begin()) to batch all inserts \u2014 faster and atomic.", "estimatedHours": 1.5, "notes": "The Python code does individual get-then-update/insert. The Rust version should use INSERT OR REPLACE for simplicity (matching Phase 1 upsert pattern). The season field is set on insert only in Python \u2014 in Rust, just always set it since we're using REPLACE." }, { "id": "HIGH-003", "name": "Implement sync_players", "description": "Port the player sync function. 
Similar to team sync but with different field mappings (wara\u2192swar, sbaplayer\u2192sbaplayer_id, nested team.id).", "category": "high", "priority": 6, "completed": true, "tested": true, "dependencies": [ "HIGH-001" ], "files": [ { "path": "rust/src/api/sync.rs", "lines": [], "issue": "File doesn't exist yet" } ], "suggestedFix": "Implement `pub async fn sync_players(pool: &SqlitePool, season: i64, team_id: Option<i64>, client: &LeagueApiClient) -> Result<i64>`:\n\n1. Call `client.get_players(Some(season), team_id.map(|id| vec![id]).as_deref(), None, None, false)`\n2. Iterate response.players\n3. Upsert with field mapping: `data.swar` (renamed by serde from wara), `data.team.map(|t| t.id)` for team_id, `data.sbaplayer_id` (renamed from sbaplayer), pos_1 through pos_8, etc.\n4. Important: do NOT overwrite the `hand` field \u2014 it comes from CSV import only. `INSERT OR REPLACE` would null it out, so use `INSERT ... ON CONFLICT(id) DO UPDATE SET name=excluded.name, ...` and omit `hand` from the SET clause.\n5. Call `update_sync_status(pool, \"players\", count, None)`.\n6. Return count.\n\nUse transaction for batching.", "estimatedHours": 1.5, "notes": "Critical: player.hand must NOT be overwritten by API sync (it's CSV-only). Use ON CONFLICT DO UPDATE with explicit column list instead of INSERT OR REPLACE, which would null out hand. This is different from the team sync pattern." }, { "id": "HIGH-004", "name": "Implement sync_transactions", "description": "Port the transaction sync function. Transactions use a composite key (move_id, player_id) for dedup, not a simple PK. 
Must skip rows with missing moveid or player.id.", "category": "high", "priority": 7, "completed": true, "tested": true, "dependencies": [ "HIGH-001" ], "files": [ { "path": "rust/src/api/sync.rs", "lines": [], "issue": "File doesn't exist yet" } ], "suggestedFix": "Implement `pub async fn sync_transactions(pool: &SqlitePool, season: i64, week_start: i64, week_end: Option<i64>, team_abbrev: Option<&str>, client: &LeagueApiClient) -> Result<i64>`:\n\n1. Call `client.get_transactions(season, week_start, week_end, team_abbrev, false, false, false)`\n2. Iterate response.transactions\n3. Skip if moveid is None or player is None/player.id is None\n4. Use `INSERT INTO transactions (...) VALUES (...) ON CONFLICT(move_id, player_id) DO UPDATE SET cancelled=excluded.cancelled, frozen=excluded.frozen, synced_at=excluded.synced_at` \u2014 only update the mutable fields on conflict.\n5. Map: `data.move_id` (renamed from moveid), `data.player.unwrap().id`, `data.oldteam.map(|t| t.id)`, `data.newteam.map(|t| t.id)`, set season and week.\n6. Call `update_sync_status(pool, \"transactions\", count, None)`.\n7. Return count.\n\nUse transaction for batching.", "estimatedHours": 1, "notes": "The transactions table has UNIQUE(move_id, player_id). ON CONFLICT on that unique constraint is the right approach. Python does a SELECT-then-INSERT/UPDATE which is racy." }, { "id": "HIGH-005", "name": "Implement sync_all orchestrator", "description": "Port sync_all which creates one API client and runs all three sync functions sequentially, returning a summary of counts.", "category": "high", "priority": 8, "completed": true, "tested": true, "dependencies": [ "HIGH-002", "HIGH-003", "HIGH-004" ], "files": [ { "path": "rust/src/api/sync.rs", "lines": [], "issue": "File doesn't exist yet" } ], "suggestedFix": "Implement `pub async fn sync_all(pool: &SqlitePool, settings: &Settings) -> Result<SyncResult>`:\n\n1. Create `LeagueApiClient::new(&settings.api.base_url, &settings.api.api_key, settings.api.timeout)?`\n2. 
Call sync_teams, sync_players, sync_transactions sequentially\n3. For transactions, use week_start=1, week_end=None (sync all weeks)\n4. Return a SyncResult struct: `{ teams: i64, players: i64, transactions: i64 }`\n\nDefine `SyncResult` in sync.rs or types.rs.", "estimatedHours": 0.5, "notes": "Keep it simple. The Python version passes the client to each sync function \u2014 do the same. Consider logging each step's count." }, { "id": "MED-001", "name": "Implement CSV helper functions (parse_float, parse_int, parse_endurance)", "description": "Port the three CSV parsing helpers. parse_int must handle '5.0' \u2192 5 (parse as float then truncate). parse_endurance uses regex to extract S(n), R(n), C(n) from endurance strings like 'S(5) R(4)' or 'R(1) C(6)*'.", "category": "medium", "priority": 9, "completed": true, "tested": true, "dependencies": [], "files": [], "suggestedFix": "Create `rust/src/api/importer.rs` and add `pub mod importer;` to api/mod.rs.\n\nImplement:\n\n1. `fn parse_float(value: &str, default: f64) -> f64` \u2014 trim, return default if empty, parse f64 or return default.\n\n2. `fn parse_int(value: &str, default: i32) -> i32` \u2014 trim, return default if empty, parse as f64 first then truncate to i32. This handles '5.0' \u2192 5.\n\n3. `fn parse_endurance(value: &str) -> (Option<i32>, Option<i32>, Option<i32>)` \u2014 keep the compiled regexes in `std::sync::LazyLock` statics:\n - `S\\((\\d+)\\*?\\)` \u2192 start\n - `R\\((\\d+)\\)` \u2192 relief\n - `C\\((\\d+)\\)` \u2192 close\n Return (start, relief, close) where any component may be None.\n\nWrite unit tests for all three, especially edge cases: empty string, whitespace, '5.0', 'S(5*) R(4)', 'C(6)', unparseable values.", "estimatedHours": 1, "notes": "Use `std::sync::LazyLock` (stable since Rust 1.80) for compiled regexes instead of lazy_static or once_cell. Keep these as module-level functions, not methods." 
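The MED-001 helpers are small enough to sketch in full. Below is a hedged, std-only illustration of the intended behavior ('5.0' truncation, defaults on empty input, S/R/C endurance extraction). It deliberately avoids the regex crate so it is self-contained; `extract_component` is a hypothetical stand-in for the regex matches the plan actually calls for, and is not part of the planned API.

```rust
// Sketch of the MED-001 parse helpers, std-only (no regex crate).
// extract_component is a hypothetical helper replacing the planned
// regexes (S\((\d+)\*?\), etc.) so this example runs on its own.

fn parse_float(value: &str, default: f64) -> f64 {
    let v = value.trim();
    if v.is_empty() { return default; }
    v.parse::<f64>().unwrap_or(default)
}

fn parse_int(value: &str, default: i32) -> i32 {
    let v = value.trim();
    if v.is_empty() { return default; }
    // Parse as f64 first so "5.0" becomes 5, then truncate.
    v.parse::<f64>().map(|f| f as i32).unwrap_or(default)
}

// Find e.g. "S(5)" or "S(5*)" in the string and return Some(5).
fn extract_component(value: &str, letter: char) -> Option<i32> {
    let mut rest = value;
    while let Some(pos) = rest.find(letter) {
        let tail = &rest[pos + letter.len_utf8()..];
        if let Some(t) = tail.strip_prefix('(') {
            if let Some(end) = t.find(')') {
                // Drop any '*' inside the parens, keep the digits.
                let digits: String =
                    t[..end].chars().filter(|c| c.is_ascii_digit()).collect();
                if let Ok(n) = digits.parse::<i32>() {
                    return Some(n);
                }
            }
        }
        rest = tail;
    }
    None
}

fn parse_endurance(value: &str) -> (Option<i32>, Option<i32>, Option<i32>) {
    (
        extract_component(value, 'S'),
        extract_component(value, 'R'),
        extract_component(value, 'C'),
    )
}

fn main() {
    // Edge cases named in MED-001's test list.
    assert_eq!(parse_int("5.0", 0), 5);
    assert_eq!(parse_int("   ", 7), 7);
    assert_eq!(parse_float("3.5", 0.0), 3.5);
    assert_eq!(parse_endurance("S(5) R(4)"), (Some(5), Some(4), None));
    assert_eq!(parse_endurance("R(1) C(6)*"), (None, Some(1), Some(6)));
    assert_eq!(parse_endurance("S(5*) R(4)"), (Some(5), Some(4), None));
}
```

In the real implementation the plan's `LazyLock`-compiled regexes should replace `extract_component`; the assertions above double as the unit-test cases MED-001 asks for.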
}, { "id": "MED-002", "name": "Implement import_batter_cards", "description": "Port the batter CSV importer. Reads BatterCalcs.csv, maps columns to BatterCard fields, upserts into database. Also updates player.hand from CSV. Returns (imported, skipped, errors) counts.", "category": "medium", "priority": 10, "completed": true, "tested": true, "dependencies": [ "MED-001" ], "files": [ { "path": "rust/src/api/importer.rs", "lines": [], "issue": "File doesn't exist yet" } ], "suggestedFix": "Implement `pub async fn import_batter_cards(pool: &SqlitePool, csv_path: &Path, update_existing: bool) -> Result`:\n\nDefine `ImportResult { imported: i64, skipped: i64, errors: Vec }`.\n\n1. Open CSV with `csv::ReaderBuilder::new().has_headers(true).from_path(csv_path)`\n2. Get headers, iterate records via `reader.records()`\n3. For each record:\n a. Parse player_id from column 'player_id' \u2014 skip if 0\n b. Look up player in DB \u2014 if not found, push error string and continue\n c. Update player.hand if column 'hand' is L/R/S: `UPDATE players SET hand = ? WHERE id = ?`\n d. If !update_existing, check if batter_card exists \u2014 if so, increment skipped and continue\n e. INSERT OR REPLACE batter card with all field mappings (see column map)\n f. Increment imported\n4. Catch per-row errors (wrap in match) \u2192 push to errors vec\n\nColumn mappings (CSV header \u2192 parse function \u2192 DB field):\n- 'SO vlhp' \u2192 parse_float \u2192 so_vlhp\n- 'BB v lhp' \u2192 parse_float \u2192 bb_vlhp\n- 'HIT v lhp' \u2192 parse_float \u2192 hit_vlhp\n- (etc. \u2014 19 stat columns per side, plus stealing, STL, SPD, B, H, FIELDING, cArm/CA, vL, vR, Total)\n- Source set to csv filename\n\nUse a helper fn or macro to avoid repeating `get_column(headers, record, 'name')` 40+ times. Consider a HashMap for header\u2192index lookup.", "estimatedHours": 2.5, "notes": "The catcher_arm column has two possible CSV header names: 'cArm' OR 'CA'. Check both. SPD defaults to 10 if missing. 
Use a transaction to batch all inserts." }, { "id": "MED-003", "name": "Implement import_pitcher_cards and import_all_cards", "description": "Port the pitcher CSV importer (similar structure to batter but with endurance parsing, different column names, and pitcher-specific fields). Also port import_all_cards orchestrator that runs both imports and defaults CSV paths.", "category": "medium", "priority": 11, "completed": true, "tested": true, "dependencies": [ "MED-002" ], "files": [ { "path": "rust/src/api/importer.rs", "lines": [], "issue": "File doesn't exist yet" } ], "suggestedFix": "Implement `pub async fn import_pitcher_cards(pool: &SqlitePool, csv_path: &Path, update_existing: bool) -> Result<ImportResult>`:\n\nSame structure as batter import with key differences:\n1. player.hand validated as L/R only (no S for pitchers)\n2. CSV 'vlhp' columns map to DB 'vlhb' fields (vs Left-Handed Batters)\n3. Endurance parsing: try columns 'ENDURANCE' or 'cleanEndur' first with parse_endurance(), then override with individual 'SP'/'RP'/'CP' columns if present (parse_int)\n4. Additional fields: hold_rating ('HO'), fielding_range ('Range'), fielding_error ('Error'), wild_pitch ('WP'), balk ('BK'), batting_rating ('BAT-B')\n5. Watch for 'BP1b v rhp' \u2014 lowercase 'b' in CSV header\n6. 'BPHR vL' and 'BPHR vR' (not 'BPHR v lhp'/'BPHR v rhp') for pitcher ballpark HR columns\n\nImplement `pub async fn import_all_cards(pool: &SqlitePool, batter_csv: Option<&Path>, pitcher_csv: Option<&Path>, update_existing: bool) -> Result<(ImportResult, ImportResult)>`:\n1. Default paths: docs/sheets_export/BatterCalcs.csv and PitcherCalcs.csv\n2. Run batter import, then pitcher import\n3. Return combined results\n4. Note: Python version calls rebuild_score_cache() after import \u2014 defer this to Phase 3 (calc layer) and add a TODO comment.", "estimatedHours": 2.5, "notes": "The pitcher importer has more gotchas than the batter importer. 
The endurance column fallback logic (ENDURANCE \u2192 cleanEndur \u2192 individual SP/RP/CP) must be implemented carefully. Extract shared CSV reading logic (header lookup, record iteration, error collection) into helper functions to avoid duplication with the batter importer." } ], "quickWins": [ { "taskId": "CRIT-002", "estimatedMinutes": 20, "impact": "Clean error types make all other API code easier to write" }, { "taskId": "MED-001", "estimatedMinutes": 30, "impact": "Pure functions with no dependencies \u2014 can be implemented and tested immediately" } ], "productionBlockers": [ { "taskId": "CRIT-001", "reason": "No response types = can't deserialize any API responses" }, { "taskId": "CRIT-003", "reason": "No request method = can't call any endpoints" } ], "weeklyRoadmap": { "session1": { "theme": "API Client Foundation", "tasks": [ "CRIT-001", "CRIT-002", "CRIT-003", "HIGH-001" ], "estimatedHours": 5.5, "notes": "Get the HTTP client fully working with typed responses. Test against the live API with cargo run." }, "session2": { "theme": "Sync Pipeline", "tasks": [ "HIGH-002", "HIGH-003", "HIGH-004", "HIGH-005" ], "estimatedHours": 4.5, "notes": "All three sync functions plus the orchestrator. Test by syncing real data and querying with Phase 1 query functions." }, "session3": { "theme": "CSV Import Pipeline", "tasks": [ "MED-001", "MED-002", "MED-003" ], "estimatedHours": 6, "notes": "Parse helpers, batter importer, pitcher importer. Test against existing BatterCalcs.csv and PitcherCalcs.csv files." } }, "architecturalDecisions": { "serde_rename_for_field_mapping": "Use #[serde(rename = \"...\")] on API response structs to handle JSON\u2194Rust name mismatches at deserialization time, not at the sync layer. This keeps sync functions clean.", "player_hand_preservation": "sync_players must use ON CONFLICT DO UPDATE with explicit column list (omitting hand) instead of INSERT OR REPLACE, which would null out hand. 
Hand is populated only by CSV import.", "transaction_batching": "Wrap all sync and import operations in sqlx transactions for atomicity and performance. SQLite is much faster with batched inserts inside a transaction.", "error_collection_not_abort": "CSV import collects per-row errors into a Vec and continues processing remaining rows, matching the Python behavior. Only truly fatal errors (file not found, DB connection lost) abort the entire import.", "csv_header_lookup": "Build a HashMap from CSV headers for O(1) column lookup by name. This handles the variant column names (cArm vs CA, ENDURANCE vs cleanEndur) cleanly.", "no_score_cache_rebuild_yet": "import_all_cards will NOT call rebuild_score_cache() \u2014 that belongs to Phase 3 (calc layer). Add a TODO comment at the call site." }, "testingStrategy": { "api_client": "Write integration tests that call the live API (behind a #[cfg(feature = \"integration\")] gate or #[ignore] attribute). Test at minimum: get_teams, get_players, get_current. Verify serde deserialization works with real API responses.", "sync_functions": "Test with an in-memory SQLite database. Call sync, then verify rows were inserted using Phase 1 query functions. Mock the API client if feasible, otherwise use #[ignore] for live tests.", "csv_helpers": "Full unit test coverage for parse_float, parse_int, parse_endurance with edge cases.", "csv_import": "Create small test CSV files (3-5 rows) with known values. Import into in-memory DB, then query and assert field values match. Test both insert and update paths." } }