sba-scouting/rust/PHASE1_PROJECT_PLAN.json

{
  "meta": {
    "version": "1.0.0",
    "created": "2026-02-26",
    "lastUpdated": "2026-02-26",
    "planType": "migration",
    "phase": "Phase 1: Foundation (DB + Config + Schema)",
    "description": "Wire up the data pipeline foundation for the SBA Scout Rust TUI rewrite. Database schema creation, full query layer, config integration, and dependency additions.",
    "totalEstimatedHours": 18,
    "totalTasks": 12,
    "completedTasks": 12
  },
  "categories": {
    "critical": "Must complete before any other phase can start",
    "high": "Required for data pipeline to function",
    "medium": "Query functions needed by screens",
    "low": "Dependency prep for later phases"
  },
  "tasks": [
    {
      "id": "CRIT-001",
      "name": "Add SQL migration file for all 9 tables",
      "description": "Create a sqlx migration (or embedded SQL) that defines CREATE TABLE statements for all 9 tables matching the Python SQLAlchemy models exactly. Must include all columns, types, defaults, foreign keys, and unique constraints. sqlx does not have ORM-style create_all — tables must be defined as raw SQL.",
      "category": "critical",
      "priority": 1,
      "completed": true,
      "tested": true,
      "dependencies": [],
      "files": [
        {
          "path": "rust/src/db/schema.rs",
          "lines": [6],
          "issue": "Pool init exists but no table creation logic"
        },
        {
          "path": "src/sba_scout/db/models.py",
          "lines": [39, 83, 181, 246, 318, 351, 383, 413, 459],
          "issue": "Python reference — all 9 model classes with exact column specs"
        }
      ],
      "suggestedFix": "1. Add a `create_tables` async fn in schema.rs that runs raw SQL via `sqlx::query()`. 2. Define CREATE TABLE IF NOT EXISTS for: teams (with UNIQUE(abbrev, season)), players (FK teams.id), batter_cards (FK players.id, UNIQUE player_id), pitcher_cards (FK players.id, UNIQUE player_id), transactions (UNIQUE(move_id, player_id)), lineups (FK players.id for starting_pitcher_id), matchup_cache (UNIQUE(batter_id, pitcher_id)), standardized_score_cache (UNIQUE(batter_card_id, split), UNIQUE(pitcher_card_id, split)), sync_status (UNIQUE entity_type). 3. Call create_tables from main.rs after pool init. Column types: INTEGER for ids/ints, REAL for floats, TEXT for strings, BOOLEAN for bools (SQLite stores as 0/1), TEXT for JSON columns (batting_order, positions, details, stat_scores).",
      "estimatedHours": 2,
      "notes": "SQLite has no native JSON type — use TEXT and serialize/deserialize with serde_json in Rust. SQLite also has no native DATETIME — use TEXT in ISO 8601 format (chrono::NaiveDateTime serializes this way). The Python models use autoincrement for most PKs, but Team and Player use API-provided IDs as PK (not autoincrement)."
    },
    {
      "id": "CRIT-002",
      "name": "Implement database session/connection management",
      "description": "Port the Python get_session() context manager pattern to Rust. Need a way to acquire a connection from the pool, run queries, and handle commit/rollback. The Python version uses async context manager with auto-commit on success and auto-rollback on exception.",
      "category": "critical",
      "priority": 2,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-001"],
      "files": [
        {
          "path": "rust/src/db/schema.rs",
          "lines": [6, 18],
          "issue": "Only has init_pool — no session management, no create_tables"
        },
        {
          "path": "src/sba_scout/db/schema.py",
          "lines": [46, 63, 65, 76, 85, 97],
          "issue": "Python reference — get_session ctx manager, init/drop/reset/close_database"
        }
      ],
      "suggestedFix": "sqlx uses the pool directly (no ORM session). For transaction support, use `pool.begin()` which returns a `Transaction` that auto-rolls-back on drop. Add helper functions: 1. `create_tables(pool)` — runs the migration SQL. 2. `reset_database(pool)` — DROP TABLE IF EXISTS for all 9 tables, then create_tables. Most queries will just use `&SqlitePool` directly since sqlx auto-manages connections. For operations that need atomicity (sync upserts), use `pool.begin()` explicitly.",
      "estimatedHours": 1,
      "notes": "Unlike SQLAlchemy, sqlx doesn't need a session factory pattern. The pool IS the connection manager. Keep it simple — don't over-abstract."
    },
    {
      "id": "CRIT-003",
      "name": "Integrate config loading into main.rs and App",
      "description": "Wire up the existing config.rs (figment-based Settings) into the application startup flow. Load settings in main(), pass to App, pass db_path to pool init. Currently main.rs ignores config entirely.",
      "category": "critical",
      "priority": 3,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-001", "CRIT-002"],
      "files": [
        {
          "path": "rust/src/main.rs",
          "lines": [15, 18, 24],
          "issue": "No config loading, no DB pool init, App::new() takes no args"
        },
        {
          "path": "rust/src/app.rs",
          "lines": [19, 24],
          "issue": "App struct has no fields for settings or db pool"
        },
        {
          "path": "rust/src/config.rs",
          "lines": [106],
          "issue": "load_settings() exists but is never called"
        }
      ],
      "suggestedFix": "1. In main.rs: call `load_settings()`, then `init_pool(&settings.db_path)`, then `create_tables(&pool)`. 2. Add `pool: SqlitePool` and `settings: Settings` fields to App struct. 3. Update App::new(settings, pool) constructor. 4. Pass pool reference to screen render functions (they'll need it for queries in Phase 4). 5. Ensure data/ directory is created if missing (match Python's ensure_db_directory validator). 6. Add TOML settings file support — config.rs already has Toml provider but the Rust config uses settings.toml while Python uses settings.yaml. Decide on TOML (idiomatic Rust) and document the difference.",
      "estimatedHours": 1.5,
      "notes": "The Python app uses a global lazy singleton for settings. In Rust, prefer passing owned/borrowed Settings through the app rather than using a global. The pool is Clone-able (it's an Arc internally) so passing it around is cheap."
    },
    {
      "id": "HIGH-001",
      "name": "Implement team query functions",
      "description": "Port all team queries from Python db/queries.py to Rust db/queries.rs: get_all_teams, get_team_by_abbrev, get_team_by_id.",
      "category": "high",
      "priority": 4,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "rust/src/db/queries.rs",
          "lines": [1, 2],
          "issue": "Empty stub — only has a comment"
        },
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [31, 61, 72],
          "issue": "Python reference — 3 team query functions"
        }
      ],
      "suggestedFix": "Use `sqlx::query_as::<_, Team>()` with hand-written SQL. Key details: 1. `get_all_teams(pool, season, active_only)` — WHERE season = ? AND (if active_only) abbrev NOT LIKE '%IL' AND abbrev NOT LIKE '%MiL' ORDER BY abbrev. 2. `get_team_by_abbrev(pool, abbrev, season)` — WHERE abbrev = ? AND season = ? returns Option<Team>. 3. `get_team_by_id(pool, team_id)` — WHERE id = ? returns Option<Team>.",
      "estimatedHours": 1,
      "notes": "sqlx::query_as maps rows directly to structs via FromRow derive (already on models). Use fetch_all for lists, fetch_optional for Option<T>."
    },
    {
      "id": "HIGH-002",
      "name": "Implement player query functions",
      "description": "Port all player queries: get_players_by_team, get_player_by_id, get_player_by_name, search_players, get_pitchers, get_batters, get_players_missing_cards.",
      "category": "high",
      "priority": 5,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "rust/src/db/queries.rs",
          "lines": [1, 2],
          "issue": "Empty stub"
        },
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [84, 113, 131, 153, 189, 214, 259],
          "issue": "Python reference — 7 player query functions"
        }
      ],
      "suggestedFix": "Key differences from Python: 1. No ORM eager loading (selectinload) — sqlx uses raw SQL. For 'include_cards', use LEFT JOIN on batter_cards/pitcher_cards and map to a custom struct (PlayerWithCards) that has Option<BatterCard> and Option<PitcherCard>. Alternatively, do separate queries (simpler, matches Python's selectinload which runs separate SELECTs anyway). 2. `search_players` uses LIKE '%query%' (SQLite is case-insensitive for ASCII by default with LIKE). 3. `get_pitchers` checks pos_1 OR pos_2 IN ('SP','RP','CP'). 4. `get_batters` checks pos_1 IN ('C','1B','2B','3B','SS','LF','CF','RF','DH'). 5. `get_players_missing_cards` uses subquery anti-join: WHERE id NOT IN (SELECT player_id FROM batter_cards).",
      "estimatedHours": 2.5,
      "notes": "Decide on the 'include_cards' pattern early. Recommend: separate queries approach (fetch players, then batch-fetch cards by player_ids). This avoids complex JOIN mapping and matches how the Python ORM actually executes selectinload. Create a PlayerWithCards struct or add a method to attach cards after loading."
    },
    {
      "id": "HIGH-003",
      "name": "Implement card query functions",
      "description": "Port card queries: get_batter_card, get_pitcher_card.",
      "category": "high",
      "priority": 6,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "rust/src/db/queries.rs",
          "lines": [1, 2],
          "issue": "Empty stub"
        },
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [239, 249],
          "issue": "Python reference — 2 card query functions"
        }
      ],
      "suggestedFix": "Simple single-table queries: 1. `get_batter_card(pool, player_id) -> Option<BatterCard>` — SELECT * FROM batter_cards WHERE player_id = ?. 2. `get_pitcher_card(pool, player_id) -> Option<PitcherCard>` — SELECT * FROM pitcher_cards WHERE player_id = ?.",
      "estimatedHours": 0.5,
      "notes": "Straightforward — these are the simplest queries in the system."
    },
    {
      "id": "HIGH-004",
      "name": "Implement roster query function (get_my_roster)",
      "description": "Port the composite roster query that fetches majors, minors, and IL players for the user's team. This is a high-level function that calls team + player queries internally.",
      "category": "high",
      "priority": 7,
      "completed": true,
      "tested": true,
      "dependencies": ["HIGH-001", "HIGH-002"],
      "files": [
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [309, 336],
          "issue": "Python reference — get_my_roster function"
        }
      ],
      "suggestedFix": "Create a `Roster` struct with fields `majors: Vec<Player>`, `minors: Vec<Player>`, `il: Vec<Player>`. Implement `get_my_roster(pool, team_abbrev, season) -> Roster` that: 1. Looks up team by abbrev (e.g., 'WV'). 2. Looks up IL team by abbrev + 'IL' (e.g., 'WVIL'). 3. Looks up MiL team by abbrev + 'MiL' (e.g., 'WVMiL'). 4. Fetches players for each (with cards). Returns empty vecs if team not found.",
      "estimatedHours": 1,
      "notes": "Consider running the 3 player queries concurrently with tokio::join! since they're independent."
    },
    {
      "id": "HIGH-005",
      "name": "Implement sync status query functions",
      "description": "Port sync status queries: get_sync_status and update_sync_status (upsert pattern).",
      "category": "high",
      "priority": 8,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [344, 354],
          "issue": "Python reference — get_sync_status and update_sync_status"
        }
      ],
      "suggestedFix": "1. `get_sync_status(pool, entity_type) -> Option<SyncStatus>` — SELECT * FROM sync_status WHERE entity_type = ?. 2. `update_sync_status(pool, entity_type, count, error)` — Use SQLite's INSERT OR REPLACE (or INSERT ... ON CONFLICT(entity_type) DO UPDATE) for clean upsert. The Python version does a select-then-update/insert pattern which is racy; the SQL upsert is better.",
      "estimatedHours": 0.5,
      "notes": "SQLite ON CONFLICT is the idiomatic way to do upserts. This is simpler than the Python approach."
    },
    {
      "id": "MED-001",
      "name": "Implement matchup cache query functions",
      "description": "Port matchup cache queries: get_cached_matchup, invalidate_matchup_cache. Note: MatchupCache table exists but is largely unused in practice — the StandardizedScoreCache is the primary cache. Still needed for completeness.",
      "category": "medium",
      "priority": 9,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [382, 398],
          "issue": "Python reference — 2 matchup cache functions"
        }
      ],
      "suggestedFix": "1. `get_cached_matchup(pool, batter_id, pitcher_id, weights_hash) -> Option<MatchupCache>` — SELECT WHERE batter_id = ? AND pitcher_id = ? AND weights_hash = ?. 2. `invalidate_matchup_cache(pool) -> i64` — DELETE FROM matchup_cache, return rows_affected.",
      "estimatedHours": 0.5,
      "notes": "Low usage in practice but include for feature parity."
    },
    {
      "id": "MED-002",
      "name": "Implement lineup query functions",
      "description": "Port lineup CRUD: get_lineups, get_lineup_by_name, save_lineup (upsert), delete_lineup.",
      "category": "medium",
      "priority": 10,
      "completed": true,
      "tested": true,
      "dependencies": ["CRIT-002"],
      "files": [
        {
          "path": "src/sba_scout/db/queries.py",
          "lines": [418, 425, 435, 468],
          "issue": "Python reference — 4 lineup CRUD functions"
        }
      ],
      "suggestedFix": "1. `get_lineups(pool) -> Vec<Lineup>` — SELECT * ORDER BY name. 2. `get_lineup_by_name(pool, name) -> Option<Lineup>` — WHERE name = ?. 3. `save_lineup(pool, name, batting_order, positions, ...)` — INSERT OR REPLACE. batting_order and positions are JSON TEXT — serialize Vec<i64> and HashMap<String, i64> with serde_json::to_string. 4. `delete_lineup(pool, name) -> bool` — DELETE WHERE name = ?, return rows_affected > 0. Note: The Lineup model in Rust stores batting_order/positions as String (JSON text). Add helper methods or a wrapper to deserialize on read.",
      "estimatedHours": 1.5,
      "notes": "JSON serialization for batting_order (Vec<i64>) and positions (HashMap<String, i64>) needs serde_json. Consider adding `Lineup::batting_order_vec()` and `Lineup::positions_map()` convenience methods."
    },
    {
      "id": "LOW-001",
      "name": "Add missing crate dependencies for later phases",
      "description": "Add crates needed by Phase 2+ to Cargo.toml now so they're available: csv (CSV import), sha2 (cache hashing), regex (endurance parsing).",
      "category": "low",
      "priority": 11,
      "completed": true,
      "tested": true,
      "dependencies": [],
      "files": [
        {
          "path": "rust/Cargo.toml",
          "lines": [6, 38],
          "issue": "Missing csv, sha2, regex crates"
        }
      ],
      "suggestedFix": "Add to [dependencies]: `csv = \"1\"`, `sha2 = \"0.10\"`, `regex = \"1\"`. These are stable, widely-used crates with no breaking changes expected.",
      "estimatedHours": 0.25,
      "notes": "Quick win — add now to avoid compile delays later. No code changes needed."
    },
    {
      "id": "LOW-002",
      "name": "Add Lineup JSON helper methods",
      "description": "Add deserialization helpers to the Lineup model so screens can easily work with batting_order and positions as typed Rust values instead of raw JSON strings.",
      "category": "low",
      "priority": 12,
      "completed": true,
      "tested": true,
      "dependencies": ["MED-002"],
      "files": [
        {
          "path": "rust/src/db/models.rs",
          "lines": [209, 219],
          "issue": "Lineup stores batting_order/positions as Option<String> (JSON) with no parse helpers"
        }
      ],
      "suggestedFix": "Add impl block for Lineup with: 1. `batting_order_vec(&self) -> Vec<i64>` — deserialize JSON string or return empty vec. 2. `positions_map(&self) -> HashMap<String, i64>` — deserialize JSON string or return empty map. 3. `set_batting_order(&mut self, order: &[i64])` — serialize to JSON string. 4. `set_positions(&mut self, positions: &HashMap<String, i64>)` — serialize to JSON string.",
      "estimatedHours": 0.5,
      "notes": "Quality-of-life improvement that prevents JSON parse errors from spreading across the codebase."
    }
  ],
  "quickWins": [
    {
      "taskId": "LOW-001",
      "estimatedMinutes": 15,
      "impact": "Prevents compile-time delays when starting Phase 2"
    },
    {
      "taskId": "HIGH-003",
      "estimatedMinutes": 20,
      "impact": "Simplest queries — good warmup for the query pattern"
    }
  ],
  "productionBlockers": [
    {
      "taskId": "CRIT-001",
      "reason": "No tables = no data storage. Everything depends on this."
    },
    {
      "taskId": "CRIT-002",
      "reason": "No connection management = can't execute any queries."
    },
    {
      "taskId": "CRIT-003",
      "reason": "App can't find the DB or API without config wired in."
    }
  ],
  "weeklyRoadmap": {
    "session1": {
      "theme": "Schema + Connection + Config",
      "tasks": ["CRIT-001", "CRIT-002", "CRIT-003", "LOW-001"],
      "estimatedHours": 5,
      "notes": "Get the app booting with a real DB connection and config loaded. Verify with cargo run."
    },
    "session2": {
      "theme": "Core Queries (Teams + Players + Cards)",
      "tasks": ["HIGH-001", "HIGH-002", "HIGH-003", "HIGH-004"],
      "estimatedHours": 5,
      "notes": "All the read queries that screens will need. Test against the existing Python-created DB file."
    },
    "session3": {
      "theme": "Supporting Queries + Polish",
      "tasks": ["HIGH-005", "MED-001", "MED-002", "LOW-002"],
      "estimatedHours": 4,
      "notes": "Sync status, cache, lineup CRUD, and JSON helpers. Phase 1 complete."
    }
  },
  "architecturalDecisions": {
    "no_orm_session_pattern": "sqlx uses pool directly — no session factory needed. Use pool.begin() for transactions.",
    "include_cards_strategy": "Separate queries (fetch players, then batch-fetch cards) rather than JOINs. Matches Python's selectinload behavior and keeps models simple.",
    "json_columns": "Store as TEXT in SQLite, serialize/deserialize with serde_json. Add helper methods on Lineup for typed access.",
    "upsert_pattern": "Use SQLite ON CONFLICT DO UPDATE instead of Python's select-then-update. Cleaner and race-free.",
    "config_format": "TOML (not YAML) for Rust config. figment + toml crate already in Cargo.toml. Document the format change from Python version.",
    "datetime_storage": "Store as TEXT in ISO 8601 format. chrono::NaiveDateTime with sqlx handles this automatically.",
    "pool_passing": "Pass SqlitePool by reference (&SqlitePool) to query functions. Pool is Clone (Arc internally) so App can own it and hand out refs."
  }
}