major-domo-database/.claude/sqlite-to-postgres/player-to-sbaplayer-matching/SYSTEM_PROMPT_FOR_CONTINUATION.md
Cal Corum 7130a1fd43 Postgres Migration
Migration documentation and scripts
2025-08-25 07:18:31 -05:00

5.5 KiB

System Prompt: Player-to-SbaPlayer Matching Project Continuation

Project Context

You are working on a critical database linking project for the Major Domo SBA (Strat-o-Matic Baseball Association) fantasy league system. The goal is to link ~12,000 player-season records to SbaPlayer records to enable career stat tracking across seasons 1-12.

Problem Background

The players table contains individual season records (e.g., "Aaron Judge season 8", "Aaron Judge season 9"). The sbaplayers table contains unique player identities with MLB keys. Currently, the players.sbaplayer_id field is null for all records, breaking career stat tracking.

Previous Failed Approach: An earlier attempt created 1,323 "new" SbaPlayer records when most players already existed (e.g., tried to create new "Taijuan Walker" when SbaPlayer ID 2105 already existed with key_bbref "walketa01").

Current Successful Approach

Multi-Tier Matching Strategy

  1. Tier 1 (bbref_id matching): Match player.bbref_id to sbaplayer.key_bbref - highest confidence
  2. Tier 2 (exact name matching): Normalized exact name matching for players without bbref_id
  3. Tier 3 (manual review): Human review of remaining unmatched players

Current Status - EXCELLENT PROGRESS

  • 11,632 players matched automatically (95.1% of all 12,232 players)
    • Tier 1: 6,764 players via bbref_id matching
    • Tier 2: 4,868 players via exact name matching
  • 📋 Only 32 unique players need manual review (down from 1,323!)

Files Structure

/mnt/NV2/Development/major-domo/database/player-to-sbaplayer-matching/
├── comprehensive_player_matching.py     # Main analysis script
├── generate_new_sbaplayers_list.py      # Generate review list
├── matching_report.txt                  # Detailed results
├── new_sbaplayers_for_review.csv        # Manual review needed
├── new_sbaplayers_summary.txt           # Summary
├── matching.log                         # Detailed logs
└── SYSTEM_PROMPT_FOR_CONTINUATION.md    # This file

Cached Data Location

All API data is cached in /tmp/ to avoid repeated API calls:

  • /tmp/sbaplayers.json - All 2,234 SbaPlayer records
  • /tmp/players_season_{1-12}.json - All player records by season

Current Task: Manual Review Phase

What Needs Review

The file new_sbaplayers_for_review.csv contains 32 unique players needing manual decisions. Examples:

Obvious Existing Matches (script found these):

  • Diego Castillo → SbaPlayer ID 341
  • Javy Guerra → SbaPlayer ID 784
  • Will Smith → SbaPlayer ID 1841
  • Logan Allen → SbaPlayer ID 45

Requires Human Decision:

  • Michael A Taylor → Maybe SbaPlayer ID 1963 "Michael Taylor" (missing middle initial?)
  • Josh Fuentes → Maybe SbaPlayer ID 664 "Joshua Fuentes" (Josh vs Joshua?)
  • Luis Garcia variants (multiple similar players)
  • bbref_id "HALP" (corrupted data)

Review Process

User should edit new_sbaplayers_for_review.csv:

  • Column your_decision_sbaplayer_id: Enter existing SbaPlayer ID or leave blank for new record
  • Column your_decision_notes: Add any comments

Next Steps After Review

  1. Process user decisions from the reviewed CSV
  2. Generate final assignment files:
    • new_sbaplayers_to_insert.csv - New SbaPlayer records to create
    • player_sbaplayer_assignments.csv - All 12,232 player assignments
  3. Execute database updates via API calls
  4. Verify career stat tracking works correctly

Key Technical Details

Database Schema

  • players table: season-specific records with sbaplayer_id (currently null)
  • sbaplayers table: unique player identities with MLB keys (key_bbref, key_fangraphs, etc.)

API Endpoints

  • Production API: https://api.sba.manticorum.com/
  • Key endpoints: /players?season=X, /sbaplayers, /players/{id} (PATCH)

Matching Logic

# Tier 1: bbref_id matching
if player.bbref_id and player.bbref_id in sbaplayer_bbref_map:
    match = sbaplayer_bbref_map[player.bbref_id]

# Tier 2: normalized name matching  
normalized_name = normalize_name(player.name)
if normalized_name in sbaplayer_name_map and len(matches) == 1:
    match = sbaplayer_name_map[normalized_name][0]

Name Normalization Rules

  • Lowercase, remove apostrophes/periods
  • Handle nicknames (Mike/Michael, Will/William, etc.)
  • Remove Jr/Sr suffixes
  • Replace hyphens with spaces

Success Metrics

  • 95.1% automatic matching achieved
  • Only 32 manual decisions needed
  • Avoided creating 1,300+ duplicate records
  • Career stat tracking will work for 11,632 players immediately

Potential Issues to Watch

  1. Multiple Luis Garcia players - need to distinguish carefully
  2. Middle initial variations (Michael A Taylor vs Michael Taylor)
  3. Nickname variations (Josh vs Joshua)
  4. Corrupted bbref_ids (like "HALP")

Database Safety

  • All changes should be API-based (no direct database access)
  • Review all assignments before execution
  • Backup approach: Can revert by setting sbaplayer_id back to null
  • Verification: Test career stats for sample players after completion

How to Continue

  1. Wait for user to complete manual review of new_sbaplayers_for_review.csv
  2. Process the reviewed decisions into final assignment files
  3. Show user the final assignments for verification before execution
  4. Execute the database updates via API calls
  5. Verify career stat functionality works correctly

The groundwork is solid - we just need manual review of 32 cases to complete this major database improvement project.