5.5 KiB
System Prompt: Player-to-SbaPlayer Matching Project Continuation
Project Context
You are working on a critical database linking project for the Major Domo SBA (Strat-o-Matic Baseball Association) fantasy league system. The goal is to link ~12,000 player-season records to SbaPlayer records to enable career stat tracking across seasons 1-12.
Problem Background
The players table contains individual season records (e.g., "Aaron Judge season 8", "Aaron Judge season 9"). The sbaplayers table contains unique player identities with MLB keys. Currently, the players.sbaplayer_id field is null for all records, breaking career stat tracking.
Previous Failed Approach: An earlier attempt created 1,323 "new" SbaPlayer records when most players already existed (e.g., tried to create new "Taijuan Walker" when SbaPlayer ID 2105 already existed with key_bbref "walketa01").
Current Successful Approach
Multi-Tier Matching Strategy
- Tier 1 (bbref_id matching): Match
player.bbref_idtosbaplayer.key_bbref- highest confidence - Tier 2 (exact name matching): Normalized exact name matching for players without bbref_id
- Tier 3 (manual review): Human review of remaining unmatched players
Current Status - EXCELLENT PROGRESS
- ✅ 11,632 players matched automatically (95.1% of all 12,232 players)
- Tier 1: 6,764 players via bbref_id matching
- Tier 2: 4,868 players via exact name matching
- 📋 Only 32 unique players need manual review (down from 1,323!)
Files Structure
/mnt/NV2/Development/major-domo/database/player-to-sbaplayer-matching/
├── comprehensive_player_matching.py # Main analysis script
├── generate_new_sbaplayers_list.py # Generate review list
├── matching_report.txt # Detailed results
├── new_sbaplayers_for_review.csv # Manual review needed
├── new_sbaplayers_summary.txt # Summary
├── matching.log # Detailed logs
└── SYSTEM_PROMPT_FOR_CONTINUATION.md # This file
Cached Data Location
All API data is cached in /tmp/ to avoid repeated API calls:
/tmp/sbaplayers.json- All 2,234 SbaPlayer records/tmp/players_season_{1-12}.json- All player records by season
Current Task: Manual Review Phase
What Needs Review
The file new_sbaplayers_for_review.csv contains 32 unique players needing manual decisions. Examples:
Obvious Existing Matches (script found these):
- Diego Castillo → SbaPlayer ID 341
- Javy Guerra → SbaPlayer ID 784
- Will Smith → SbaPlayer ID 1841
- Logan Allen → SbaPlayer ID 45
Requires Human Decision:
- Michael A Taylor → Maybe SbaPlayer ID 1963 "Michael Taylor" (missing middle initial?)
- Josh Fuentes → Maybe SbaPlayer ID 664 "Joshua Fuentes" (Josh vs Joshua?)
- Luis Garcia variants (multiple similar players)
- bbref_id "HALP" (corrupted data)
Review Process
User should edit new_sbaplayers_for_review.csv:
- Column
your_decision_sbaplayer_id: Enter existing SbaPlayer ID or leave blank for new record - Column
your_decision_notes: Add any comments
Next Steps After Review
- Process user decisions from the reviewed CSV
- Generate final assignment files:
new_sbaplayers_to_insert.csv- New SbaPlayer records to createplayer_sbaplayer_assignments.csv- All 12,232 player assignments
- Execute database updates via API calls
- Verify career stat tracking works correctly
Key Technical Details
Database Schema
playerstable: season-specific records withsbaplayer_id(currently null)sbaplayerstable: unique player identities with MLB keys (key_bbref,key_fangraphs, etc.)
API Endpoints
- Production API:
https://api.sba.manticorum.com/ - Key endpoints:
/players?season=X,/sbaplayers,/players/{id}(PATCH)
Matching Logic
# Tier 1: bbref_id matching
if player.bbref_id and player.bbref_id in sbaplayer_bbref_map:
match = sbaplayer_bbref_map[player.bbref_id]
# Tier 2: normalized name matching
normalized_name = normalize_name(player.name)
if normalized_name in sbaplayer_name_map and len(matches) == 1:
match = sbaplayer_name_map[normalized_name][0]
Name Normalization Rules
- Lowercase, remove apostrophes/periods
- Handle nicknames (Mike/Michael, Will/William, etc.)
- Remove Jr/Sr suffixes
- Replace hyphens with spaces
Success Metrics
- 95.1% automatic matching achieved ✅
- Only 32 manual decisions needed ✅
- Avoided creating 1,300+ duplicate records ✅
- Career stat tracking will work for 11,632 players immediately ✅
Potential Issues to Watch
- Multiple Luis Garcia players - need to distinguish carefully
- Middle initial variations (Michael A Taylor vs Michael Taylor)
- Nickname variations (Josh vs Joshua)
- Corrupted bbref_ids (like "HALP")
Database Safety
- All changes should be API-based (no direct database access)
- Review all assignments before execution
- Backup approach: Can revert by setting
sbaplayer_idback to null - Verification: Test career stats for sample players after completion
How to Continue
- Wait for user to complete manual review of
new_sbaplayers_for_review.csv - Process the reviewed decisions into final assignment files
- Show user the final assignments for verification before execution
- Execute the database updates via API calls
- Verify career stat functionality works correctly
The groundwork is solid - we just need manual review of 32 cases to complete this major database improvement project.