paper-dynasty-card-creation/docs/LESSONS_LEARNED_ASTERISK_REGRESSION.md
Cal Corum cc5f93eb66 Fix critical asterisk regression in player names
CRITICAL BUG FIX: Removed code that was appending asterisks to left-handed
players' names and hash symbols to switch hitters' names in production.

## Changes

### Core Fix (retrosheet_data.py)
- Removed name_suffix code from new_player_payload() (lines 1103-1108)
- Players names now stored cleanly without visual indicators
- Affected 20 left-handed batters in 2005 Live cardset

### New Utility Scripts
- fix_player_names.py: PATCH player names to remove symbols (uses 'name' param)
- check_player_names.py: Verify all players for asterisks/hashes
- regenerate_lefty_cards.py: Update image URLs with cache-busting dates
- upload_lefty_cards_to_s3.py: Fetch fresh cards and upload to S3

### Documentation (CRITICAL - READ BEFORE WORKING WITH CARDS)
- docs/LESSONS_LEARNED_ASTERISK_REGRESSION.md: Comprehensive guide
  * API parameter is 'name' NOT 'p_name'
  * Card generation caching requires timestamp cache-busting
  * S3 keys must not include query parameters
  * Player names only in 'players' table
  * Never append visual indicators to stored data

- CLAUDE.md: Added critical warnings section at top

## Key Learnings
1. API param for player name is 'name', not 'p_name'
2. Cards are cached - use timestamp in ?d= parameter
3. S3 keys != S3 URLs (no query params in keys)
4. Fix data BEFORE generating/uploading cards
5. Visual indicators belong in UI, not database

## Impact
- Fixed 20 player records in production
- Regenerated and uploaded 20 clean cards to S3
- Documented to prevent future regressions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-24 14:38:04 -06:00

178 lines
5.0 KiB
Markdown

# Lessons Learned: Asterisk Regression & Card Upload Issues
**Date**: 2025-11-24
**Issue**: Left-handed players had asterisks appended to their names in production
---
## Critical Learnings
### 1. API Parameter Names vs Database Field Names
**WRONG**: Using database field name for API calls
```python
await db_patch('players', object_id=player_id, params=[('p_name', clean_name)]) # ❌
```
**CORRECT**: Use API parameter name
```python
await db_patch('players', object_id=player_id, params=[('name', clean_name)]) # ✅
```
**Key Point**: The API parameter is `name`, NOT `p_name`. The database field may be `p_name`, but the API expects `name`.
**Example PATCH URL**: `/api/v2/players/:player_id?name=Luis Garcia Jr`
---
### 2. Card Generation Caching
**Problem**: Cards are cached by the API. Using the same `?d=` parameter returns cached cards even after database changes.
**Solution**: Always use a timestamp for cache-busting when regenerating cards:
```python
import time
timestamp = int(time.time())
release_date = f'2025-11-25-{timestamp}'
card_url = f'{API_URL}/players/{id}/battingcard?d={release_date}'
```
**Key Point**: Static dates (like `2025-11-24`) will return cached cards. Use timestamps to force fresh generation.
---
### 3. S3 Keys Must Not Include Query Parameters
**WRONG**: Including query parameter in S3 key
```python
s3_key = f'cards/cardset-027/player-{id}/battingcard.png?d={date}' # ❌
# This creates a file literally named "battingcard.png?d=2025-11-24"
```
**CORRECT**: Separate key from query parameter
```python
s3_key = f'cards/cardset-027/player-{id}/battingcard.png' # ✅
s3_url = f'{S3_BASE_URL}/{s3_key}?d={date}' # Query param in URL, not key
```
**Key Point**: S3 object keys should be clean paths. Query parameters are for URLs only.
---
### 4. Name Suffix Code Should Never Be in Production
**The Bug**: Code was appending asterisks to left-handed players
```python
# This was in new_player_payload() - retrosheet_data.py lines 1103-1108
name_suffix = ''
if row.get('bat_hand') == 'L':
name_suffix = '*'
elif row.get('bat_hand') == 'S':
name_suffix = '#'
'p_name': f'{row["use_name"]} {row["last_name"]}{name_suffix}' # ❌
```
**Why It Existed**: Likely added for visual identification during development/testing.
**Why It's Wrong**:
- Stores corrupted data in production database
- Card images display asterisks
- Breaks searching/filtering by name
**Prevention**:
- Never append visual indicators to stored data
- Use separate display fields if needed
- Always review diffs before committing
---
### 5. Workflow Order Matters
**WRONG ORDER**:
1. Generate cards (with asterisks)
2. Upload to S3 (with asterisks)
3. Fix names in database
4. Try to re-upload (but get cached cards)
**CORRECT ORDER**:
1. Fix data issues in database FIRST
2. Verify fixes with GET requests
3. Use cache-busting parameters
4. Fetch fresh cards
5. Upload to S3
6. Verify uploaded images
**Key Point**: Always verify database changes before triggering card generation/upload.
---
### 6. Card Name Source
**Fact**: Player names are ONLY stored in the `players` table.
**When hitting** `/api/v2/players/{id}/battingcard?d={date}`:
- API pulls name from `players.p_name` field in real-time
- `battingcards` and `pitchingcards` tables DO NOT store names
- Card generation is live, not pre-rendered
**Key Point**: To fix card names, only update the `players` table. No need to update card tables.
---
## Prevention Checklist
Before any card regeneration/upload:
- [ ] Verify player names in database (no asterisks, hashes, or special chars)
- [ ] Use timestamp-based cache-busting for fresh card generation
- [ ] Confirm S3 keys don't include query parameters
- [ ] Test with ONE card before batch processing
- [ ] Verify uploaded S3 image is correct (spot check)
---
## Quick Reference
### API Parameter Names
- **Player name**: `name` (not `p_name`)
- **Player image**: `image`
- **Player positions**: `pos_1`, `pos_2`, etc.
### Cache-Busting Pattern
```python
import time
timestamp = int(time.time())
url = f'{API_URL}/players/{id}/battingcard?d=2025-11-25-{timestamp}'
```
### S3 Upload Pattern
```python
s3_key = f'cards/cardset-{cardset:03d}/player-{id}/battingcard.png'
s3_client.put_object(Bucket=bucket, Key=s3_key, Body=image_bytes)
s3_url = f'{S3_BASE_URL}/{s3_key}?d={cache_bust_param}'
```
---
## Files to Review for Similar Issues
1. **retrosheet_data.py**: Check for name suffix code
2. **live_series_update.py**: Check for name suffix code
3. **check_cards_and_upload.py**: Verify S3 key handling
4. **Any script that does db_patch('players', ...)**: Verify parameter names
---
## Impact Summary
**Issue Duration**: One card generation cycle
**Players Affected**: 20 left-handed batters in 2005 Live cardset
**Data Corrupted**: Player names had asterisks
**Cards Affected**: 20 cards on S3 with asterisks
**Resolution Time**: ~1 hour (including troubleshooting)
**Root Cause**: Development code (name suffix) left in production
**Fix Complexity**: Simple code removal + database patches
**Prevention**: Code review + testing before deployment