Cal Corum 79a559088a CLAUDE: Phase 1 PostgreSQL migration fixes complete
- Fixed 4 critical schema issues blocking migration
- Resolved integer overflow by converting Discord IDs to strings
- Fixed VARCHAR length limits for Google Photos URLs
- Made injury_count field nullable for NULL values
- Successfully migrating 7/30 tables (5,432+ records)

Issues resolved:
- CONSTRAINT-CURRENT-INJURY_COUNT-001: Made nullable
- DATA_QUALITY-PLAYER-NAME-001: Increased VARCHAR limits to 1000
- MIGRATION_LOGIC-TEAM-INTEGER-001: Discord IDs now strings
- MIGRATION_LOGIC-DRAFTDATA-INTEGER-001: Channel IDs now strings

New issues discovered for Phase 2:
- CONSTRAINT-CURRENT-BSTATCOUNT-001: NULL stats count
- CONSTRAINT-TEAM-AUTO_DRAFT-001: NULL auto draft flag

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-18 18:09:45 -05:00


# Data Sanitization Template for PostgreSQL Migration
## Template Structure
Each data sanitization issue should follow this standardized format for consistent tracking and resolution.

---
## Issue Template
### Issue ID: [CATEGORY]-[TABLE]-[FIELD]-[NUMBER]
**Example**: `CONSTRAINT-CURRENT-INJURY_COUNT-001`
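
Table and field names use underscores rather than hyphens, so the hyphen works as an unambiguous separator. A minimal sketch for building and parsing IDs in this format follows. `make_issue_id`, `parse_issue_id`, and the regular expression are illustrative assumptions, not existing tooling. The first component is not restricted to the Category values, because the examples in this document use prefixes such as `CONSTRAINT`.

```python
import re

# Hypothetical pattern: CATEGORY-TABLE-FIELD-NNN, where no component contains "-".
ISSUE_ID_RE = re.compile(r"^([A-Z_]+)-([A-Z0-9_]+)-([A-Z0-9_]+)-(\d{3,})$")


def make_issue_id(category: str, table: str, field: str, number: int) -> str:
    """Return an ID such as CONSTRAINT-CURRENT-INJURY_COUNT-001."""
    issue_id = f"{category.upper()}-{table.upper()}-{field.upper()}-{number:03d}"
    if not ISSUE_ID_RE.match(issue_id):
        raise ValueError(f"Malformed issue ID: {issue_id}")
    return issue_id


def parse_issue_id(issue_id: str) -> tuple[str, str, str, int]:
    """Split an ID back into (category, table, field, number)."""
    match = ISSUE_ID_RE.match(issue_id)
    if match is None:
        raise ValueError(f"Malformed issue ID: {issue_id}")
    category, table, field, number = match.groups()
    return category, table, field, int(number)
```
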
### 📊 Issue Classification
- **Category**: [SCHEMA|DATA_INTEGRITY|DATA_QUALITY|MIGRATION_LOGIC]
- **Priority**: [CRITICAL|HIGH|MEDIUM|LOW]
- **Impact**: [BLOCKS_MIGRATION|DATA_LOSS|PERFORMANCE|COSMETIC]
- **Table(s)**: [table_name, related_tables]
- **Field(s)**: [field_names]
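
If issues are also tracked in code, the allowed values above map naturally onto enums. The sketch below (all class names are hypothetical) rejects any value outside the listed sets. It also makes clear that the ID prefix and the Category are separate fields: `CONSTRAINT-CURRENT-INJURY_COUNT-001` is classified as `SCHEMA` in the examples below.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    SCHEMA = "SCHEMA"
    DATA_INTEGRITY = "DATA_INTEGRITY"
    DATA_QUALITY = "DATA_QUALITY"
    MIGRATION_LOGIC = "MIGRATION_LOGIC"


class Priority(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


class Impact(Enum):
    BLOCKS_MIGRATION = "BLOCKS_MIGRATION"
    DATA_LOSS = "DATA_LOSS"
    PERFORMANCE = "PERFORMANCE"
    COSMETIC = "COSMETIC"


@dataclass
class Classification:
    category: Category
    priority: Priority
    impact: Impact
    tables: list[str] = field(default_factory=list)
    fields: list[str] = field(default_factory=list)

    @classmethod
    def from_strings(cls, category, priority, impact, tables=(), fields=()):
        # Enum lookup by value raises ValueError for anything outside the allowed set
        return cls(Category(category), Priority(priority), Impact(impact),
                   list(tables), list(fields))
```
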
### 🔍 Problem Description
**What happened:**
Clear description of the error or issue encountered.

**Error Message:**
```
Exact error message from logs
```

**Expected Behavior:**
What should happen in a successful migration.

**Current Behavior:**
What actually happens.
### 📈 Impact Assessment
**Data Affected:**
- Records: X out of Y total
- Percentage: Z%
- Critical data: YES/NO

**Business Impact:**
- User-facing features affected
- Operational impact
- Compliance/audit concerns
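
For NULL-related issues, the record count and percentage can be computed directly from the SQLite source. A minimal sketch, assuming a `sqlite3` connection to the source database; `null_impact` is a hypothetical helper name.

```python
import sqlite3


def null_impact(conn: sqlite3.Connection, table: str, column: str) -> tuple[int, int, float]:
    """Return (affected, total, percent) for NULL values in one column.

    Identifiers are interpolated into the SQL, so pass trusted names only.
    """
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    affected = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    ).fetchone()[0]
    # Guard against empty tables to avoid division by zero
    percent = 100.0 * affected / total if total else 0.0
    return affected, total, percent
```

The three values fill in "Records: X out of Y total" and "Percentage: Z%" directly.
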
### 🔧 Root Cause Analysis
**Technical Cause:**
- SQLite vs PostgreSQL difference
- Data model assumption
- Migration logic flaw

**Data Source:**
- How did this data get into this state?
- Is this expected or corrupted data?
- Historical context
### 💡 Solution Strategy
**Approach:** [TRANSFORM_DATA|FIX_SCHEMA|MIGRATION_LOGIC|SKIP_TABLE]

**Technical Solution:**
Detailed explanation of how to fix the issue.

**Data Transformation Required:**
```sql
-- Example transformation query: replace NULLs with a default value
UPDATE table_name
SET field_name = default_value
WHERE field_name IS NULL;
```
### ✅ Implementation Plan
**Steps:**
1. [ ] Back up current state
2. [ ] Implement fix
3. [ ] Test on sample data
4. [ ] Run full migration test
5. [ ] Validate results
6. [ ] Document changes

**Rollback Plan:**
How to undo changes if something goes wrong.
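
Step 1 and the rollback plan can both rest on a full copy of the SQLite source taken before any pre-migration `UPDATE` runs. The sketch below uses the standard library's `sqlite3.Connection.backup` (Python 3.7+); the function names and file paths are assumptions. For the PostgreSQL target, `pg_dump` plays the equivalent role.

```python
import sqlite3


def backup_sqlite(source_path: str, backup_path: str) -> None:
    """Copy the entire SQLite database at source_path into backup_path."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(backup_path)
    try:
        with dst:
            src.backup(dst)  # online backup API; overwrites dst contents
    finally:
        dst.close()
        src.close()


def restore_sqlite(backup_path: str, target_path: str) -> None:
    """Rollback: overwrite target_path with the contents of the backup."""
    backup_sqlite(backup_path, target_path)
```

Run `restore_sqlite` only while no other connection holds the target database open.
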
### 🧪 Testing Strategy
**Test Cases:**
1. Happy path: Normal data migrates correctly
2. Edge case: Problem data is handled properly
3. Regression: Previous fixes still work

**Validation Queries:**
```sql
-- Query to verify fix worked
SELECT COUNT(*) FROM table_name WHERE condition;
```
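
A simple first validation is comparing per-table row counts between source and target. A sketch assuming two DB-API connections (for example `sqlite3` for the source and a PostgreSQL driver for the target); `compare_row_counts` is a hypothetical name, and an empty result means every listed table matched.

```python
def compare_row_counts(source, target, tables):
    """Return {table: (source_count, target_count)} for tables whose counts differ.

    Table names are interpolated into the SQL, so pass trusted names only.
    """
    mismatches = {}
    for table in tables:
        counts = []
        for conn in (source, target):
            cur = conn.cursor()
            cur.execute(f'SELECT COUNT(*) FROM "{table}"')
            counts.append(cur.fetchone()[0])
            cur.close()
        if counts[0] != counts[1]:
            mismatches[table] = tuple(counts)
    return mismatches
```
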
### 📋 Resolution Status
- **Status**: [IDENTIFIED|IN_PROGRESS|TESTING|RESOLVED|DEFERRED]
- **Assigned To**: [team_member]
- **Date Identified**: YYYY-MM-DD
- **Date Resolved**: YYYY-MM-DD
- **Solution Applied**: [description]
---
## 📚 Example Issues (From Our Testing)
### Issue ID: CONSTRAINT-CURRENT-INJURY_COUNT-001
- **Category**: SCHEMA
- **Priority**: HIGH
- **Impact**: BLOCKS_MIGRATION

**Problem Description:**
The `injury_count` field in the `current` table contains NULL values in SQLite, but the PostgreSQL schema declares it NOT NULL.
**Error Message:**
```
null value in column "injury_count" of relation "current" violates not-null constraint
```
**Solution Strategy:** TRANSFORM_DATA
```sql
-- Transform NULL values to 0 before migration
UPDATE current SET injury_count = 0 WHERE injury_count IS NULL;
```
**Implementation:**
1. Add data transformation in migration script
2. Set default value for future records
3. Update schema if business logic allows NULL
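
Step 1 can be as small as the sketch below, which runs the `UPDATE` inside a transaction on the SQLite source and reports how many rows changed; the function name is hypothetical. Step 2 would typically be handled on the PostgreSQL side with `ALTER TABLE current ALTER COLUMN injury_count SET DEFAULT 0`.

```python
import sqlite3


def fill_null_injury_counts(conn: sqlite3.Connection) -> int:
    """Set NULL injury_count values to 0 in the source table; return rows changed."""
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            'UPDATE "current" SET injury_count = 0 WHERE injury_count IS NULL'
        )
    return cur.rowcount
```
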
---
### Issue ID: DATA_QUALITY-PLAYER-NAME-001
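- **Category**: DATA_QUALITY
- **Priority**: MEDIUM
- **Impact**: DATA_LOSS

**Problem Description:**
Some player names exceed the PostgreSQL `VARCHAR(255)` limit. PostgreSQL does not silently truncate them: the insert is rejected with the error below, so those rows cannot be migrated until the column is widened.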
**Error Message:**
```
value too long for type character varying(255)
```
**Solution Strategy:** FIX_SCHEMA
```sql
-- Increase column size in PostgreSQL
ALTER TABLE player ALTER COLUMN name TYPE VARCHAR(500);
```
**Implementation:**
1. Analyze max string lengths in SQLite
2. Update PostgreSQL schema with appropriate limits
3. Add validation to prevent future overruns
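
For step 1, `MAX(LENGTH(...))` gives the longest stored value in characters, which is also the unit that `VARCHAR(n)` limits in PostgreSQL. A sketch assuming a `sqlite3` connection; both helpers are hypothetical, and the 1.5x headroom with a floor of 255 is an illustrative policy, not a rule from this migration. Where no meaningful limit exists, PostgreSQL `TEXT` avoids the question entirely.

```python
import sqlite3


def max_text_length(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Return the longest value (in characters) in a text column, or 0 if none."""
    row = conn.execute(f'SELECT MAX(LENGTH("{column}")) FROM "{table}"').fetchone()
    return row[0] or 0  # MAX over no non-NULL rows yields NULL


def suggest_varchar_limit(observed_max: int, headroom: float = 1.5, floor: int = 255) -> int:
    """Pad the observed maximum so slightly longer future values still fit (assumed policy)."""
    return max(floor, int(observed_max * headroom))
```
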
---
### Issue ID: MIGRATION_LOGIC-TEAM-INTEGER-001
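- **Category**: MIGRATION_LOGIC
- **Priority**: HIGH
- **Impact**: BLOCKS_MIGRATION

**Problem Description:**
Large integer values in SQLite exceed the PostgreSQL `INTEGER` range. SQLite stores integers as signed values of up to 8 bytes, whereas PostgreSQL `INTEGER` is a signed 32-bit type (-2147483648 to 2147483647).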
**Error Message:**
```
integer out of range
```
**Solution Strategy:** FIX_SCHEMA
```sql
-- Use BIGINT instead of INTEGER
ALTER TABLE team ALTER COLUMN large_field TYPE BIGINT;
```
**Implementation:**
1. Identify fields with large values
2. Update schema to use BIGINT
3. Verify no application code assumes INTEGER size
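
For step 1, any value outside the signed 32-bit range will fail on insert into a PostgreSQL `INTEGER` column. The sketch below counts such rows in a `sqlite3` source; the helper name is hypothetical. Discord IDs are one source of such values, and the Phase 1 fixes converted them to strings rather than `BIGINT`.

```python
import sqlite3

# PostgreSQL INTEGER is a signed 32-bit type; SQLite integers can use up to 8 bytes.
PG_INT_MIN, PG_INT_MAX = -2**31, 2**31 - 1


def out_of_int_range(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Count rows whose value would not fit in a PostgreSQL INTEGER column."""
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" < ? OR "{column}" > ?'
    return conn.execute(query, (PG_INT_MIN, PG_INT_MAX)).fetchone()[0]
```
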
---
## 🎯 Standard Solution Patterns
### Pattern 1: NULL Constraint Violations
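```python
# Pre-migration data cleaning on the SQLite source.
# Assumes `sqlite_db` is the source database object (e.g. a peewee SqliteDatabase)
# defined elsewhere. Names are interpolated into the SQL, so pass trusted names only.
def clean_null_constraints(table_name, field_name, default_value):
    query = f"UPDATE {table_name} SET {field_name} = ? WHERE {field_name} IS NULL"
    sqlite_db.execute_sql(query, (default_value,))
```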
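
The patterns in this section interpolate table and field names with f-strings, because identifiers cannot be passed as query parameters. A safer variant quotes them first. This is a minimal sketch using the standard `sqlite3` module instead of the project's database object; `quote_identifier` and `clean_null_constraints_sqlite` are hypothetical names. Both SQLite and PostgreSQL accept double-quoted identifiers with embedded quotes doubled, but quoted names are case-sensitive in PostgreSQL, so pass them exactly as they were created.

```python
import sqlite3


def quote_identifier(name: str) -> str:
    """Double-quote a table or column name, doubling any embedded double quotes."""
    if "\x00" in name:
        raise ValueError("identifier must not contain NUL characters")
    return '"' + name.replace('"', '""') + '"'


def clean_null_constraints_sqlite(conn, table_name, field_name, default_value):
    """Pattern 1 with quoted identifiers; returns the number of rows updated."""
    table, column = quote_identifier(table_name), quote_identifier(field_name)
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE {column} IS NULL",
            (default_value,),
        )
    return cur.rowcount
```
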
### Pattern 2: String Length Overruns
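```python
# Schema adjustment on the PostgreSQL target (assumes `postgres_db` is defined
# elsewhere). Widening a VARCHAR keeps all existing values intact.
def adjust_varchar_limits(table_name, field_name, new_limit):
    query = f"ALTER TABLE {table_name} ALTER COLUMN {field_name} TYPE VARCHAR({int(new_limit)})"
    postgres_db.execute_sql(query)
```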
### Pattern 3: Integer Range Issues
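```python
# Type upgrade on the PostgreSQL target (assumes `postgres_db` is defined
# elsewhere). BIGINT is signed 64-bit, matching the range of SQLite integers.
def upgrade_integer_fields(table_name, field_name):
    query = f"ALTER TABLE {table_name} ALTER COLUMN {field_name} TYPE BIGINT"
    postgres_db.execute_sql(query)
```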
### Pattern 4: Missing Table Handling
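```python
import logging

logger = logging.getLogger(__name__)

# Graceful table skipping: a table missing from the SQLite source is logged
# and treated as handled; any other error is re-raised.
# Assumes `migrate_table_data` is the project's per-table migration function.
def safe_table_migration(model_class):
    try:
        migrate_table_data(model_class)
    except Exception as e:
        if "no such table" in str(e):
            logger.warning(f"Table {model_class._meta.table_name} doesn't exist in source")
            return True
        raise
```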
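
Pattern 4 applied across a whole table list might look like the sketch below, which also records which tables were skipped so they can be logged as issues. It catches `sqlite3.OperationalError` (the exception SQLite raises for a missing table) rather than every `Exception`. `copy_table` is a stand-in for the real per-table migration and, like `migrate_all`, is hypothetical.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def copy_table(src: sqlite3.Connection, table: str) -> list:
    # Stand-in for the real per-table migration: it only reads every row
    return src.execute(f'SELECT * FROM "{table}"').fetchall()


def migrate_all(src: sqlite3.Connection, tables: list[str]) -> tuple[list[str], list[str]]:
    """Migrate each table in order; return (migrated, skipped) table names."""
    migrated, skipped = [], []
    for table in tables:
        try:
            copy_table(src, table)
        except sqlite3.OperationalError as exc:
            if "no such table" not in str(exc):
                raise  # any other operational error is a real failure
            logger.warning("Table %s doesn't exist in source; skipping", table)
            skipped.append(table)
        else:
            migrated.append(table)
    return migrated, skipped
```
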
## 📊 Issue Tracking Spreadsheet Template
| Issue ID | Category | Priority | Table | Field | Status | Date Found | Date Fixed | Notes |
|----------|----------|----------|-------|-------|--------|------------|------------|-------|
| CONSTRAINT-CURRENT-INJURY_COUNT-001 | SCHEMA | HIGH | current | injury_count | RESOLVED | 2025-01-15 | 2025-01-15 | Set NULL to 0 |
| DATA_QUALITY-PLAYER-NAME-001 | DATA_QUALITY | MEDIUM | player | name | IN_PROGRESS | 2025-01-15 | | Increase VARCHAR limit |
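
If the tracker lives in a spreadsheet, exporting issues as CSV with the same columns as the table above keeps the two in sync. A sketch using the standard `csv` module; `issues_to_csv` is a hypothetical name, and missing keys are written as empty cells.

```python
import csv
import io

HEADERS = ["Issue ID", "Category", "Priority", "Table", "Field",
           "Status", "Date Found", "Date Fixed", "Notes"]


def issues_to_csv(rows: list[dict]) -> str:
    """Render tracked issues as CSV text using the tracking-table columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADERS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```
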
---
*This template ensures consistent documentation and systematic resolution of migration issues.*