
Data Sanitization Template for PostgreSQL Migration

Template Structure

Each data sanitization issue should follow this standardized format for consistent tracking and resolution.


Issue Template

Issue ID: [CATEGORY]-[TABLE]-[FIELD]-[NUMBER]

Example: CONSTRAINT-CURRENT-INJURY_COUNT-001
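
A tiny helper can keep these IDs consistent across reports; this is a hypothetical sketch (the function name and zero-padded counter are assumptions, not part of the template):

# Hypothetical helper: build an issue ID in [CATEGORY]-[TABLE]-[FIELD]-[NUMBER] form
def make_issue_id(category, table, field, number):
    return f"{category.upper()}-{table.upper()}-{field.upper()}-{number:03d}"

# make_issue_id("constraint", "current", "injury_count", 1)
# -> "CONSTRAINT-CURRENT-INJURY_COUNT-001"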

📊 Issue Classification

  • Category: [SCHEMA|DATA_INTEGRITY|DATA_QUALITY|MIGRATION_LOGIC]
  • Priority: [CRITICAL|HIGH|MEDIUM|LOW]
  • Impact: [BLOCKS_MIGRATION|DATA_LOSS|PERFORMANCE|COSMETIC]
  • Table(s): [table_name, related_tables]
  • Field(s): [field_names]

🔍 Problem Description

What happened: Clear description of the error or issue encountered.

Error Message:

Exact error message from logs

Expected Behavior: What should happen in a successful migration.

Current Behavior: What actually happens.

📈 Impact Assessment

Data Affected (see the counting sketch after this list):

  • Records: X out of Y total
  • Percentage: Z%
  • Critical data: YES/NO
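
A minimal sketch for producing these counts, assuming the peewee-style sqlite_db handle used in the patterns below and a SQL predicate describing the problem rows:

# Count affected rows and total rows, then derive the percentage
def affected_record_stats(table_name, predicate):
    total = sqlite_db.execute_sql(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    affected = sqlite_db.execute_sql(
        f"SELECT COUNT(*) FROM {table_name} WHERE {predicate}"
    ).fetchone()[0]
    pct = (100.0 * affected / total) if total else 0.0
    return affected, total, pct

# e.g. affected_record_stats("current", "injury_count IS NULL")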

Business Impact:

  • User-facing features affected
  • Operational impact
  • Compliance/audit concerns

🔧 Root Cause Analysis

Technical Cause:

  • SQLite vs PostgreSQL difference
  • Data model assumption
  • Migration logic flaw

Data Source:

  • How did this data get into this state?
  • Is this expected or corrupted data?
  • Historical context

💡 Solution Strategy

Approach: [TRANSFORM_DATA|FIX_SCHEMA|MIGRATION_LOGIC|SKIP_TABLE]

Technical Solution: Detailed explanation of how to fix the issue.

Data Transformation Required:

-- Example transformation query
UPDATE table_name 
SET field_name = COALESCE(field_name, default_value)
WHERE field_name IS NULL;

Implementation Plan

Steps:

  1. Backup current state
  2. Implement fix
  3. Test on sample data
  4. Run full migration test
  5. Validate results
  6. Document changes

Rollback Plan: How to undo changes if something goes wrong.
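
For step 1 and the rollback plan, a minimal sketch assuming the source is a single SQLite file (the paths are placeholders; close all connections before copying):

import shutil

# Copy the SQLite file before any transformation step
def backup_sqlite(db_path, backup_path):
    shutil.copy2(db_path, backup_path)

# Rollback: restore the pre-migration copy over the working file
def rollback_sqlite(db_path, backup_path):
    shutil.copy2(backup_path, db_path)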

🧪 Testing Strategy

Test Cases:

  1. Happy path: Normal data migrates correctly
  2. Edge case: Problem data is handled properly
  3. Regression: Previous fixes still work

Validation Queries:

-- Query to verify fix worked
SELECT COUNT(*) FROM table_name WHERE condition;
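
Beyond per-issue checks, a source-versus-target row-count comparison is a cheap sanity check; a sketch, reusing the sqlite_db and postgres_db handles assumed in the patterns below:

# Compare row counts for one table across source and target
def counts_match(table_name):
    src = sqlite_db.execute_sql(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    dst = postgres_db.execute_sql(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    return src == dst, src, dst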

📋 Resolution Status

  • Status: [IDENTIFIED|IN_PROGRESS|TESTING|RESOLVED|DEFERRED]
  • Assigned To: [team_member]
  • Date Identified: YYYY-MM-DD
  • Date Resolved: YYYY-MM-DD
  • Solution Applied: [description]

📚 Example Issues (From Our Testing)

Issue ID: CONSTRAINT-CURRENT-INJURY_COUNT-001

Category: SCHEMA
Priority: HIGH
Impact: BLOCKS_MIGRATION

Problem Description: The injury_count field in the current table contains NULL values in SQLite, but the PostgreSQL schema declares the column NOT NULL.

Error Message:

null value in column "injury_count" of relation "current" violates not-null constraint

Solution Strategy: TRANSFORM_DATA

-- Transform NULL values to 0 before migration
UPDATE current SET injury_count = 0 WHERE injury_count IS NULL;

Implementation:

  1. Add data transformation in migration script
  2. Set default value for future records
  3. Update schema if business logic allows NULL
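
Step 2 might look like this on the PostgreSQL side (a sketch; a default of 0 is an assumption carried over from the transformation above):

# Give the column a default so future inserts can't reintroduce NULLs
postgres_db.execute_sql(
    "ALTER TABLE current ALTER COLUMN injury_count SET DEFAULT 0"
)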

Issue ID: DATA_QUALITY-PLAYER-NAME-001

Category: DATA_QUALITY
Priority: MEDIUM
Impact: DATA_LOSS

Problem Description: Player names exceed the PostgreSQL VARCHAR(255) limit; affected rows are rejected on insert, so the names are lost unless the column is widened or the values are truncated.

Error Message:

value too long for type character varying(255)

Solution Strategy: FIX_SCHEMA

-- Increase column size in PostgreSQL
ALTER TABLE player ALTER COLUMN name TYPE VARCHAR(500);

Implementation:

  1. Analyze max string lengths in SQLite
  2. Update PostgreSQL schema with appropriate limits
  3. Add validation to prevent future overruns
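
For step 1, a sketch that measures the longest name actually stored in SQLite before picking a new limit:

# Find the longest player name in the source to size the new VARCHAR limit
max_len = sqlite_db.execute_sql(
    "SELECT MAX(LENGTH(name)) FROM player"
).fetchone()[0]
print(f"Longest player name: {max_len} characters")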

Issue ID: MIGRATION_LOGIC-TEAM-INTEGER-001

Category: MIGRATION_LOGIC
Priority: HIGH
Impact: BLOCKS_MIGRATION

Problem Description: Large integer values in SQLite exceed the PostgreSQL INTEGER range (SQLite stores integers in up to 64 bits, while PostgreSQL INTEGER is 32-bit).

Error Message:

integer out of range

Solution Strategy: FIX_SCHEMA

-- Use BIGINT instead of INTEGER
ALTER TABLE team ALTER COLUMN large_field TYPE BIGINT;

Implementation:

  1. Identify fields with large values
  2. Update schema to use BIGINT
  3. Verify no application code assumes INTEGER size
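
For step 1, a sketch that flags rows outside PostgreSQL's 32-bit INTEGER range (the id and large_field column names mirror the example above and are assumptions):

# Rows whose values won't fit in a PostgreSQL INTEGER (-2**31 .. 2**31 - 1)
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
rows = sqlite_db.execute_sql(
    "SELECT id, large_field FROM team WHERE large_field NOT BETWEEN ? AND ?",
    (INT32_MIN, INT32_MAX),
).fetchall()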

🎯 Standard Solution Patterns

Pattern 1: NULL Constraint Violations

# Pre-migration data cleaning: backfill NULLs in the source before copying
# (sqlite_db is assumed to be a peewee-style handle on the SQLite source)
def clean_null_constraints(table_name, field_name, default_value):
    query = f"UPDATE {table_name} SET {field_name} = ? WHERE {field_name} IS NULL"
    sqlite_db.execute_sql(query, (default_value,))

Pattern 2: String Length Overruns

# Schema adjustment: widen the target column (DDL can't take placeholders,
# so new_limit is interpolated; postgres_db is the peewee-style target handle)
def adjust_varchar_limits(table_name, field_name, new_limit):
    query = f"ALTER TABLE {table_name} ALTER COLUMN {field_name} TYPE VARCHAR({new_limit})"
    postgres_db.execute_sql(query)

Pattern 3: Integer Range Issues

# Type upgrade: promote a 32-bit INTEGER column to BIGINT on the target
def upgrade_integer_fields(table_name, field_name):
    query = f"ALTER TABLE {table_name} ALTER COLUMN {field_name} TYPE BIGINT"
    postgres_db.execute_sql(query)

Pattern 4: Missing Table Handling

# Graceful table skipping: a missing source table is handled, anything else re-raises
def safe_table_migration(model_class):
    try:
        migrate_table_data(model_class)
        return True
    except Exception as e:
        if "no such table" in str(e):
            logger.warning(f"Table {model_class._meta.table_name} doesn't exist in source")
            return True  # skip tables that were never created in SQLite
        raise
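
One way the patterns might compose in a migration script; a sketch, with the specific tables and fields taken from the example issues above:

# Hypothetical driver: clean the source, widen the target schema, then migrate
def run_sanitized_migration(models):
    clean_null_constraints("current", "injury_count", 0)   # Pattern 1
    adjust_varchar_limits("player", "name", 500)           # Pattern 2
    upgrade_integer_fields("team", "large_field")          # Pattern 3
    for model in models:
        safe_table_migration(model)                        # Pattern 4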

📊 Issue Tracking Spreadsheet Template

| Issue ID | Category | Priority | Table | Field | Status | Date Found | Date Fixed | Notes |
|----------|----------|----------|-------|-------|--------|------------|------------|-------|
| CONSTRAINT-CURRENT-INJURY_COUNT-001 | SCHEMA | HIGH | current | injury_count | RESOLVED | 2025-01-15 | 2025-01-15 | Set NULL to 0 |
| DATA_QUALITY-PLAYER-NAME-001 | DATA_QUALITY | MEDIUM | player | name | IN_PROGRESS | 2025-01-15 | | Increase VARCHAR limit |

This template ensures consistent documentation and systematic resolution of migration issues.