VM to LXC Migration Testing Checklist

Comprehensive validation checklist for VM to LXC container migrations.

Pre-Migration Checklist

Planning Phase

  • VM analyzed with migration tool: python3 migrate_vm_to_lxc.py analyze --vmid <id>
  • Migration suitability confirmed (excellent or good)
  • Migration plan generated and reviewed
  • Target LXC container ID selected (not in use)
  • Static IP address planned (if needed)
  • Maintenance window scheduled (low-traffic period)
  • Stakeholders notified (if production service)
  • Rollback plan documented and understood

Backup Phase

  • VM snapshot created (name: pre-migration-YYYY-MM-DD)
  • VM snapshot verified in Proxmox UI
  • Docker Compose files backed up from VM
  • Docker volumes/data backed up (if applicable)
  • List of running containers documented
  • Environment variables documented
  • Network configuration documented (IP, ports, DNS)
  • External dependencies documented (databases, APIs, etc.)
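
The snapshot and inventory items above could be scripted roughly as follows. This is a minimal sketch, not part of the existing migration tool: the function names and the output directory are hypothetical. `snapshot_vm` is meant for the Proxmox host, `document_containers` for inside the VM.

```shell
# Build the snapshot name used in this checklist: pre-migration-YYYY-MM-DD.
snapshot_name() {
    printf 'pre-migration-%s\n' "$(date +%Y-%m-%d)"
}

# Run on the Proxmox host: snapshot the VM, then list its snapshots to verify.
snapshot_vm() {
    vmid=$1
    qm snapshot "$vmid" "$(snapshot_name)" &&
        qm listsnapshot "$vmid"
}

# Run inside the VM: record all containers in a (hypothetical) backup directory.
# Environment variables are not dumped here because they may contain secrets.
document_containers() {
    outdir=${1:-./pre-migration-backup}
    mkdir -p "$outdir" &&
        docker ps -a --format '{{.Names}} {{.Image}} {{.Status}}' \
            > "$outdir/containers.txt"
}
```

Example: `snapshot_vm 100`, where 100 stands in for the real VM ID.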

Infrastructure Validation

  • Docker LXC template exists (ID 9001 or custom)
  • Target container ID available
  • Sufficient storage space on target storage pool
  • Network configuration confirmed (VLAN, bridge, gateway)
  • DNS entries documented (update after migration if needed)
  • Firewall rules documented
  • Reverse proxy configuration backed up (if using NPM/Traefik)
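
A few of these checks can be run as a preflight on the Proxmox host. The sketch below assumes the template ID 9001 from this checklist; the function names are hypothetical, and the ID check only sees guests on the local node, so a cluster needs a wider check.

```shell
# Succeed if no VM or container on this node already uses the given ID.
# VMs and containers share one ID space in Proxmox, so both are checked.
id_is_free() {
    id=$1
    if pct config "$id" >/dev/null 2>&1 || qm config "$id" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

# Succeed if the template container (9001 unless another ID is given) exists.
template_exists() {
    pct config "${1:-9001}" >/dev/null 2>&1
}

# Arguments: target container ID, optional template ID.
preflight() {
    target_id=$1
    template_id=${2:-9001}
    template_exists "$template_id" || { echo "template $template_id missing"; return 1; }
    id_is_free "$target_id" || { echo "ID $target_id already in use"; return 1; }
    pvesm status   # review free space on the target storage by eye
}
```

Example: `preflight 105`, where 105 is a placeholder for the planned container ID.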

Migration Execution Checklist

Phase 1: Pre-Migration Testing

  • VM is running and healthy
  • All services responding normally
  • No errors in VM logs
  • Docker containers all running: docker ps -a
  • Resource usage documented (CPU, RAM, disk)
  • Performance baseline captured (response times, etc.)

Phase 2: VM Shutdown

  • Services gracefully stopped (if order matters)
  • Docker containers stopped: docker compose down (optional)
  • VM shut down gracefully: shutdown -h now inside the VM, or qm shutdown <vmid> on the Proxmox host
  • VM status confirmed stopped: qm status <vmid>
  • Snapshot remains intact

Phase 3: LXC Creation

  • LXC created from template
  • Container ID matches plan
  • Hostname configured correctly
  • Memory allocation set (estimated from analysis)
  • CPU cores allocated (match or reduce from VM)
  • Storage configured correctly
  • Network configured (static IP or DHCP)
  • Docker features enabled: nesting=1,keyctl=1
  • Container set to privileged mode (unprivileged=0)
  • Container configuration reviewed in Proxmox UI
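
One way to make the creation steps reviewable is to print the `pct` commands before running them by hand. This is a sketch under assumptions: the function name is hypothetical, the bridge `vmbr0` and DHCP addressing are placeholders, and the privileged setting is expected to come from the template rather than being set here.

```shell
# Print the pct commands for creating the container so they can be reviewed
# before being run manually on the Proxmox host.
# Arguments: template ID, new container ID, hostname, memory in MiB, CPU cores.
print_lxc_plan() {
    template=$1 ctid=$2 name=$3 mem=$4 cores=$5
    echo "pct clone $template $ctid --hostname $name --full"
    echo "pct set $ctid --memory $mem --cores $cores"
    echo "pct set $ctid --features nesting=1,keyctl=1"
    # vmbr0 and ip=dhcp are assumptions; substitute the planned bridge and IP.
    echo "pct set $ctid --net0 name=eth0,bridge=vmbr0,ip=dhcp"
    echo "pct config $ctid"
}
```

Example: `print_lxc_plan 9001 105 docker-host 2048 2` (all values other than 9001 are illustrative).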

Phase 4: LXC Initial Start

  • Container started successfully
  • Container status: running
  • Container accessible via SSH
  • Network connectivity confirmed: ping 8.8.8.8
  • DNS resolution working: nslookup google.com
  • Docker service running: systemctl status docker
  • Docker working: docker ps (should be empty initially)
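
The connectivity and Docker checks above can be run in one pass inside the new container. A minimal sketch with hypothetical function names; it reports every check instead of stopping at the first failure.

```shell
# Run one named check and report PASS or FAIL without aborting the sequence.
run_check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# Run inside the new container. Returns non-zero if any check failed.
initial_start_checks() {
    failed=0
    run_check "network" ping -c 3 8.8.8.8 || failed=1
    run_check "dns" nslookup google.com || failed=1
    run_check "docker service" systemctl is-active --quiet docker || failed=1
    run_check "docker cli" docker ps || failed=1
    return "$failed"
}
```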

Service Migration Checklist

Phase 5: Docker Configuration Transfer

  • Docker Compose files copied to LXC
  • Directory structure matches VM layout
  • File permissions verified
  • Environment files copied (.env files)
  • Docker volume paths confirmed
  • Data directories created (if needed)
  • Configuration files reviewed for absolute paths

Phase 6: Docker Container Deployment

  • Docker Compose files validated: docker compose config
  • Images pulled successfully: docker compose pull
  • Containers created: docker compose up -d
  • All containers started: docker compose ps
  • No container restart loops: docker ps (check STATUS)
  • Container logs checked: docker compose logs
  • No error messages in logs
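
The deployment sequence above, chained so that each step runs only if the previous one succeeded. A sketch with hypothetical function names, meant to run inside the container from the directory holding the Compose file.

```shell
# Validate, pull, start, then show container state.
deploy_stack() {
    docker compose config --quiet &&
        docker compose pull &&
        docker compose up -d &&
        docker compose ps
}

# Print names of containers currently in the "restarting" state;
# succeed only if there are none.
no_restart_loops() {
    restarting=$(docker ps --filter status=restarting --format '{{.Names}}')
    if [ -n "$restarting" ]; then
        echo "restarting: $restarting"
        return 1
    fi
}
```

A looping container alternates between running and restarting, so a single `no_restart_loops` call can miss it; repeating the check a few times over a minute or two is safer.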

Phase 7: Service Validation

  • All expected containers running
  • Services responding on correct ports
  • Web interfaces accessible (if applicable)
  • APIs responding correctly (if applicable)
  • Health check endpoints passing (if configured)
  • Data persistence verified (check databases, files)
  • Inter-container communication working
  • External service connections working (databases, APIs)

Network & Connectivity Checklist

Phase 8: Network Validation

  • LXC has correct IP address: ip addr show
  • Gateway reachable: ping <gateway-ip>
  • Internal network access verified
  • Internet access confirmed
  • DNS resolution working for all required domains
  • Ports accessible from other hosts: nc -zv <lxc-ip> <port>
  • Firewall rules applied (if needed)
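
To check several ports at once from another host, the `nc` test above can be wrapped in a loop. A minimal sketch; the function name is hypothetical and the 3-second timeout is an arbitrary choice.

```shell
# Check that each listed TCP port on a host accepts connections.
# Run from a machine other than the container itself.
check_ports() {
    host=$1
    shift
    rc=0
    for port in "$@"; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "open   $host:$port"
        else
            echo "closed $host:$port"
            rc=1
        fi
    done
    return "$rc"
}
```

Example: `check_ports <lxc-ip> 80 443`, with the ports taken from the documented network configuration.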

Phase 9: External Access

  • Service accessible from local network
  • Service accessible from internet (if required)
  • Reverse proxy updated (if using NPM/Traefik)
  • SSL certificates working (if HTTPS)
  • Domain names resolving correctly
  • Load balancer updated (if applicable)

Performance & Stability Checklist

Phase 10: Performance Validation

  • CPU usage reasonable: top or htop
  • Memory usage lower than VM: free -h
  • Disk I/O acceptable: iostat or monitor in Proxmox
  • Network throughput adequate: test with actual traffic
  • Response times equal to or better than VM
  • No performance degradation under load

Phase 11: Resource Monitoring (First 24 Hours)

  • Hour 1: Services stable, no crashes
  • Hour 2: Resource usage normal
  • Hour 4: No memory leaks detected
  • Hour 8: Performance consistent
  • Hour 24: All metrics stable
  • Proxmox graphs show healthy trends
  • No OOM (Out of Memory) kills: dmesg | grep -i oom
  • No kernel errors: dmesg | grep -i error
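
For the hourly checkpoints, a small sampler makes memory trends easier to compare than reading `free -h` by eye. This is a sketch: the function names and the log path are hypothetical, and it only records memory.

```shell
# Print used memory as a whole percentage, computed from MemTotal and
# MemAvailable. Reads /proc/meminfo unless another file is given.
mem_used_percent() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

# Append a timestamped sample to a log file (hypothetical default path).
log_sample() {
    printf '%s mem_used=%s%%\n' "$(date '+%Y-%m-%d %H:%M')" "$(mem_used_percent)" \
        >> "${1:-./migration-metrics.log}"
}
```

Calling `log_sample` at each checkpoint (or from cron) gives a record where steadily rising values suggest a leak.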

Phase 12: Functional Testing

  • Primary functionality tested end-to-end
  • User workflows validated
  • Scheduled jobs running (cron, etc.)
  • Backups configured and tested
  • Monitoring alerts configured
  • Logging working correctly
  • Integrations with other services functioning

Data Integrity Checklist

Phase 13: Data Validation

  • Database connections working
  • Data readable and writable
  • File uploads/downloads working
  • Cache functioning correctly
  • Sessions persisting correctly
  • User data accessible
  • No data corruption detected
  • Database migrations applied (if needed)

Phase 14: Backup Validation

  • Backup jobs configured for LXC
  • Test backup created successfully
  • Test restore validated
  • Backup storage sufficient
  • Backup retention policy set
  • Backup monitoring alerts configured
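
A test backup and restore might look like the sketch below, run on the Proxmox host. The function names are hypothetical; the storage name, archive path, and spare container ID are site-specific values to fill in.

```shell
# Create a snapshot-mode backup of the container on the named storage.
# Arguments: container ID, storage name.
backup_ct() {
    vzdump "$1" --storage "$2" --mode snapshot --compress zstd
}

# Restore an archive to a spare container ID to prove the backup is usable,
# then show its config. The restored container is left stopped, since it
# carries the same network settings as the original.
test_restore() {
    archive=$1
    spare_id=$2
    pct restore "$spare_id" "$archive" && pct config "$spare_id"
}
```

Once the restored config has been reviewed, the spare container can be removed with `pct destroy <spare-id>`.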

Extended Monitoring Checklist

Phase 15: Week 1 Monitoring

  • Day 1: Initial 24 hours stable
  • Day 2: Resource usage patterns established
  • Day 3: Performance benchmarks met
  • Day 4: No unexpected issues
  • Day 5: Load testing passed (if applicable)
  • Day 6: Weekend operations normal (if applicable)
  • Day 7: Weekly summary reviewed, all green

Phase 16: Week 2 Validation

  • Week 2: Continued stability
  • No memory leaks over extended period
  • Disk usage growth as expected
  • No unexpected restarts or crashes
  • Resource utilization optimized
  • Documentation updated with final configuration

Rollback Checklist (If Needed)

Emergency Rollback

  • Stop LXC container: pct stop <ctid>
  • Start original VM: qm start <vmid>
  • Verify VM services starting
  • Validate VM functionality
  • Restore network access (update DNS/proxy if changed)
  • Document rollback reason for analysis
  • Plan remediation before retry
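
The first steps of the rollback can be wrapped in one function for use under pressure. A minimal sketch with a hypothetical function name; it refuses to start the VM unless the container reports as stopped, on the assumption that both may share an IP address.

```shell
# Emergency rollback, run on the Proxmox host.
# Arguments: container ID, original VM ID.
rollback_to_vm() {
    ctid=$1
    vmid=$2
    pct stop "$ctid" 2>/dev/null
    # Do not start the VM while the container is still up, since both
    # may be configured with the same IP address.
    case "$(pct status "$ctid")" in
        *stopped*) ;;
        *) echo "container $ctid is not stopped; aborting" >&2; return 1 ;;
    esac
    qm start "$vmid" && qm status "$vmid"
}
```

Example: `rollback_to_vm <ctid> <vmid>`. Service validation and DNS or proxy changes still follow manually.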

Final Migration Completion Checklist

Phase 17: Production Validation

  • 1-2 weeks of stable operation confirmed
  • All stakeholders confirm service quality
  • Performance metrics meet or exceed VM baseline
  • No outstanding issues or concerns
  • Monitoring and alerting fully operational
  • Documentation complete and accurate

Phase 18: Cleanup

  • VM confirmed no longer needed
  • VM snapshot retained for safety (30 days recommended; removing the VM also removes its snapshots)
  • Original VM stopped and archived
  • Resources freed up (document savings)
  • Migration marked complete in tracking system
  • Lessons learned documented

Phase 19: Documentation Updates

  • Network diagram updated (if exists)
  • IP address spreadsheet updated
  • Service inventory updated
  • Runbooks updated for new LXC location
  • Backup documentation updated
  • Disaster recovery plan updated
  • Team knowledge base updated

Quick Reference: Common Issues & Solutions

Issue: Container won't start

Check:

  • Storage space available: pvesm status
  • Container configuration valid: pct config <ctid>
  • No resource limits exceeded
  • Logs: journalctl -u pve-container@<ctid>

Issue: Docker won't start

Check:

  • Nesting enabled: pct config <ctid> | grep features
  • Container is privileged: pct config <ctid> | grep unprivileged (privileged if the output is empty or shows unprivileged: 0)
  • Docker service: systemctl status docker
  • Logs: journalctl -u docker
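
The two `pct config` checks can be made less error-prone with small filters that read the config on stdin. A sketch with hypothetical function names; it assumes the `features:` and `unprivileged:` line formats that `pct config` prints.

```shell
# Read `pct config <ctid>` output on stdin and succeed if the named
# feature is set to 1 on the "features:" line.
has_feature() {
    grep '^features:' | tr -d ' ' | cut -d: -f2 | tr ',' '\n' | grep -qx "$1=1"
}

# Read `pct config <ctid>` output on stdin and succeed unless it marks the
# container as unprivileged. A missing line is treated as privileged.
is_privileged() {
    ! grep -qx 'unprivileged: *1'
}
```

Example: `pct config <ctid> | has_feature nesting && echo "nesting enabled"`.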

Issue: Network not working

Check:

  • Network interface configured: ip addr show
  • Gateway configured: ip route show
  • DNS configured: cat /etc/resolv.conf
  • Firewall rules: iptables -L

Issue: Poor performance

Check:

  • Resource allocation sufficient: pct config <ctid>
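  • CPU not overloaded: cat /proc/loadavg (shows load average, not throttling; on cgroup v2 hosts, throttling counters are in the container cgroup's cpu.stat)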
  • Memory not exhausted: free -h
  • No I/O bottleneck: iostat -x 1

Issue: Can't access services

Check:

  • Containers running: docker ps
  • Ports exposed: docker ps (PORTS column)
  • Firewall rules: iptables -L
  • Service binding: ss -tlnp | grep <port> (or netstat -tlnp if net-tools is installed)
  • Reverse proxy config updated

Service-Specific Checklists

Discord Bots

  • Bot token configured correctly
  • Bot connected to Discord: check bot status
  • Commands responding
  • Database connections working (if applicable)
  • Scheduled tasks running
  • Logs showing normal operation

Databases (PostgreSQL, MySQL, MongoDB)

  • Database service running
  • Data directory mounted correctly
  • Connections from applications working
  • Queries executing normally
  • Backups configured
  • Replication working (if applicable)
  • Performance acceptable: run query benchmarks

Plex Media Server

  • Media libraries accessible
  • Transcoding working (CPU or GPU)
  • Streaming playback smooth
  • Metadata refreshing
  • Remote access configured (if needed)
  • Hardware acceleration working (if configured)

Docker-Based Web Apps

  • Web interface accessible
  • Login/authentication working
  • Database connections functional
  • File uploads working
  • API endpoints responding
  • SSL/TLS certificates valid
  • Caching working correctly

Migration Success Criteria

Minimum Criteria (Must Have)

  • All services running and accessible
  • No data loss or corruption
  • Performance equal to or better than VM
  • 24 hours of stable operation
  • No critical errors in logs
  • Rollback plan tested and ready

Optimal Criteria (Should Have)

  • Resource usage reduced vs VM
  • Faster startup times
  • Improved I/O performance
  • 1 week of stable operation
  • Monitoring and alerts configured
  • Documentation complete

Excellence Criteria (Nice to Have)

  • 2 weeks of flawless operation
  • Measurable performance improvements
  • Resource optimization completed
  • Automated backups validated
  • Team trained on new setup
  • Migration lessons documented

Notes & Best Practices

Timing:

  • Migrate non-critical services first
  • Schedule during low-traffic periods
  • Allow extra time for first migration
  • Plan for 2-4 hours per service initially

Safety:

  • Always take a VM snapshot before starting
  • Keep VM stopped but available for 1-2 weeks
  • Test rollback procedure before committing
  • Document every step for repeatability

Monitoring:

  • Watch resource usage closely for the first 48 hours
  • Set up alerts for anomalies
  • Compare to VM baseline metrics
  • Keep detailed migration notes

Optimization:

  • Start with conservative resource allocation
  • Tune after monitoring actual usage
  • Document optimal settings for future migrations
  • Share learnings with team

Checklist Version: 1.0
Last Updated: 2025-01-11
For: Cal's Home Lab Proxmox Infrastructure