VM to LXC Migration Testing Checklist
Comprehensive validation checklist for VM to LXC container migrations.
Pre-Migration Checklist
Planning Phase
- VM analyzed with migration tool: python3 migrate_vm_to_lxc.py analyze --vmid <id>
- Migration suitability confirmed (excellent or good)
- Migration plan generated and reviewed
- Target LXC container ID selected (not in use)
- Static IP address planned (if needed)
- Maintenance window scheduled (low-traffic period)
- Stakeholders notified (if production service)
- Rollback plan documented and understood
Backup Phase
- VM snapshot created: snapshot-name: pre-migration-YYYY-MM-DD
- VM snapshot verified in Proxmox UI
- Docker Compose files backed up from VM
- Docker volumes/data backed up (if applicable)
- List of running containers documented
- Environment variables documented
- Network configuration documented (IP, ports, DNS)
- External dependencies documented (databases, APIs, etc.)
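The snapshot step can be sketched as a pair of shell functions run on the Proxmox host. This is a minimal sketch, not the migration tool's own behavior: the function names `snapshot_name` and `create_pre_migration_snapshot` are hypothetical, and it assumes the `qm` CLI is on the PATH.

```shell
# Build the snapshot name used in this checklist: pre-migration-YYYY-MM-DD.
# An explicit date can be passed in; otherwise today's date is used.
snapshot_name() {
    printf 'pre-migration-%s\n' "${1:-$(date +%F)}"
}

# Create the snapshot, then list the VM's snapshots to confirm it exists.
# Usage: create_pre_migration_snapshot <vmid>
create_pre_migration_snapshot() {
    local vmid="$1" name
    name=$(snapshot_name)
    qm snapshot "$vmid" "$name" || return 1
    qm listsnapshot "$vmid" | grep -qF -- "$name"
}
```

The listing check only confirms that the snapshot is registered; checking it in the Proxmox UI is still worth doing.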
Infrastructure Validation
- Docker LXC template exists (ID 9001 or custom)
- Target container ID available
- Sufficient storage space on target storage pool
- Network configuration confirmed (VLAN, bridge, gateway)
- DNS entries documented (update after migration if needed)
- Firewall rules documented
- Reverse proxy configuration backed up (if using NPM/Traefik)
Migration Execution Checklist
Phase 1: Pre-Migration Testing
- VM is running and healthy
- All services responding normally
- No error logs in VM
- Docker containers all running: docker ps -a
- Resource usage documented (CPU, RAM, disk)
- Performance baseline captured (response times, etc.)
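One way to capture a response-time baseline is to record `curl` timings to a file that can be compared after the migration. The sketch below assumes the services expose HTTP endpoints; the function names and the output file layout are hypothetical.

```shell
# Print the total request time in seconds for one URL.
response_time() {
    curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

# Append one "timestamp url seconds" line per URL to a baseline file.
# Usage: capture_baseline <outfile> <url>...
capture_baseline() {
    local out="$1"; shift
    for url in "$@"; do
        printf '%s %s %s\n' \
            "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$url" "$(response_time "$url")" >> "$out"
    done
}
```

Running `capture_baseline` several times before the migration gives a range rather than a single sample, which makes the later comparison against the LXC less noisy.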
Phase 2: VM Shutdown
- Services gracefully stopped (if order matters)
- Docker containers stopped: docker compose down (optional)
- VM shut down gracefully: shutdown -h now or Proxmox
- VM status confirmed: stopped
- Snapshot remains intact
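The shutdown and status confirmation can be combined into one helper on the Proxmox host. This is a sketch under assumptions: the function names, the 5-second poll interval, and the 300-second default timeout are illustrative, and it relies on `qm status` printing a line of the form `status: stopped`.

```shell
# Succeeds when qm reports the VM as stopped.
vm_is_stopped() {
    [ "$(qm status "$1" | awk '{print $2}')" = "stopped" ]
}

# Request a clean shutdown, then poll until the VM is stopped or the timeout passes.
# Usage: shutdown_and_wait <vmid> [timeout_seconds]
shutdown_and_wait() {
    local vmid="$1" timeout="${2:-300}" waited=0
    qm shutdown "$vmid" || return 1
    until vm_is_stopped "$vmid"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 5
        waited=$((waited + 5))
    done
}
```

A non-zero return means the VM did not reach the stopped state in time, so LXC creation should wait until that is resolved.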
Phase 3: LXC Creation
- LXC created from template
- Container ID matches plan
- Hostname configured correctly
- Memory allocation set (estimated from analysis)
- CPU cores allocated (match or reduce from VM)
- Storage configured correctly
- Network configured (static IP or DHCP)
- Docker features enabled: nesting=1,keyctl=1
- Container set to privileged mode (unprivileged=0)
- Container configuration reviewed in Proxmox UI
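The creation steps above might look like the following on the Proxmox host. This is a hedged sketch: the function names are hypothetical, the arguments (template ID such as 9001, memory, cores) come from the migration plan, and privileged mode is assumed to be inherited from the template rather than set here.

```shell
# Clone the Docker template and apply the planned resources and features.
# Usage: create_lxc <template_id> <ctid> <hostname> <memory_mb> <cores>
create_lxc() {
    pct clone "$1" "$2" --hostname "$3" --full 1 || return 1
    pct set "$2" --memory "$4" --cores "$5" --features nesting=1,keyctl=1
}

# Succeeds when the container config lists both Docker-related features.
# Usage: has_docker_features <ctid>
has_docker_features() {
    local features
    features=$(pct config "$1" | grep '^features:')
    case $features in *nesting=1*) ;; *) return 1 ;; esac
    case $features in *keyctl=1*) ;; *) return 1 ;; esac
}
```

Running `has_docker_features <ctid>` after creation gives a quick pass/fail before the container is started for the first time.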
Phase 4: LXC Initial Start
- Container started successfully
- Container status: running
- Container accessible via SSH
- Network connectivity confirmed: ping 8.8.8.8
- DNS resolution working: nslookup google.com
- Docker service running: systemctl status docker
- Docker working: docker ps (should be empty initially)
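The in-container checks above can be run from the Proxmox host with `pct exec`, so they work even before SSH is confirmed. The sketch below is illustrative: `check` and `phase4_checks` are hypothetical names, and `systemctl is-active` is used in place of `systemctl status` because it returns a clean exit code.

```shell
# Run one check and report PASS or FAIL without aborting the remaining checks.
# Usage: check <label> <command> [args...]
check() {
    local label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Run the Phase 4 checks inside the container from the Proxmox host.
# Usage: phase4_checks <ctid>
phase4_checks() {
    local ctid="$1" failed=0
    check "network connectivity" pct exec "$ctid" -- ping -c 3 8.8.8.8 || failed=1
    check "DNS resolution" pct exec "$ctid" -- nslookup google.com || failed=1
    check "Docker service active" pct exec "$ctid" -- systemctl is-active docker || failed=1
    check "Docker responding" pct exec "$ctid" -- docker ps || failed=1
    return "$failed"
}
```

Every check runs even if an earlier one fails, so a single pass shows the full picture; the function returns non-zero if any line reads FAIL.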
Service Migration Checklist
Phase 5: Docker Configuration Transfer
- Docker Compose files copied to LXC
- Directory structure matches VM layout
- File permissions verified
- Environment files copied (.env files)
- Docker volumes path confirmed
- Data directories created (if needed)
- Configuration files reviewed for absolute paths
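For the absolute-path review, a simple grep over the Compose files can surface list items that start with a host path (typically bind mounts). This is a rough sketch with a hypothetical function name; it only catches the short `- /host/path:/container/path` syntax, not long-form `source:` entries or paths in .env files.

```shell
# Print "file:line:content" for compose list items that begin with an absolute path,
# e.g. "- /srv/app/data:/data" or "- "/var/log/app:/logs"".
# Usage: find_absolute_paths <compose-file>...
find_absolute_paths() {
    grep -HnE "^[[:space:]]*-[[:space:]]*[\"']?/" "$@"
}
```

Each reported path should either exist in the LXC with the same data and permissions, or be updated in the Compose file before deployment.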
Phase 6: Docker Containers Deployment
- Docker Compose files validated: docker compose config
- Images pulled successfully: docker compose pull
- Containers created: docker compose up -d
- All containers started: docker compose ps
- No container restart loops: docker ps (check STATUS)
- Container logs checked: docker compose logs
- No error messages in logs
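The restart-loop check can be scripted with the `status` filter of `docker ps` instead of reading the STATUS column by eye. A minimal sketch, with hypothetical function names:

```shell
# List containers currently in the "restarting" state, one name per line.
restarting_containers() {
    docker ps --filter status=restarting --format '{{.Names}}'
}

# Succeeds only when no container is restarting; prints the offenders otherwise.
no_restart_loops() {
    local names
    names=$(restarting_containers)
    [ -z "$names" ] && return 0
    echo "Restarting: $names"
    return 1
}
```

A container in a restart loop can briefly show as "Up" between attempts, so a single sample may miss it. Running the check a few times over a minute or two is more reliable.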
Phase 7: Service Validation
- All expected containers running
- Services responding on correct ports
- Web interfaces accessible (if applicable)
- APIs responding correctly (if applicable)
- Health check endpoints passing (if configured)
- Data persistence verified (check databases, files)
- Inter-container communication working
- External service connections working (databases, APIs)
Network & Connectivity Checklist
Phase 8: Network Validation
- LXC has correct IP address: ip addr show
- Gateway reachable: ping <gateway-ip>
- Internal network access verified
- Internet access confirmed
- DNS resolution working for all required domains
- Ports accessible from other hosts: nc -zv <lxc-ip> <port>
- Firewall rules applied (if needed)
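When a service exposes several ports, the `nc` check can be looped from another host on the network. The function name `check_ports` and the 3-second timeout below are assumptions for illustration.

```shell
# Probe each TCP port on a host and report whether it accepts connections.
# Usage: check_ports <host> <port>...
check_ports() {
    local host="$1" failed=0; shift
    for port in "$@"; do
        if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
            echo "open: $host:$port"
        else
            echo "closed: $host:$port"
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `check_ports <lxc-ip> 80 443` prints one line per port and returns non-zero if any port is unreachable.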
Phase 9: External Access
- Service accessible from local network
- Service accessible from internet (if required)
- Reverse proxy updated (if using NPM/Traefik)
- SSL certificates working (if HTTPS)
- Domain names resolving correctly
- Load balancer updated (if applicable)
Performance & Stability Checklist
Phase 10: Performance Validation
- CPU usage reasonable: top or htop
- Memory usage lower than VM: free -h
- Disk I/O acceptable: iostat or monitor in Proxmox
- Network throughput adequate: test with actual traffic
- Response times equal to or better than VM
- No performance degradation under load
Phase 11: Resource Monitoring (First 24 Hours)
- Hour 1: Services stable, no crashes
- Hour 2: Resource usage normal
- Hour 4: No memory leaks detected
- Hour 8: Performance consistent
- Hour 24: All metrics stable
- Proxmox graphs show healthy trends
- No OOM (Out of Memory) kills: dmesg | grep -i oom
- No kernel errors: dmesg | grep -i error
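To make the hourly checkpoints comparable, memory usage can be sampled to a log rather than read ad hoc. The sketch below assumes the procps `free -m` layout, where the third column of the `Mem:` row is used memory; the function names and log format are hypothetical.

```shell
# Print used memory in MiB, parsed from the "Mem:" row of free -m.
used_mem_mib() {
    free -m | awk '/^Mem:/ {print $3}'
}

# Append one "timestamp used_mib" sample to a log file.
# Usage: log_mem_sample <logfile>
log_mem_sample() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(used_mem_mib)" >> "$1"
}
```

Calling `log_mem_sample` from an hourly cron job inside the container yields a series to review at the 4-, 8-, and 24-hour marks. Used memory that climbs steadily under constant load is the pattern to look for when checking for leaks.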
Phase 12: Functional Testing
- Primary functionality tested end-to-end
- User workflows validated
- Scheduled jobs running (cron, etc.)
- Backups configured and tested
- Monitoring alerts configured
- Logging working correctly
- Integrations with other services functioning
Data Integrity Checklist
Phase 13: Data Validation
- Database connections working
- Data readable and writable
- File uploads/downloads working
- Cache functioning correctly
- Sessions persisting correctly
- User data accessible
- No data corruption detected
- Database migrations applied (if needed)
Phase 14: Backup Validation
- Backup jobs configured for LXC
- Test backup created successfully
- Test restore validated
- Backup storage sufficient
- Backup retention policy set
- Backup monitoring alerts configured
Extended Monitoring Checklist
Phase 15: Week 1 Monitoring
- Day 1: Initial 24 hours stable
- Day 2: Resource usage patterns established
- Day 3: Performance benchmarks met
- Day 4: No unexpected issues
- Day 5: Load testing passed (if applicable)
- Day 6: Weekend operations normal (if applicable)
- Day 7: Weekly summary reviewed, all green
Phase 16: Week 2 Validation
- Week 2: Continued stability
- No memory leaks over extended period
- Disk usage growth as expected
- No unexpected restarts or crashes
- Resource utilization optimized
- Documentation updated with final configuration
Rollback Checklist (If Needed)
Emergency Rollback
- Stop LXC container: pct stop <ctid>
- Start original VM: qm start <vmid>
- Verify VM services starting
- Validate VM functionality
- Restore network access (update DNS/proxy if changed)
- Document rollback reason for analysis
- Plan remediation before retry
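The first two rollback steps are worth having ready as a single command before the migration starts. A minimal sketch, assuming it runs on the Proxmox host; the function name `rollback` is hypothetical.

```shell
# Stop the LXC container, start the original VM, and confirm the VM is running.
# Usage: rollback <ctid> <vmid>
rollback() {
    local ctid="$1" vmid="$2"
    # A container that is already stopped should not block the rollback.
    pct stop "$ctid" || echo "warning: could not stop container $ctid" >&2
    qm start "$vmid" || return 1
    [ "$(qm status "$vmid" | awk '{print $2}')" = "running" ]
}
```

This only covers the Proxmox side. Service validation, DNS or proxy changes, and documenting the rollback reason remain manual steps.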
Final Migration Completion Checklist
Phase 17: Production Validation
- 1-2 weeks of stable operation confirmed
- All stakeholders confirm service quality
- Performance metrics meet or exceed VM baseline
- No outstanding issues or concerns
- Monitoring and alerting fully operational
- Documentation complete and accurate
Phase 18: Cleanup
- VM no longer needed, safe to remove
- VM snapshot retained for safety (30 days recommended)
- Original VM stopped and archived
- Resources freed up (document savings)
- Migration marked complete in tracking system
- Lessons learned documented
Phase 19: Documentation Updates
- Network diagram updated (if exists)
- IP address spreadsheet updated
- Service inventory updated
- Runbooks updated for new LXC location
- Backup documentation updated
- Disaster recovery plan updated
- Team knowledge base updated
Quick Reference: Common Issues & Solutions
Issue: Container won't start
Check:
- Storage space available: pvesm status
- Container configuration valid: pct config <ctid>
- No resource limits exceeded
- Logs: journalctl -u pve-container@<ctid>
Issue: Docker won't start
Check:
- Nesting enabled: pct config <ctid> | grep features
- Container is privileged: pct config <ctid> | grep unprivileged
- Docker service: systemctl status docker
- Logs: journalctl -u docker
Issue: Network not working
Check:
- Network interface configured: ip addr show
- Gateway configured: ip route show
- DNS configured: cat /etc/resolv.conf
- Firewall rules: iptables -L
Issue: Poor performance
Check:
- Resource allocation sufficient: pct config <ctid>
- CPU not overloaded: cat /proc/loadavg (compare load average to allocated cores)
- Memory not exhausted: free -h
- No I/O bottleneck: iostat -x 1
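A load average is easier to judge once it is divided by the number of cores available. The helper below is a sketch with a hypothetical name; with no arguments it reads `/proc/loadavg` and `nproc`, and explicit values can be passed instead.

```shell
# Print the 1-minute load average divided by the number of cores.
# Usage: load_per_core [loadavg] [cores]
load_per_core() {
    local load="${1:-$(cut -d ' ' -f 1 /proc/loadavg)}"
    local cores="${2:-$(nproc)}"
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}
```

As a rule of thumb, a value that stays above 1.00 suggests more runnable work than allocated cores. Depending on how the host is configured, the load average seen inside a container may reflect the whole host, so it is worth comparing against the Proxmox graphs.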
Issue: Can't access services
Check:
- Containers running: docker ps
- Ports exposed: docker ps (PORTS column)
- Firewall rules: iptables -L
- Service binding: netstat -tlnp | grep <port>
- Reverse proxy config updated
Service-Specific Checklists
Discord Bots
- Bot token configured correctly
- Bot connected to Discord: check bot status
- Commands responding
- Database connections working (if applicable)
- Scheduled tasks running
- Logs showing normal operation
Databases (PostgreSQL, MySQL, MongoDB)
- Database service running
- Data directory mounted correctly
- Connections from applications working
- Queries executing normally
- Backups configured
- Replication working (if applicable)
- Performance acceptable: run query benchmarks
Plex Media Server
- Media libraries accessible
- Transcoding working (CPU or GPU)
- Streaming playback smooth
- Metadata refreshing
- Remote access configured (if needed)
- Hardware acceleration working (if configured)
Docker-Based Web Apps
- Web interface accessible
- Login/authentication working
- Database connections functional
- File uploads working
- API endpoints responding
- SSL/TLS certificates valid
- Caching working correctly
Migration Success Criteria
Minimum Criteria (Must Have)
- ✅ All services running and accessible
- ✅ No data loss or corruption
- ✅ Performance equal to or better than VM
- ✅ 24 hours of stable operation
- ✅ No critical errors in logs
- ✅ Rollback plan tested and ready
Optimal Criteria (Should Have)
- ✅ Resource usage reduced vs VM
- ✅ Faster startup times
- ✅ Improved I/O performance
- ✅ 1 week of stable operation
- ✅ Monitoring and alerts configured
- ✅ Documentation complete
Excellence Criteria (Nice to Have)
- ✅ 2 weeks of flawless operation
- ✅ Measurable performance improvements
- ✅ Resource optimization completed
- ✅ Automated backups validated
- ✅ Team trained on new setup
- ✅ Migration lessons documented
Notes & Best Practices
Timing:
- Migrate non-critical services first
- Schedule during low-traffic periods
- Allow extra time for first migration
- Plan for 2-4 hours per service initially
Safety:
- Always have VM snapshot before starting
- Keep VM stopped but available for 1-2 weeks
- Test rollback procedure before committing
- Document every step for repeatability
Monitoring:
- Watch resource usage closely first 48 hours
- Set up alerts for anomalies
- Compare to VM baseline metrics
- Keep detailed migration notes
Optimization:
- Start with conservative resource allocation
- Tune after monitoring actual usage
- Document optimal settings for future migrations
- Share learnings with team
Checklist Version: 1.0
Last Updated: 2025-01-11
For: Cal's Home Lab Proxmox Infrastructure