# VM to LXC Migration Testing Checklist

Comprehensive validation checklist for VM to LXC container migrations.

## Pre-Migration Checklist

### Planning Phase
- [ ] VM analyzed with migration tool: `python3 migrate_vm_to_lxc.py analyze --vmid <id>`
- [ ] Migration suitability confirmed (excellent or good)
- [ ] Migration plan generated and reviewed
- [ ] Target LXC container ID selected (not in use)
- [ ] Static IP address planned (if needed)
- [ ] Maintenance window scheduled (low-traffic period)
- [ ] Stakeholders notified (if production service)
- [ ] Rollback plan documented and understood

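The "not in use" check for the target container ID can be scripted. The following is a minimal sketch, assuming the text comes from `pct list` and `qm list` on the Proxmox host and that the first column of each row is the numeric ID. Proxmox VMs and containers share one ID space, so both listings should be checked. The function name `ids_in_use` and the sample listing are illustrative, not part of the migration tool.

```python
def ids_in_use(listing: str) -> set[int]:
    """Collect the numeric IDs from the first column of `pct list` / `qm list` output."""
    ids = set()
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first field is "VMID").
        if fields and fields[0].isdigit():
            ids.add(int(fields[0]))
    return ids


# Illustrative sample only; real output comes from the Proxmox host.
sample_pct_list = """\
VMID       Status     Lock         Name
105        running                 docker-apps
9001       stopped                 docker-template
"""

used = ids_in_use(sample_pct_list)
print(206 in used)  # False: 206 would be free in this sample
print(105 in used)  # True: 105 is already taken
```
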
### Backup Phase
- [ ] VM snapshot created, named `pre-migration-YYYY-MM-DD`
- [ ] VM snapshot verified in Proxmox UI
- [ ] Docker Compose files backed up from VM
- [ ] Docker volumes/data backed up (if applicable)
- [ ] List of running containers documented
- [ ] Environment variables documented
- [ ] Network configuration documented (IP, ports, DNS)
- [ ] External dependencies documented (databases, APIs, etc.)

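The snapshot naming convention can be generated rather than typed by hand. This is a small sketch that builds the `qm snapshot <vmid> <snapname>` argument list; the VMID `101` and the helper names are hypothetical, and the command itself still has to be run on the Proxmox host.

```python
from datetime import date


def snapshot_name(day: date) -> str:
    """Name in the checklist's pre-migration-YYYY-MM-DD form."""
    return f"pre-migration-{day.isoformat()}"


def snapshot_command(vmid: int, day: date) -> list[str]:
    """Argument list for `qm snapshot <vmid> <snapname>`."""
    return ["qm", "snapshot", str(vmid), snapshot_name(day)]


# Hypothetical VMID, fixed date for a reproducible example.
print(" ".join(snapshot_command(101, date(2025, 1, 11))))
# qm snapshot 101 pre-migration-2025-01-11
```
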
### Infrastructure Validation
- [ ] Docker LXC template exists (ID 9001 or custom)
- [ ] Target container ID available
- [ ] Sufficient storage space on target storage pool
- [ ] Network configuration confirmed (VLAN, bridge, gateway)
- [ ] DNS entries documented (update after migration if needed)
- [ ] Firewall rules documented
- [ ] Reverse proxy configuration backed up (if using NPM/Traefik)

---

## Migration Execution Checklist

### Phase 1: Pre-Migration Testing
- [ ] VM is running and healthy
- [ ] All services responding normally
- [ ] No error logs in VM
- [ ] Docker containers all running: `docker ps -a`
- [ ] Resource usage documented (CPU, RAM, disk)
- [ ] Performance baseline captured (response times, etc.)

### Phase 2: VM Shutdown
- [ ] Services gracefully stopped (if order matters)
- [ ] Docker containers stopped: `docker compose down` (optional)
- [ ] VM shut down gracefully: `shutdown -h now` inside the VM, or `qm shutdown <vmid>` from Proxmox
- [ ] VM status confirmed: `stopped`
- [ ] Snapshot remains intact

### Phase 3: LXC Creation
- [ ] LXC created from template
- [ ] Container ID matches plan
- [ ] Hostname configured correctly
- [ ] Memory allocation set (estimated from analysis)
- [ ] CPU cores allocated (match or reduce from VM)
- [ ] Storage configured correctly
- [ ] Network configured (static IP or DHCP)
- [ ] Docker features enabled: `nesting=1,keyctl=1`
- [ ] Container set to privileged mode (unprivileged=0)
- [ ] Container configuration reviewed in Proxmox UI

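The feature and privilege settings above can also be verified from the command line instead of the UI. The sketch below parses the `key: value` lines printed by `pct config <ctid>` and checks the two Docker features plus privileged mode. It assumes a missing `unprivileged` key means the container is privileged; the function names and the sample configuration are illustrative.

```python
def parse_pct_config(text: str) -> dict[str, str]:
    """Turn `pct config <ctid>` output ("key: value" per line) into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def docker_settings_ok(config: dict[str, str]) -> bool:
    """True if nesting=1 and keyctl=1 are set and the container is privileged."""
    features = set(config.get("features", "").split(","))
    # Assumption: an absent `unprivileged` key is treated as privileged.
    privileged = config.get("unprivileged", "0") == "0"
    return {"nesting=1", "keyctl=1"} <= features and privileged


# Illustrative sample; real text comes from `pct config <ctid>`.
sample_config = """\
cores: 2
features: keyctl=1,nesting=1
hostname: docker-apps
memory: 2048
net0: name=eth0,bridge=vmbr0,ip=dhcp
"""

print(docker_settings_ok(parse_pct_config(sample_config)))  # True
```
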
### Phase 4: LXC Initial Start
- [ ] Container started successfully
- [ ] Container status: `running`
- [ ] Container accessible via SSH
- [ ] Network connectivity confirmed: `ping 8.8.8.8`
- [ ] DNS resolution working: `nslookup google.com`
- [ ] Docker service running: `systemctl status docker`
- [ ] Docker working: `docker ps` (should be empty initially)

---

## Service Migration Checklist

### Phase 5: Docker Configuration Transfer
- [ ] Docker Compose files copied to LXC
- [ ] Directory structure matches VM layout
- [ ] File permissions verified
- [ ] Environment files copied (.env files)
- [ ] Docker volumes path confirmed
- [ ] Data directories created (if needed)
- [ ] Configuration files reviewed for absolute paths

### Phase 6: Docker Containers Deployment
- [ ] Docker Compose files validated: `docker compose config`
- [ ] Images pulled successfully: `docker compose pull`
- [ ] Containers created: `docker compose up -d`
- [ ] All containers started: `docker compose ps`
- [ ] No container restart loops: `docker ps` (check STATUS)
- [ ] Container logs checked: `docker compose logs`
- [ ] No error messages in logs

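Reading the STATUS column by eye gets tedious with many containers. As a rough sketch, the function below flags anything that is restarting, exited, or reporting an unhealthy health check. It assumes the input was produced with `docker ps -a --format '{{.Names}}|{{.Status}}'`; the `|` separator, the function name, and the sample lines are choices made for this example.

```python
def unhealthy_containers(ps_output: str) -> dict[str, str]:
    """Map container name -> status for containers that need attention."""
    problems = {}
    for line in ps_output.splitlines():
        name, sep, status = line.partition("|")
        if not sep:
            continue  # not a "name|status" line
        # "Restarting (...)" suggests a restart loop, "Exited (...)" a stopped
        # container, and "(unhealthy)" a failing health check.
        if not status.startswith("Up") or "(unhealthy)" in status:
            problems[name] = status
    return problems


# Illustrative sample of the assumed format.
sample_ps = (
    "web|Up 2 hours\n"
    "db|Restarting (1) 5 seconds ago\n"
    "cache|Exited (0) 3 minutes ago\n"
)

print(unhealthy_containers(sample_ps))
# {'db': 'Restarting (1) 5 seconds ago', 'cache': 'Exited (0) 3 minutes ago'}
```

A container caught in a restart loop can briefly report `Up` between restarts, so running the check a few times over a minute or two is safer than relying on a single pass.
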
### Phase 7: Service Validation
- [ ] All expected containers running
- [ ] Services responding on correct ports
- [ ] Web interfaces accessible (if applicable)
- [ ] APIs responding correctly (if applicable)
- [ ] Health check endpoints passing (if configured)
- [ ] Data persistence verified (check databases, files)
- [ ] Inter-container communication working
- [ ] External service connections working (databases, APIs)

---

## Network & Connectivity Checklist

### Phase 8: Network Validation
- [ ] LXC has correct IP address: `ip addr show`
- [ ] Gateway reachable: `ping <gateway-ip>`
- [ ] Internal network access verified
- [ ] Internet access confirmed
- [ ] DNS resolution working for all required domains
- [ ] Ports accessible from other hosts: `nc -zv <lxc-ip> <port>`
- [ ] Firewall rules applied (if needed)

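Where `nc` is not installed on the checking host, the port test can be approximated with the Python standard library. This is a minimal sketch of a TCP connect check, roughly what `nc -zv` does; the function name `port_open` and the default timeout are assumptions for illustration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

It would be called from another host on the network, for example `port_open("<lxc-ip>", 8080)` with the container's real address and each published port. Like `nc -zv`, it only confirms that something accepts connections on the port, not that the service behind it is working.
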
### Phase 9: External Access
- [ ] Service accessible from local network
- [ ] Service accessible from internet (if required)
- [ ] Reverse proxy updated (if using NPM/Traefik)
- [ ] SSL certificates working (if HTTPS)
- [ ] Domain names resolving correctly
- [ ] Load balancer updated (if applicable)

---

## Performance & Stability Checklist

### Phase 10: Performance Validation
- [ ] CPU usage reasonable: `top` or `htop`
- [ ] Memory usage lower than VM: `free -h`
- [ ] Disk I/O acceptable: `iostat` or monitor in Proxmox
- [ ] Network throughput adequate: test with actual traffic
- [ ] Response times equal to or better than VM
- [ ] No performance degradation under load

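Comparing against the VM baseline from Phase 1 is easier when both sets of numbers are recorded in the same shape. The sketch below flags any "lower is better" metric that got worse after migration. The metric names, the sample numbers, and the 10% tolerance are all hypothetical; setting `tolerance=0` gives the strict "equal to or better" reading of the checklist.

```python
def regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
) -> dict[str, tuple[float, float]]:
    """Return metrics where the current value exceeds baseline by more than tolerance.

    All values are assumed to be "lower is better" (response time, memory used).
    """
    worse = {}
    for name, base in baseline.items():
        now = current.get(name)
        if now is not None and now > base * (1 + tolerance):
            worse[name] = (base, now)
    return worse


# Hypothetical numbers for illustration only.
vm_baseline = {"response_ms": 120.0, "memory_mib": 1800.0}
lxc_current = {"response_ms": 140.0, "memory_mib": 950.0}

print(regressions(vm_baseline, lxc_current))
# {'response_ms': (120.0, 140.0)}
```
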
### Phase 11: Resource Monitoring (First 24 Hours)
- [ ] Hour 1: Services stable, no crashes
- [ ] Hour 2: Resource usage normal
- [ ] Hour 4: No memory leaks detected
- [ ] Hour 8: Performance consistent
- [ ] Hour 24: All metrics stable
- [ ] Proxmox graphs show healthy trends
- [ ] No OOM (Out of Memory) kills: `dmesg | grep -i oom`
- [ ] No kernel errors: `dmesg | grep -i error`

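When the OOM check does turn something up, the next question is which process was killed. As a sketch, the function below pulls the PID and process name out of kernel log text, assuming the common `Killed process <pid> (<name>)` wording. The sample log line is made up for illustration, and reading the kernel log may require running `dmesg` on the Proxmox host rather than inside the container.

```python
import re

# Matches the "Killed process 4321 (node)" part of a kernel OOM message.
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_kills(dmesg_output: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for each OOM kill found in the log text."""
    return [(int(m.group(1)), m.group(2)) for m in KILLED.finditer(dmesg_output)]


# Illustrative log text, not captured from a real system.
sample_dmesg = (
    "[1234.5] Memory cgroup out of memory: Killed process 4321 (node) "
    "total-vm:1024kB, anon-rss:512kB\n"
    "[1300.0] eth0: link up\n"
)

print(oom_kills(sample_dmesg))  # [(4321, 'node')]
```
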
### Phase 12: Functional Testing
- [ ] Primary functionality tested end-to-end
- [ ] User workflows validated
- [ ] Scheduled jobs running (cron, etc.)
- [ ] Backups configured and tested
- [ ] Monitoring alerts configured
- [ ] Logging working correctly
- [ ] Integrations with other services functioning

---

## Data Integrity Checklist

### Phase 13: Data Validation
- [ ] Database connections working
- [ ] Data readable and writable
- [ ] File uploads/downloads working
- [ ] Cache functioning correctly
- [ ] Sessions persisting correctly
- [ ] User data accessible
- [ ] No data corruption detected
- [ ] Database migrations applied (if needed)

### Phase 14: Backup Validation
- [ ] Backup jobs configured for LXC
- [ ] Test backup created successfully
- [ ] Test restore validated
- [ ] Backup storage sufficient
- [ ] Backup retention policy set
- [ ] Backup monitoring alerts configured

---

## Extended Monitoring Checklist

### Phase 15: Week 1 Monitoring
- [ ] Day 1: Initial 24 hours stable
- [ ] Day 2: Resource usage patterns established
- [ ] Day 3: Performance benchmarks met
- [ ] Day 4: No unexpected issues
- [ ] Day 5: Load testing passed (if applicable)
- [ ] Day 6: Weekend operations normal (if applicable)
- [ ] Day 7: Weekly summary reviewed, all green

### Phase 16: Week 2 Validation
- [ ] Week 2: Continued stability
- [ ] No memory leaks over extended period
- [ ] Disk usage growth as expected
- [ ] No unexpected restarts or crashes
- [ ] Resource utilization optimized
- [ ] Documentation updated with final configuration

---

## Rollback Checklist (If Needed)

### Emergency Rollback
- [ ] Stop LXC container: `pct stop <ctid>`
- [ ] Start original VM: `qm start <vmid>`
- [ ] Verify VM services starting
- [ ] Validate VM functionality
- [ ] Restore network access (update DNS/proxy if changed)
- [ ] Document rollback reason for analysis
- [ ] Plan remediation before retry

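The first two rollback steps are order-sensitive and worth rehearsing ahead of time. The following is a sketch of a dry-run wrapper that prints the two commands and only executes them when explicitly asked; the IDs `206` and `101` and the function names are hypothetical. Stopping the container first avoids two hosts answering on the same address if the LXC reused the VM's IP.

```python
import subprocess


def rollback_commands(ctid: int, vmid: int) -> list[list[str]]:
    """Commands for the first two rollback steps, in order."""
    return [
        ["pct", "stop", str(ctid)],  # stop the LXC container first
        ["qm", "start", str(vmid)],  # then bring the original VM back
    ]


def run_rollback(ctid: int, vmid: int, dry_run: bool = True) -> None:
    """Print each command; execute on the Proxmox host only when dry_run is False."""
    for cmd in rollback_commands(ctid, vmid):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stops at the first failing step


# Hypothetical IDs; the default dry run only prints the commands.
run_rollback(ctid=206, vmid=101)
```
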
---

## Final Migration Completion Checklist

### Phase 17: Production Validation
- [ ] 1-2 weeks of stable operation confirmed
- [ ] All stakeholders confirm service quality
- [ ] Performance metrics meet or exceed VM baseline
- [ ] No outstanding issues or concerns
- [ ] Monitoring and alerting fully operational
- [ ] Documentation complete and accurate

### Phase 18: Cleanup
- [ ] VM no longer needed, safe to remove
- [ ] VM snapshot retained for safety (30 days recommended)
- [ ] Original VM stopped and archived
- [ ] Resources freed up (document savings)
- [ ] Migration marked complete in tracking system
- [ ] Lessons learned documented

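To keep the 30-day snapshot retention from being forgotten, the earliest deletion date can be computed and recorded alongside the migration notes. This is a trivial sketch; it assumes the retention period is counted from the cleanup date, and the function name is illustrative.

```python
from datetime import date, timedelta


def snapshot_delete_after(cleanup_day: date, retention_days: int = 30) -> date:
    """Earliest date on which the pre-migration snapshot may be removed."""
    return cleanup_day + timedelta(days=retention_days)


print(snapshot_delete_after(date(2025, 1, 11)))  # 2025-02-10
```
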
### Phase 19: Documentation Updates
- [ ] Network diagram updated (if exists)
- [ ] IP address spreadsheet updated
- [ ] Service inventory updated
- [ ] Runbooks updated for new LXC location
- [ ] Backup documentation updated
- [ ] Disaster recovery plan updated
- [ ] Team knowledge base updated

---

## Quick Reference: Common Issues & Solutions

### Issue: Container won't start
**Check:**
- [ ] Storage space available: `pvesm status`
- [ ] Container configuration valid: `pct config <ctid>`
- [ ] No resource limits exceeded
- [ ] Logs: `journalctl -u pve-container@<ctid>`

### Issue: Docker won't start
**Check:**
- [ ] Nesting enabled: `pct config <ctid> | grep features`
- [ ] Container is privileged: `pct config <ctid> | grep unprivileged` (no output or `unprivileged: 0` means privileged)
- [ ] Docker service: `systemctl status docker`
- [ ] Logs: `journalctl -u docker`

### Issue: Network not working
**Check:**
- [ ] Network interface configured: `ip addr show`
- [ ] Gateway configured: `ip route show`
- [ ] DNS configured: `cat /etc/resolv.conf`
- [ ] Firewall rules: `iptables -L`

### Issue: Poor performance
**Check:**
- [ ] Resource allocation sufficient: `pct config <ctid>`
- [ ] CPU not saturated: `cat /proc/loadavg` (compare load average to allocated cores)
- [ ] Memory not exhausted: `free -h`
- [ ] No I/O bottleneck: `iostat -x 1`

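The load average only means something relative to the number of cores the container was given. As a small sketch, the function below divides the 1-minute figure from `/proc/loadavg` by the allocated core count; values that stay well above 1.0 suggest the CPU allocation is too small. The sample text and the function name are illustrative, and inside a container the file may reflect host load depending on how lxcfs is configured.

```python
def load_per_core(loadavg_text: str, cores: int) -> float:
    """1-minute load average from /proc/loadavg divided by allocated cores."""
    one_minute = float(loadavg_text.split()[0])
    return one_minute / cores


# Illustrative /proc/loadavg content: 1, 5, 15 minute averages come first.
sample_loadavg = "3.20 1.10 0.75 2/345 6789"

print(load_per_core(sample_loadavg, cores=2))  # 1.6
```
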
### Issue: Can't access services
**Check:**
- [ ] Containers running: `docker ps`
- [ ] Ports exposed: `docker ps` (PORTS column)
- [ ] Firewall rules: `iptables -L`
- [ ] Service binding: `netstat -tlnp | grep <port>` (or `ss -tlnp` if `netstat` is not installed)
- [ ] Reverse proxy config updated

---

## Service-Specific Checklists

### Discord Bots
- [ ] Bot token configured correctly
- [ ] Bot connected to Discord: check bot status
- [ ] Commands responding
- [ ] Database connections working (if applicable)
- [ ] Scheduled tasks running
- [ ] Logs showing normal operation

### Databases (PostgreSQL, MySQL, MongoDB)
- [ ] Database service running
- [ ] Data directory mounted correctly
- [ ] Connections from applications working
- [ ] Queries executing normally
- [ ] Backups configured
- [ ] Replication working (if applicable)
- [ ] Performance acceptable: run query benchmarks

### Plex Media Server
- [ ] Media libraries accessible
- [ ] Transcoding working (CPU or GPU)
- [ ] Streaming playback smooth
- [ ] Metadata refreshing
- [ ] Remote access configured (if needed)
- [ ] Hardware acceleration working (if configured)

### Docker-Based Web Apps
- [ ] Web interface accessible
- [ ] Login/authentication working
- [ ] Database connections functional
- [ ] File uploads working
- [ ] API endpoints responding
- [ ] SSL/TLS certificates valid
- [ ] Caching working correctly

---

## Migration Success Criteria

### Minimum Criteria (Must Have)
- ✅ All services running and accessible
- ✅ No data loss or corruption
- ✅ Performance equal to or better than VM
- ✅ 24 hours of stable operation
- ✅ No critical errors in logs
- ✅ Rollback plan tested and ready

### Optimal Criteria (Should Have)
- ✅ Resource usage reduced vs VM
- ✅ Faster startup times
- ✅ Improved I/O performance
- ✅ 1 week of stable operation
- ✅ Monitoring and alerts configured
- ✅ Documentation complete

### Excellence Criteria (Nice to Have)
- ✅ 2 weeks of flawless operation
- ✅ Measurable performance improvements
- ✅ Resource optimization completed
- ✅ Automated backups validated
- ✅ Team trained on new setup
- ✅ Migration lessons documented

---

## Notes & Best Practices

**Timing:**
- Migrate non-critical services first
- Schedule during low-traffic periods
- Allow extra time for first migration
- Plan for 2-4 hours per service initially

**Safety:**
- Always have VM snapshot before starting
- Keep VM stopped but available for 1-2 weeks
- Test rollback procedure before committing
- Document every step for repeatability

**Monitoring:**
- Watch resource usage closely first 48 hours
- Set up alerts for anomalies
- Compare to VM baseline metrics
- Keep detailed migration notes

**Optimization:**
- Start with conservative resource allocation
- Tune after monitoring actual usage
- Document optimal settings for future migrations
- Share learnings with team

---

**Checklist Version:** 1.0
**Last Updated:** 2025-01-11
**For:** Cal's Home Lab Proxmox Infrastructure