- Add LXC migration plan and quick-start guide - Add wave 1 and wave 2 migration results - Add lxc-docker-create.sh for container creation - Add fix-docker-apparmor.sh for AppArmor issues - Add comprehensive LXC migration guide 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
822 lines
27 KiB
Markdown
# VM to LXC Migration Plan - Proxmox Infrastructure

**Created**: 2025-01-12
**Status**: ✅ Wave 2 Complete - In Progress
**Owner**: Cal Corum
**Last Updated**: 2025-12-05

## 🎯 Wave 1 Status: ✅ **COMPLETE**
- **VM 111 (docker-7days)** → **LXC 211** ✅ Successful
- **Migration Date**: 2025-01-12
- **Container Status**: Running and validated
- **Detailed Results**: See `wave1-migration-results.md`

## 🎯 Wave 2 Status: ✅ **COMPLETE**
- **VM 121 (docker-vpn)** → **LXC 221 (arr-stack)** ✅ Successful
- **Migration Date**: 2025-12-05
- **Container Status**: Running and validated
- **Key Changes**:
  - Eliminated Mullvad VPN (Usenet + SSL is sufficient, no torrents)
  - Replaced Overseerr with Jellyseerr (native Jellyfin support)
  - Simplified stack: Sonarr, Radarr, Readarr, Jellyseerr, SABnzbd
- **Detailed Results**: See `wave2-migration-results.md`

## ✅ Confirmed Decisions
- **Networking**: Reuse existing IP addresses (transparent migration)
- **Storage**: Fresh install + volume copy for all Docker hosts
- **Timeline**: 4-6 weeks (updated from initial 6-8 based on Wave 1 experience)
- **GPU Services**: No GPU hardware available - Plex (107) and Tdarr (113) can migrate without special considerations
- **AppArmor Fix**: ALL docker-compose files need `security_opt: [apparmor=unconfined]` ⚠️ CRITICAL

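
The AppArmor requirement is easy to miss on an individual compose file. As a minimal sketch, compose files that lack the override could be listed with a small helper. The function names are hypothetical, and the check is file-level only: it does not confirm that every service inside a file carries the option.

```bash
#!/bin/bash
# Hypothetical helpers (not part of the existing scripts): find compose files
# that do not mention the AppArmor override at all.

# Succeeds (exit 0) when the file contains apparmor=unconfined or apparmor:unconfined.
has_apparmor_override() {
    grep -Eq 'apparmor[:=]unconfined' "$1"
}

# Print every compose file under a directory that is missing the override.
list_missing_apparmor() {
    local dir="$1" file
    find "$dir" -type f \( -name 'docker-compose*.yml' -o -name 'compose*.yml' \) -print |
        sort |
        while IFS= read -r file; do
            has_apparmor_override "$file" || printf '%s\n' "$file"
        done
}
```

For example, `list_missing_apparmor /opt/docker` would print the files that still need `security_opt: [apparmor=unconfined]`, assuming compose files live under `/opt/docker` as in the scripts later in this plan.
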
## Executive Summary

Services are being migrated from full VMs to LXC containers on Proxmox to:
- Reduce resource overhead (memory, CPU, storage)
- Improve density and efficiency
- Speed up provisioning and backup/restore
- Lower management complexity

**Current State**: 16 VMs (9 running, 7 stopped)
**Target State**: Strategic mix of LXC containers and VMs based on workload requirements

---

## Phase 1: Assessment & Categorization

### Current VM Inventory Analysis

#### Running Production VMs (9)
| VMID | Name | Service Type | Migration Candidate? | Priority | Notes |
|------|------|--------------|---------------------|----------|-------|
| 105 | docker-vpn | Docker Host | ✅ YES | HIGH | VPN routing considerations |
| 106 | docker-home | Docker Host | ✅ YES | HIGH | Critical home services |
| 107 | plex | Media Server | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 109 | hass-io | Home Assistant | ❌ NO | N/A | HassOS requires VM, not standard Linux |
| 110 | discord-bots | Application | ✅ YES | MEDIUM | Simple Python services |
| 111 | docker-7days | Game Server | ✅ YES | HIGHEST | Lowest risk - migrate first |
| 112 | databases-bots | Database | ✅ YES | HIGH | PostgreSQL/databases |
| 113 | docker-tdarr | Transcode | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 114 | docker-pittsburgh | Docker Host | ✅ YES | MEDIUM | Regional services |
| 115 | docker-sba | Docker Host | ✅ YES | MEDIUM | SBA baseball services |
| 116 | docker-home-servers | Docker Host | ✅ YES | HIGH | Critical infrastructure |

#### Stopped/Template VMs (7)
| VMID | Name | Purpose | Action |
|------|------|---------|--------|
| 100 | ubuntu-template | Template | KEEP as VM for flexibility |
| 101 | 7d-solo | Game Server | EVALUATE when needed |
| 102 | 7d-staci | Game Server | EVALUATE when needed |
| 103 | docker-template | Template | CONVERT to LXC template |
| 104 | 7d-wotw | Game Server | EVALUATE when needed |
| 117 | docker-unused | Unused | DELETE or ARCHIVE |

### Migration Suitability Matrix

#### ✅ **IDEAL for LXC** (All Migrate)
- **Game server - docker-7days (111)**: LOWEST RISK - Migrate first to validate process
- **Docker hosts** (105, 106, 114, 115, 116): Standard Docker workloads without special hardware
- **Application servers** (110): Discord bots, Python services
- **Database servers** (112): PostgreSQL, Redis, standard databases
- **Media servers** (107, 113): Plex and Tdarr using software transcoding (no GPU available)
- **Stopped game servers** (101, 102, 104): Migrate when needed
- **Docker template** (103): Convert to LXC template for faster provisioning

**Why**: No GPU hardware in system - all services can run in LXC without special considerations. Pure Linux workloads benefit from reduced overhead.

#### ❌ **KEEP as VM** (Do Not Migrate)
- **Home Assistant (109)**: HassOS is VM-optimized, not standard Linux
- **Ubuntu template (100)**: Keep VM flexibility for future VM deployments

**Why**: Technical incompatibility or strategic value as VM

---

## Phase 2: Technical Planning

### Service Consolidation Decision Framework

When deciding whether to keep services in separate LXCs or consolidate into a single LXC:

#### **Keep Separate** (1 LXC per service) when:
| Factor | Reason |
|--------|--------|
| **Blast radius** | Failure of one shouldn't take down others |
| **Different update cycles** | Services need independent maintenance windows |
| **Resource contention** | CPU/memory-hungry services that compete |
| **Security boundaries** | Different trust levels or network access needs |
| **Different owners/teams** | Separate accountability |
| **Databases** | Always isolate for backup/restore simplicity |
| **Critical infrastructure** | VPN, DNS, reverse proxy - high availability needs |

#### **Consolidate** (multiple services in 1 LXC) when:
| Factor | Reason |
|--------|--------|
| **Related services** | Naturally belong together (e.g., all SBA services) |
| **Low resource usage** | Services that barely use resources individually |
| **Same lifecycle** | Updated/restarted together anyway |
| **Shared dependencies** | Same database, same configs |
| **Simplicity wins** | Fewer LXCs to manage, backup, monitor |
| **Same project** | Discord bots for same league, microservices for same app |

#### Practical Examples:

| Keep Separate | Why |
|---------------|-----|
| Databases (112) | Backup/restore, data integrity |
| VPN (105) | Security boundary, networking critical |
| Critical home services (106) | High availability |
| n8n (210) | Workflow automation, independent maintenance |

| Candidate for Consolidation | Why |
|-----------------------------|-----|
| Discord bots + related API services | Same project, low resources, same maintainer |
| Multiple low-traffic web apps | Minimal resource usage |
| Dev/test environments | Non-critical, shared lifecycle |

---

### LXC vs VM Decision Criteria

| Criteria | LXC Container | Full VM | Notes |
|----------|--------------|---------|-------|
| **OS Type** | Linux only | Any OS | LXC shares host kernel |
| **Resource Overhead** | Minimal (~50-200MB RAM) | High (full OS stack) | LXC 5-10x more efficient |
| **Boot Time** | 1-5 seconds | 30-90 seconds | Near-instant container start |
| **Kernel Modules** | Shared host kernel | Own kernel | LXC cannot load custom modules |
| **Hardware Passthrough** | Limited (requires privileges) | Full passthrough | GPU/USB may need testing |
| **Nesting** | Requires the `nesting=1` feature | Supported | Docker-in-LXC works once nesting is enabled (see Docker in LXC) |
| **Backup/Restore** | Very fast | Slower | Container backups are typically smaller |
| **Disk Performance** | Native | Near-native | Both excellent on modern storage |

### Key Technical Decisions

#### 1. **Networking Strategy** ✅ CONFIRMED
**Decision**: Reuse existing IP addresses

**Implementation**:
- ✅ No DNS changes required
- ✅ Existing firewall rules work
- ✅ Monitoring continues without changes
- ✅ Transparent migration for users
- ⚠️ Requires careful IP conflict management during parallel running

**Migration Process**:
1. Build LXC with temporary IP (or offline)
2. Test and validate LXC functionality
3. Stop VM during maintenance window
4. Reconfigure LXC to production IP
5. Start LXC and validate
6. Keep VM stopped for 48hr rollback window

#### 2. **Storage Strategy** ✅ CONFIRMED
**Decision**: Fresh install + volume copy for all Docker hosts

**Implementation for Docker Hosts**:
1. **Fresh LXC installation**:
   - Clean Ubuntu 22.04 LTS base
   - Install Docker via standard script
   - Install docker-compose plugin
   - No migration of system configs

2. **Volume migration**:
   - Copy `/var/lib/docker/volumes/` from VM to LXC
   - Copy docker-compose files from VM to LXC
   - Copy environment files (.env) if applicable
   - Validate volume data integrity

**Benefits**:
- ✅ Clean configuration, no cruft
- ✅ Opportunity to update/standardize configs
- ✅ Smaller container images
- ✅ Document infrastructure-as-code
- ✅ Latest Docker version on fresh install

#### 3. **Docker in LXC** ✅ CONFIRMED
**Decision**: Privileged LXC containers for all Docker hosts

**Configuration**:
- Set `--unprivileged 0` (privileged mode)
- Enable nesting: `--features nesting=1,keyctl=1`
- No complex UID mapping required
- Every compose file must set `security_opt: [apparmor=unconfined]` (see Confirmed Decisions)

**Rationale**:
- ✅ Docker runs with its full feature set once the AppArmor override is in place
- ✅ Simpler configuration and troubleshooting
- ✅ Balanced approach for home lab environment
- ⚠️ Acceptable security trade-off for isolated home network

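
To confirm that a container was actually created with these settings, its config under `/etc/pve/lxc/` can be inspected. The following is a minimal sketch, assuming the config file carries a `features:` line and, for unprivileged containers, an `unprivileged: 1` line. The helper names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers: inspect an LXC config file such as /etc/pve/lxc/206.conf.

# Succeeds when the features line enables nesting.
conf_has_nesting() {
    grep -Eq '^features:.*nesting=1([,[:space:]]|$)' "$1"
}

# Succeeds when the container is privileged, i.e. there is no "unprivileged: 1" line.
conf_is_privileged() {
    ! grep -Eq '^unprivileged:[[:space:]]*1[[:space:]]*$' "$1"
}

# Succeeds only when both Docker-host requirements from this plan are met.
conf_ready_for_docker() {
    conf_has_nesting "$1" && conf_is_privileged "$1"
}
```

On the Proxmox host this could be run as `conf_ready_for_docker /etc/pve/lxc/206.conf && echo ok`. It checks only the two settings listed above and says nothing about AppArmor inside the compose files.
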

---

## Phase 3: Migration Strategy

### Phased Rollout Approach (Risk-Based Ordering)

#### **Wave 1: Lowest Risk - Game Server** (Week 1)
**Target**: Lowest-risk service to validate entire migration process

1. **docker-7days (111)** - Game server via Docker, lowest impact if issues occur

**Why This First**:
- ✅ Non-critical service (gaming only)
- ✅ Can migrate during off-hours when not in use
- ✅ Clear validation criteria (game server starts and runs)
- ✅ Builds confidence in process with minimal risk
- ✅ Tests Docker-in-LXC configuration end-to-end

**Success Criteria**:
- Game server accessible and playable
- Docker containers running stable for 48+ hours
- Backup/restore tested successfully
- Rollback procedure validated
- Process documented for next waves

#### **Wave 2: Docker Hosts - Regional/Isolated** (Week 1-2)
**Target**: Docker hosts with lower criticality and good isolation

2. **docker-pittsburgh (114)** - Regional services, lower criticality
3. **docker-vpn (105)** - VPN routing (isolated workload)

**Prerequisites**:
- Wave 1 successful (docker-7days stable)
- Process refined based on learnings
- Confidence in Docker-in-LXC configuration

**Validation Points**:
- VPN routing works correctly (105)
- Regional services accessible (114)
- No cross-service impact

#### **Wave 3: Additional Docker Hosts** (Week 2-3)
**Target**: More Docker infrastructure, increasing criticality

4. **docker-sba (115)** - Baseball services (defined maintenance windows)
5. **docker-unused (117)** - Migrate or decommission
6. **docker-home-servers (116)** - Home server infrastructure

**Critical Considerations**:
- SBA has known maintenance windows - use those
- docker-home-servers may have dependencies - validate carefully
- docker-unused can be decommissioned if no longer needed

#### **Wave 4: Application & Database Servers** (Week 3-4)
**Target**: Non-Docker services requiring extra care

7. **discord-bots (110)** - Python services, straightforward
8. **databases-bots (112)** - PostgreSQL/databases (highest care required)

**Critical Steps for Databases**:
- ⚠️ Full database backup before migration
- ⚠️ Validate connection strings from all dependent services
- ⚠️ Test database performance in LXC thoroughly
- ⚠️ Monitor for 48+ hours before decommissioning VM
- ⚠️ Have rollback plan ready and tested

#### **Wave 5: Media Services** ~~(Week 4-5)~~ **SKIPPED**
**Status**: ❌ SKIPPED - Services retired or decommissioned

~~9. **docker-tdarr (113)**~~ - **RETIRED**: Tdarr moved to dedicated GPU server (ubuntu-manticore)
~~10. **plex (107)**~~ - **DECOMMISSIONING**: Plex being retired, no migration needed

**Notes**:
- Tdarr now runs on ubuntu-manticore (10.10.0.226) with GPU transcoding
- Plex scheduled for decommission - Jellyfin is the replacement

#### **Wave 6: Final Critical Infrastructure** (Week 5-6)
**Target**: Most critical Docker infrastructure (save for last)

11. **docker-home (106)** - Critical home services (highest risk)

**Why Last**:
- Most critical infrastructure
- All other waves provide confidence
- Process fully refined and validated
- All potential issues already encountered and resolved

**Do NOT Migrate**:
- **hass-io (109)** - Keep as VM (HassOS requirement)
- **ubuntu-template (100)** - Keep as VM (strategic flexibility)

### Parallel Running Strategy

**For Each Migration**:

1. **Build LXC container** (new ID, temporary IP or offline)
2. **Configure and test** (validate all functionality)
3. **Sync data** from VM to LXC (while VM still running)
4. **Maintenance window**:
   - Stop services on the VM (keep the VM itself reachable for the final sync)
   - Final data sync
   - Shut down VM
   - Change LXC to production IP
   - Start (or restart) LXC
   - Validate services
5. **Monitor for 24-48 hours** (VM kept in stopped state)
6. **Decommission VM** after confidence period

**Rollback Procedure**:
- Stop LXC
- Start VM (already has data up to cutover point)
- Resume production on VM
- Document what failed for retry

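
The IP cutover inside the maintenance window can be sketched as a dry-run helper that only prints the Proxmox commands, so the sequence can be reviewed before anything is executed. This is an illustration under assumptions: `cutover_plan` is a hypothetical name, the final data sync is assumed to have finished already, and the IDs and `net0` value in the usage note are the docker-home example values used elsewhere in this plan.

```bash
#!/bin/bash
# Hypothetical dry-run helper: print the cutover commands for one migration
# instead of running them.
#   $1 = VM ID being retired, $2 = LXC ID taking over, $3 = production net0 string
cutover_plan() {
    local vmid="$1" ctid="$2" net0="$3"
    printf 'qm shutdown %s\n' "$vmid"              # free the production IP
    printf 'pct stop %s\n' "$ctid"                 # stop LXC before changing its network
    printf 'pct set %s --net0 %s\n' "$ctid" "$net0"
    printf 'pct start %s\n' "$ctid"                # bring the LXC up on the production IP
}
```

A call such as `cutover_plan 106 206 "name=eth0,bridge=vmbr0,ip=10.10.0.106/24,gw=10.10.0.1"` prints four commands in order. Printing rather than executing keeps the helper safe to run anywhere; the printed lines can be pasted on the Proxmox host once they have been checked.
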

---

## Phase 4: Implementation Checklist

### Pre-Migration (Per Service)

- [ ] Document current VM configuration
  - [ ] CPU, memory, storage allocation
  - [ ] Network configuration (IP, gateway, DNS)
  - [ ] Installed packages and services
  - [ ] Docker compose files (if Docker host)
  - [ ] Volume mounts and storage locations
  - [ ] Environment variables and secrets
  - [ ] Cron jobs and systemd services

- [ ] Create LXC container
  - [ ] Select appropriate template (Ubuntu 22.04 LTS recommended)
  - [ ] Allocate resources (start conservative, can increase)
  - [ ] Configure networking (temporary IP for testing)
  - [ ] Set privileged mode if Docker host
  - [ ] Configure storage (bind mounts for data volumes)

- [ ] Prepare migration scripts
  - [ ] Data sync script (rsync-based)
  - [ ] Configuration export/import
  - [ ] Service validation tests

- [ ] Backup current VM
  - [ ] Full VM backup in Proxmox
  - [ ] Export critical data separately
  - [ ] Document backup location and restore procedure

### During Migration

- [ ] Announce maintenance window (if user-facing)
- [ ] Stop services on VM (or entire VM)
- [ ] Perform final data sync to LXC
- [ ] Update DNS/networking (if using new IP temporarily)
- [ ] Start services in LXC
- [ ] Run validation tests
  - [ ] Service responding?
  - [ ] Data accessible?
  - [ ] External connectivity working?
  - [ ] Dependent services connecting successfully?
  - [ ] Performance acceptable?

### Post-Migration

- [ ] Monitor for 24 hours
  - [ ] Check logs for errors
  - [ ] Monitor resource usage
  - [ ] Validate backups working
  - [ ] Test restore procedure

- [ ] Update documentation
  - [ ] Update VM inventory
  - [ ] Document new container configuration
  - [ ] Update monitoring configs
  - [ ] Update runbooks/procedures

- [ ] After 48-hour success period
  - [ ] Backup LXC container
  - [ ] Delete VM backup (or archive)
  - [ ] Destroy original VM
  - [ ] Update network documentation

---

## Phase 5: Technical Implementation Details

### Standard LXC Container Creation

```bash
# Create privileged LXC container for Docker host
# ID 206 follows Appendix A; the temporary IP lets VM 106 keep 10.10.0.106 until cutover
pct create 206 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname docker-home-lxc \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.0.206/24,gw=10.10.0.1 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --unprivileged 0 \
  --features nesting=1,keyctl=1

# Start container
pct start 206

# Enter container
pct enter 206
```

### Docker Installation in LXC

```bash
# Inside LXC container
# Update system
apt update && apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# Install Docker Compose
apt install docker-compose-plugin -y

# Verify
docker --version
docker compose version
```

### Data Migration Script Template

```bash
#!/bin/bash
# migrate-docker-host.sh
# Run inside the new LXC: rsync cannot copy between two remote hosts,
# so the LXC pulls from the VM over SSH (needs SSH access from LXC to VM).
set -euo pipefail

VM_IP="10.10.0.106"
VM_DATA="/var/lib/docker"
LXC_DATA="/var/lib/docker"

# Sync Docker volumes only (while VM still running for initial sync);
# the fresh Docker install on the LXC keeps its own images and metadata
rsync -avz --progress \
  "root@${VM_IP}:${VM_DATA}/volumes/" \
  "${LXC_DATA}/volumes/"

# Sync docker-compose files
rsync -avz --progress \
  "root@${VM_IP}:/opt/docker/" \
  /opt/docker/

# Sync environment files
rsync -avz --progress \
  "root@${VM_IP}:/root/.env" \
  /root/.env

echo "Initial sync complete. Ready for cutover."
```

### Service Validation Script

```bash
#!/bin/bash
# validate-migration.sh
# Usage: ./validate-migration.sh <container-ip> <docker|database|web>

CONTAINER_IP="$1"
SERVICE_TYPE="$2"

echo "Validating migration for ${SERVICE_TYPE} at ${CONTAINER_IP}..."

case "$SERVICE_TYPE" in
  docker)
    # Check Docker is running
    ssh "root@${CONTAINER_IP}" "docker ps" || exit 1

    # Check compose services
    ssh "root@${CONTAINER_IP}" "cd /opt/docker && docker compose ps" || exit 1

    echo "✅ Docker services validated"
    ;;

  database)
    # Check PostgreSQL
    ssh "root@${CONTAINER_IP}" "systemctl status postgresql" || exit 1

    # Test connection
    ssh "root@${CONTAINER_IP}" "sudo -u postgres psql -c 'SELECT version();'" || exit 1

    echo "✅ Database validated"
    ;;

  web)
    # Check HTTP response
    curl -f "http://${CONTAINER_IP}" || exit 1

    echo "✅ Web service validated"
    ;;

  *)
    # Unknown type: fail instead of reporting success
    echo "Unknown service type: ${SERVICE_TYPE}" >&2
    exit 1
    ;;
esac

echo "✅ All validation checks passed!"
```

---

## Phase 6: Risk Management

### Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Service downtime during migration | HIGH | MEDIUM | Off-hours migration, parallel running, fast rollback |
| Data loss during sync | LOW | HIGH | Multiple backups, checksums, validation |
| GPU passthrough failure | MEDIUM | MEDIUM | Test first, keep VMs as fallback |
| Performance degradation | LOW | MEDIUM | Monitor closely, can revert easily |
| Networking issues | MEDIUM | HIGH | Keep VM stopped but intact for rollback |
| Forgotten dependencies | MEDIUM | HIGH | Document thoroughly, test before cutover |

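
The "checksums" mitigation for data loss can be made concrete with a per-file SHA-256 manifest compared between source and destination. This is a minimal sketch with hypothetical function names. It assumes GNU `find`, `sort`, `xargs`, and `sha256sum` (as shipped on Ubuntu) and compares two local directory trees; comparing VM against LXC would mean generating one manifest on each host and diffing them.

```bash
#!/bin/bash
# Hypothetical helpers: verify that two directory trees hold identical file contents.

# Print "checksum  ./relative/path" for every regular file, in a stable order.
dir_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Succeeds when both trees contain the same files with the same contents.
same_contents() {
    [ "$(dir_manifest "$1")" = "$(dir_manifest "$2")" ]
}
```

Paths in the manifest are relative, so the two trees may live at different absolute locations. File ownership and permissions are not compared, only names and contents.
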
### Rollback Procedures

#### Quick Rollback (During Cutover)
```bash
# If migration fails during cutover window
pct stop 206    # Stop new LXC
qm start 106    # Start original VM
# Service restored in <2 minutes
```

#### Rollback After Migration
```bash
# If issues discovered post-migration
pct stop 206                                 # Stop LXC
qmrestore backup-file.vma.zst 106 --force    # Only if VM 106 itself is damaged; overwrites it
qm start 106                                 # Start original VM
# May need to sync recent data from LXC to VM
```

### Success Metrics

**Per-Service Success Criteria**:
- Service uptime: 99.9% after 48 hours
- Response time: Same or better than VM
- Resource usage: 30-50% reduction in RAM usage
- No errors in logs
- Backups completing successfully
- Dependent services connecting properly

**Overall Migration Success**:
- 80%+ of suitable VMs migrated to LXC
- Zero data loss incidents
- Total downtime <4 hours across all migrations
- Documentation complete and validated
- Team confident in managing LXC infrastructure

---

## Phase 7: Resource Planning

### Expected Resource Gains

**Current VM Resource Usage** (estimated):
- 9 running VMs × 2GB average overhead = ~18GB RAM overhead
- 9 running VMs × 500MB average storage overhead = ~4.5GB storage

**Post-Migration LXC Resource Usage** (estimated):
- 7-8 LXC containers × 100MB average overhead = ~800MB RAM overhead
- 7-8 LXC containers × 100MB average storage overhead = ~800MB storage

**Net Gain**:
- ~17GB RAM freed (room for roughly 17 more containers at ~1GB each, or larger workloads)
- ~3.7GB storage freed
- Faster backup/restore times (5-10x improvement)
- Faster provisioning (minutes vs hours)

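
The RAM estimate is plain arithmetic and can be reproduced with other inputs. The sketch below uses a hypothetical function name and takes the per-guest overheads in MB; the example figures are this plan's estimates, not measurements.

```bash
#!/bin/bash
# Hypothetical helper: estimate RAM freed (in MB) by replacing VMs with LXC containers.
#   $1 = VM count, $2 = overhead per VM (MB), $3 = LXC count, $4 = overhead per LXC (MB)
ram_freed_mb() {
    local vm_count="$1" vm_overhead_mb="$2" lxc_count="$3" lxc_overhead_mb="$4"
    echo $(( vm_count * vm_overhead_mb - lxc_count * lxc_overhead_mb ))
}
```

With the figures above, `ram_freed_mb 9 2048 8 100` gives 17632 MB, which matches the ~17GB estimate.
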
### Resource Allocation Strategy

**Conservative Approach** (Recommended for initial migration):
- Allocate **same resources as VM** to LXC initially
- Monitor usage for 1-2 weeks
- Right-size after baseline established
- Iterate and optimize

**Example**: VM with 4GB RAM, 2 cores
- LXC Initial: 4GB RAM, 2 cores
- After monitoring: Adjust to 2GB RAM, 2 cores (if appropriate)
- Freed resources: 2GB RAM for other uses

---

## Phase 8: Documentation & Knowledge Transfer

### Required Documentation Updates

- [ ] **VM Inventory** → **LXC Inventory**
  - Update VMID mappings
  - Update IP addresses (if changed)
  - Update resource allocations

- [ ] **Runbooks**
  - Update operational procedures for LXC
  - Document `pct` commands vs `qm` commands
  - Update backup/restore procedures

- [ ] **Monitoring**
  - Update monitoring configs for LXC IDs
  - Verify alerts still firing correctly
  - Update dashboards

- [ ] **Troubleshooting Guide**
  - Common LXC issues and solutions
  - Docker in LXC quirks
  - Performance tuning tips
  - Software transcoding optimization (Plex/Tdarr)

### Key Differences: VM vs LXC Operations

| Operation | VM Command | LXC Command |
|-----------|-----------|-------------|
| List | `qm list` | `pct list` |
| Start | `qm start 106` | `pct start 206` |
| Stop | `qm stop 106` | `pct stop 206` |
| Enter console | `qm terminal 106` | `pct enter 206` |
| Create | `qm create ...` | `pct create ...` |
| Backup | `vzdump 106` | `vzdump 206` |
| Restore | `qmrestore ...` | `pct restore ...` |
| Config | `/etc/pve/qemu-server/106.conf` | `/etc/pve/lxc/206.conf` |

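
During the transition, when both VMs and containers exist, it is easy to reach for the wrong tool. As a small sketch, a hypothetical wrapper could choose `qm` or `pct` from the ID, assuming the convention in this plan that IDs of 200 and above are LXC containers. It only prints the command and covers actions both tools share (`start`, `stop`, `status`).

```bash
#!/bin/bash
# Hypothetical helper: print the right Proxmox command for a guest ID.
# Assumes the 200-series convention for LXC containers used in this plan.
#   $1 = action shared by qm and pct (start, stop, status), $2 = guest ID
guest_cmd() {
    local action="$1" id="$2" tool
    if [ "$id" -ge 200 ]; then
        tool="pct"   # LXC container
    else
        tool="qm"    # QEMU VM
    fi
    printf '%s %s %s\n' "$tool" "$action" "$id"
}
```

`guest_cmd start 106` prints `qm start 106`, and `guest_cmd stop 206` prints `pct stop 206`.
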

---

## Phase 9: Timeline & Milestones

### Proposed Timeline (4-6 Weeks - Likely to Accelerate)

**Week 1: Wave 1 - Lowest Risk**
- Day 1-2: Build and migrate docker-7days (111)
- Day 3-7: Monitor and validate - if stable, proceed immediately

**Week 1-2: Wave 2 - Regional/Isolated Docker Hosts**
- Day 5-6: Migrate docker-pittsburgh (114)
- Day 7-8: Migrate docker-vpn (105)
- Day 9-14: Monitor both services

**Week 2-3: Wave 3 - Additional Docker Hosts**
- Day 10-11: Migrate docker-sba (115)
- Day 12-13: Migrate docker-unused (117) or decommission
- Day 14-15: Migrate docker-home-servers (116)
- Day 16-21: Monitor all Wave 3 services

**Week 3-4: Wave 4 - Application & Database Servers**
- Day 17-18: Migrate discord-bots (110)
- Day 19-20: Migrate databases-bots (112) - EXTRA CARE
- Day 21-28: Extended monitoring for database migration

**Week 4-5: Wave 5 - Media Services** (SKIPPED, see Phase 3)
- ~~Day 22-23: Migrate docker-tdarr (113)~~ - Tdarr moved to ubuntu-manticore
- ~~Day 24-25: Migrate plex (107)~~ - Plex being decommissioned
- ~~Day 26-35: Monitor transcoding performance and CPU usage~~

**Week 5-6: Wave 6 - Final Critical Infrastructure**
- Day 29-30: Migrate docker-home (106) - Most critical
- Day 31-42: Extended monitoring and final optimization

**Post-Migration: Cleanup & Optimization**
- Resource optimization (right-sizing containers)
- Documentation finalization
- Final VM decommissioning after confidence period

**Note**: Timeline likely to accelerate based on success and comfort level. Waves may overlap if previous waves are stable ahead of schedule.

### Decision Gates

**Gate 1 (After Wave 1)**: docker-7days Success
- ✅ Game server stable and playable → Proceed to Wave 2
- ❌ Issues encountered → Pause, troubleshoot, refine process

**Gate 2 (After Wave 2)**: Regional Docker Hosts Success
- ✅ VPN routing working, regional services stable → Proceed to Wave 3
- ❌ Critical issues → Pause and reassess approach

**Gate 3 (After Wave 3)**: Docker Infrastructure Success
- ✅ All Docker hosts stable → Proceed to Wave 4
- ❌ Issues → Pause, may need to adjust LXC configuration

**Gate 4 (After Wave 4)**: Database Migration Success
- ✅ Database performance acceptable, no data issues → Proceed to Wave 6 (Wave 5 skipped)
- ❌ Database performance issues → Investigate before proceeding

**Gate 5 (After Wave 5)**: ~~Media Services Success~~ **SKIPPED** (see Phase 3)
- ~~✅ Software transcoding performance acceptable → Proceed to Wave 6~~
- ~~❌ Transcoding too CPU-intensive → May need resource adjustment or keep as VMs~~

**Gate 6 (After Wave 6)**: Final Critical Service Success
- ✅ docker-home stable → Begin cleanup and decommissioning
- ❌ Issues → Rollback and reassess

---

## Phase 10: Post-Migration Operations

### Ongoing Management

**Monthly Tasks**:
- Review resource utilization and right-size containers
- Validate backup/restore procedures
- Check for LXC template updates
- Review and update documentation

**Quarterly Tasks**:
- Evaluate new services for LXC vs VM placement
- Performance benchmarking
- Disaster recovery drill
- Capacity planning review

### Continuous Improvement

**Optimization Opportunities**:
- Standardize LXC templates with common tooling
- Automate container provisioning (Terraform/Ansible)
- Implement infrastructure-as-code for configs
- Build CI/CD for container updates

**Future Considerations**:
- Evaluate Proxmox clustering for HA
- Consider container orchestration (Kubernetes) if container count grows
- Explore automated resource balancing

---

## Appendix A: LXC Container ID Mapping

**Proposed New Container IDs** (200-series for LXC):

| Wave | VM ID | VM Name | New LXC ID | LXC Name | Migration Priority |
|------|-------|---------|-----------|----------|-------------------|
| 1 | 111 | docker-7days | 211 | docker-7days-lxc | FIRST - Lowest risk validation |
| 2 | 114 | docker-pittsburgh | 214 | docker-pittsburgh-lxc | Regional/isolated |
| 2 | 121 | docker-vpn | 221 | arr-stack | ✅ COMPLETE - VPN eliminated, simplified to arr stack |
| 3 | 115 | docker-sba | 215 | docker-sba-lxc | Additional Docker hosts |
| 3 | 117 | docker-unused | 217 | docker-unused-lxc | Migrate or decommission |
| 3 | 116 | docker-home-servers | 216 | docker-home-servers-lxc | Additional Docker hosts |
| 4 | 110 | discord-bots | 210 | discord-bots-lxc | Application servers |
| 4 | 112 | databases-bots | 212 | databases-bots-lxc | Database (EXTRA CARE) |
| ~~5~~ | ~~113~~ | ~~docker-tdarr~~ | ~~213~~ | ~~docker-tdarr-lxc~~ | ❌ RETIRED - moved to GPU server |
| ~~5~~ | ~~107~~ | ~~plex~~ | ~~207~~ | ~~plex-lxc~~ | ❌ DECOMMISSIONING - replaced by Jellyfin |
| 6 | 106 | docker-home | 206 | docker-home-lxc | FINAL - Most critical |

**Keep as VM**:
- 109 (hass-io) - HassOS requirement
- 100 (ubuntu-template) - Strategic VM template
- 103 (docker-template) - Convert to LXC template eventually

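
Every row in the mapping table follows the same rule: the LXC ID is the VM ID plus 100. A minimal sketch of that rule, with a hypothetical function name and a guard for IDs outside the 100-series:

```bash
#!/bin/bash
# Hypothetical helper: derive the LXC ID from a VM ID using the +100 convention
# visible in the mapping table (111 -> 211, 121 -> 221).
lxc_id_for_vm() {
    local vmid="$1"
    case "$vmid" in
        ''|*[!0-9]*)
            echo "not a numeric VM ID: $vmid" >&2
            return 1
            ;;
    esac
    if [ "$vmid" -lt 100 ] || [ "$vmid" -gt 199 ]; then
        echo "VM ID outside the 100-series: $vmid" >&2
        return 1
    fi
    echo $(( vmid + 100 ))
}
```

`lxc_id_for_vm 111` prints `211`. The function only encodes the numbering convention; whether a given VM should be migrated at all is still decided by the table and the keep-as-VM list.
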

---

## Appendix B: Quick Reference Commands

### Create Standard Docker LXC
```bash
pct create 2XX local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname NAME \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.0.XX/24,gw=10.10.0.1 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --unprivileged 0 \
  --features nesting=1,keyctl=1
```

### Data Sync During Migration
```bash
# Run inside the LXC (rsync cannot copy between two remote hosts)

# Initial sync (while VM running)
rsync -avz --progress root@VM_IP:/data/ /data/

# Final sync (services stopped on the VM, VM still reachable)
rsync -avz --progress --delete root@VM_IP:/data/ /data/
```

### Quick Validation
```bash
# Check LXC is running
pct status 2XX

# Check services inside
pct enter 2XX
systemctl status docker
docker ps
exit

# Network connectivity
ping -c 3 10.10.0.2XX
curl -f http://10.10.0.2XX
```

---

## Appendix C: Contact & Escalation

**Migration Owner**: Cal Corum (cal.corum@gmail.com)

**Key Resources**:
- Proxmox skill: `~/.claude/skills/proxmox/`
- VM management docs: `/mnt/NV2/Development/claude-home/vm-management/`
- Proxmox API: `~/.claude/skills/proxmox/proxmox_client.py`

**Support Channels**:
- Proxmox forums: https://forum.proxmox.com/
- LXC documentation: https://linuxcontainers.org/
- Docker in LXC: https://forum.proxmox.com/threads/docker-in-lxc.38129/

---

**Next Steps**:
1. ✅ Migration plan approved with confirmed decisions
2. ✅ Schedule Wave 1 migration window for docker-7days (111)
3. ✅ Build first LXC container for docker-7days
4. ✅ Execute Wave 1 migration and validate process

**Document Version**: 2.0 (Approved)
**Last Updated**: 2025-12-05
**Status**: ✅ Wave 2 Complete - In Progress