claude-home/vm-management/lxc-migration-plan.md
Cal Corum 11b96bce2c CLAUDE: Add LXC migration guides and scripts
- Add LXC migration plan and quick-start guide
- Add wave 1 and wave 2 migration results
- Add lxc-docker-create.sh for container creation
- Add fix-docker-apparmor.sh for AppArmor issues
- Add comprehensive LXC migration guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-07 00:48:30 -06:00

# VM to LXC Migration Plan - Proxmox Infrastructure
**Created**: 2025-01-12
**Status**: ✅ Wave 2 Complete - In Progress
**Owner**: Cal Corum
**Last Updated**: 2025-12-05
## 🎯 Wave 1 Status: ✅ **COMPLETE**
- **VM 111 (docker-7days)** → **LXC 211** ✅ Successful
- **Migration Date**: 2025-01-12
- **Container Status**: Running and validated
- **Detailed Results**: See `wave1-migration-results.md`
## 🎯 Wave 2 Status: ✅ **COMPLETE**
- **VM 121 (docker-vpn)** → **LXC 221 (arr-stack)** ✅ Successful
- **Migration Date**: 2025-12-05
- **Container Status**: Running and validated
- **Key Changes**:
  - Eliminated Mullvad VPN (Usenet + SSL is sufficient, no torrents)
  - Replaced Overseerr with Jellyseerr (native Jellyfin support)
  - Simplified stack: Sonarr, Radarr, Readarr, Jellyseerr, SABnzbd
- **Detailed Results**: See `wave2-migration-results.md`
## ✅ Confirmed Decisions
- **Networking**: Reuse existing IP addresses (transparent migration)
- **Storage**: Fresh install + volume copy for all Docker hosts
- **Timeline**: 4-6 weeks (updated from initial 6-8 based on Wave 1 experience)
- **GPU Services**: No GPU hardware available - Plex (107) and Tdarr (113) can migrate without special considerations
- **AppArmor Fix**: ALL docker-compose files need `security_opt: [apparmor=unconfined]` ⚠️ CRITICAL
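
The AppArmor requirement can be audited before each wave. The following is a minimal sketch, assuming compose files sit under one directory tree and use the usual file names; `find_missing_apparmor` is a hypothetical helper, not one of the scripts shipped with this plan. It is a plain text search, so it only shows that a file mentions the override somewhere, not that every service in it carries the option.

```bash
# Hypothetical helper: list compose files under a directory that never
# mention the AppArmor override required by this plan.
find_missing_apparmor() {
  local root="$1"
  # grep -L prints the names of files that contain NO match.
  find "$root" -type f \
    \( -name 'docker-compose.yml' -o -name 'docker-compose.yaml' \
       -o -name 'compose.yml' -o -name 'compose.yaml' \) \
    -exec grep -EL 'apparmor[=:]unconfined' {} +
}
```

Running `find_missing_apparmor /opt/docker` inside a migrated container would print each compose file that still needs the `security_opt` entry; no output means every file mentions it.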
## Executive Summary
Migrating services from full VMs to LXC containers on Proxmox to:
- Reduce resource overhead (memory, CPU, storage)
- Improve density and efficiency
- Speed up provisioning and backup/restore
- Lower management complexity
**Current State**: 16 VMs (9 running, 7 stopped)
**Target State**: Strategic mix of LXC containers and VMs based on workload requirements
---
## Phase 1: Assessment & Categorization
### Current VM Inventory Analysis
#### Running Production VMs (9)
| VMID | Name | Service Type | Migration Candidate? | Priority | Notes |
|------|------|--------------|---------------------|----------|-------|
| 105 | docker-vpn | Docker Host | ✅ YES | HIGH | VPN routing considerations |
| 106 | docker-home | Docker Host | ✅ YES | HIGH | Critical home services |
| 107 | plex | Media Server | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 109 | hass-io | Home Assistant | ❌ NO | N/A | HassOS requires VM, not standard Linux |
| 110 | discord-bots | Application | ✅ YES | MEDIUM | Simple Python services |
| 111 | docker-7days | Game Server | ✅ YES | HIGHEST | Lowest risk - migrate first |
| 112 | databases-bots | Database | ✅ YES | HIGH | PostgreSQL/databases |
| 113 | docker-tdarr | Transcode | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 114 | docker-pittsburgh | Docker Host | ✅ YES | MEDIUM | Regional services |
| 115 | docker-sba | Docker Host | ✅ YES | MEDIUM | SBA baseball services |
| 116 | docker-home-servers | Docker Host | ✅ YES | HIGH | Critical infrastructure |
#### Stopped/Template VMs (7)
| VMID | Name | Purpose | Action |
|------|------|---------|--------|
| 100 | ubuntu-template | Template | KEEP as VM for flexibility |
| 101 | 7d-solo | Game Server | EVALUATE when needed |
| 102 | 7d-staci | Game Server | EVALUATE when needed |
| 103 | docker-template | Template | CONVERT to LXC template |
| 104 | 7d-wotw | Game Server | EVALUATE when needed |
| 117 | docker-unused | Unused | DELETE or ARCHIVE |
### Migration Suitability Matrix
#### ✅ **IDEAL for LXC** (All Migrate)
- **Game server - docker-7days (111)**: LOWEST RISK - Migrate first to validate process
- **Docker hosts** (105, 106, 114, 115, 116): Standard Docker workloads without special hardware
- **Application servers** (110): Discord bots, Python services
- **Database servers** (112): PostgreSQL, Redis, standard databases
- **Media servers** (107, 113): Plex and Tdarr using software transcoding (no GPU available)
- **Stopped game servers** (101, 102, 104): Migrate when needed
- **Docker template** (103): Convert to LXC template for faster provisioning
**Why**: No GPU hardware in system - all services can run in LXC without special considerations. Pure Linux workloads benefit from reduced overhead.
#### ❌ **KEEP as VM** (Do Not Migrate)
- **Home Assistant (109)**: HassOS is VM-optimized, not standard Linux
- **Ubuntu template (100)**: Keep VM flexibility for future VM deployments
**Why**: Technical incompatibility or strategic value as VM
---
## Phase 2: Technical Planning
### Service Consolidation Decision Framework
When deciding whether to keep services in separate LXCs or consolidate into a single LXC:
#### **Keep Separate** (1 LXC per service) when:
| Factor | Reason |
|--------|--------|
| **Blast radius** | Failure of one shouldn't take down others |
| **Different update cycles** | Services need independent maintenance windows |
| **Resource contention** | CPU/memory-hungry services that compete |
| **Security boundaries** | Different trust levels or network access needs |
| **Different owners/teams** | Separate accountability |
| **Databases** | Always isolate for backup/restore simplicity |
| **Critical infrastructure** | VPN, DNS, reverse proxy - high availability needs |
#### **Consolidate** (multiple services in 1 LXC) when:
| Factor | Reason |
|--------|--------|
| **Related services** | Naturally belong together (e.g., all SBA services) |
| **Low resource usage** | Services that barely use resources individually |
| **Same lifecycle** | Updated/restarted together anyway |
| **Shared dependencies** | Same database, same configs |
| **Simplicity wins** | Fewer LXCs to manage, backup, monitor |
| **Same project** | Discord bots for same league, microservices for same app |
#### Practical Examples:
| Keep Separate | Why |
|---------------|-----|
| Databases (112) | Backup/restore, data integrity |
| VPN (105) | Security boundary, networking critical |
| Critical home services (106) | High availability |
| n8n (210) | Workflow automation, independent maintenance |
| Candidate for Consolidation | Why |
|-----------------------------|-----|
| Discord bots + related API services | Same project, low resources, same maintainer |
| Multiple low-traffic web apps | Minimal resource usage |
| Dev/test environments | Non-critical, shared lifecycle |
---
### LXC vs VM Decision Criteria
| Criteria | LXC Container | Full VM | Notes |
|----------|--------------|---------|-------|
| **OS Type** | Linux only | Any OS | LXC shares host kernel |
| **Resource Overhead** | Minimal (~50-200MB RAM) | High (full OS stack) | LXC 5-10x more efficient |
| **Boot Time** | 1-5 seconds | 30-90 seconds | Near-instant container start |
| **Kernel Modules** | Shared host kernel | Own kernel | LXC cannot load custom modules |
| **Hardware Passthrough** | Limited (requires privileges) | Full passthrough | GPU/USB may need testing |
| **Nested Virtualization** | Limited (Docker needs the `nesting=1` feature) | Supported | Docker in LXC works only with nesting enabled |
| **Backup/Restore** | Very fast | Slower | Container backups are typically smaller than full VM images |
| **Disk Performance** | Native | Near-native | Both excellent on modern storage |
### Key Technical Decisions
#### 1. **Networking Strategy** ✅ CONFIRMED
**Decision**: Reuse existing IP addresses
**Implementation**:
- ✅ No DNS changes required
- ✅ Existing firewall rules work
- ✅ Monitoring continues without changes
- ✅ Transparent migration for users
- ⚠️ Requires careful IP conflict management during parallel running
**Migration Process**:
1. Build LXC with temporary IP (or offline)
2. Test and validate LXC functionality
3. Stop VM during maintenance window
4. Reconfigure LXC to production IP
5. Start LXC and validate
6. Keep VM stopped for 48hr rollback window
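
Steps 3-5 of this process could be scripted roughly as follows. This is a sketch under stated assumptions: the LXC is stopped when the window opens, any final data sync has already run, and the subnet is /24 as in the rest of this plan. `net0_value` and `cutover_ip` are hypothetical names, and the helper only prints commands unless `DRY_RUN=0` is set.

```bash
# Build the net0 string in the same shape this plan uses for `pct create`.
net0_value() {
  local ip="$1" gw="$2" bridge="${3:-vmbr0}"
  printf 'name=eth0,bridge=%s,ip=%s/24,gw=%s' "$bridge" "$ip" "$gw"
}

# Hypothetical cutover helper. Defaults to a dry run that only prints.
# Assumes the LXC is stopped before the maintenance window starts.
cutover_ip() {
  local vmid="$1" ctid="$2" ip="$3" gw="$4"
  local run=""
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; fi
  # Step 3: stop the VM so the production IP is free.
  $run qm shutdown "$vmid" || return 1
  # Step 4: move the LXC onto the production IP.
  $run pct set "$ctid" --net0 "$(net0_value "$ip" "$gw")" || return 1
  # Step 5: start the LXC on its new address.
  $run pct start "$ctid" || return 1
}
```

For docker-home, `cutover_ip 106 206 10.10.0.106 10.10.0.1` prints the three commands for review; the same call with `DRY_RUN=0` would execute them on the Proxmox host.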
#### 2. **Storage Strategy** ✅ CONFIRMED
**Decision**: Fresh install + volume copy for all Docker hosts
**Implementation for Docker Hosts**:
1. **Fresh LXC installation**:
   - Clean Ubuntu 22.04 LTS base
   - Install Docker via standard script
   - Install docker-compose plugin
   - No migration of system configs
2. **Volume migration**:
   - Copy `/var/lib/docker/volumes/` from VM to LXC
   - Copy docker-compose files from VM to LXC
   - Copy environment files (.env) if applicable
   - Validate volume data integrity
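
One way to approach the "validate volume data integrity" step is to compare checksum manifests of the source and destination trees. The sketch below assumes both trees are reachable as local paths (for example, a copy mounted or pulled for comparison); `volume_manifest` and `volumes_match` are hypothetical helper names.

```bash
# Print a checksum manifest (relative paths, sorted by path) for every
# file under a directory.
volume_manifest() {
  local dir="$1"
  ( cd "$dir" && find . -type f -exec sha256sum {} + | sort -k 2 )
}

# Succeed only when both directories hold identical files and contents.
volumes_match() {
  [ "$(volume_manifest "$1")" = "$(volume_manifest "$2")" ]
}
```

When the two trees live on different hosts, the same `find ... sha256sum ... sort` pipeline can be run on each side and the two outputs compared with `diff`. Checksums are only meaningful if the containers writing to the volumes are stopped during the comparison.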
**Benefits**:
- ✅ Clean configuration, no cruft
- ✅ Opportunity to update/standardize configs
- ✅ Smaller container images
- ✅ Document infrastructure-as-code
- ✅ Latest Docker version on fresh install
#### 3. **Docker in LXC** ✅ CONFIRMED
**Decision**: Privileged LXC containers for all Docker hosts
**Configuration**:
- Set `--unprivileged 0` (privileged mode)
- Enable nesting: `--features nesting=1,keyctl=1`
- Docker runs once compose files carry the AppArmor override (see Confirmed Decisions)
- All Docker features supported
- No complex UID mapping required
**Rationale**:
- ✅ Docker compatibility guaranteed
- ✅ Simpler configuration and troubleshooting
- ✅ Balanced approach for home lab environment
- ⚠️ Acceptable security trade-off for isolated home network
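
A quick sanity check of these settings can be run against the container's config file on the Proxmox host. This is a sketch, assuming the usual `key: value` layout of `/etc/pve/lxc/<id>.conf`; `lxc_conf_docker_ready` is a hypothetical helper and does not distinguish snapshot sections from the live config.

```bash
# Hypothetical check for a Proxmox container config such as
# /etc/pve/lxc/206.conf. Returns 0 when the config matches this plan's
# Docker setup: nesting enabled and the container not marked unprivileged.
lxc_conf_docker_ready() {
  local conf="$1"
  grep -Eq '^features:.*nesting=1' "$conf" || return 1
  if grep -Eq '^unprivileged:[[:space:]]*1' "$conf"; then
    return 1
  fi
  return 0
}
```

Example: `lxc_conf_docker_ready /etc/pve/lxc/206.conf && echo "ready for Docker"`.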
---
## Phase 3: Migration Strategy
### Phased Rollout Approach (Risk-Based Ordering)
#### **Wave 1: Lowest Risk - Game Server** (Week 1)
**Target**: Lowest-risk service to validate entire migration process
1. **docker-7days (111)** - Game server via Docker, lowest impact if issues occur
**Why This First**:
- ✅ Non-critical service (gaming only)
- ✅ Can migrate during off-hours when not in use
- ✅ Clear validation criteria (game server starts and runs)
- ✅ Builds confidence in process with minimal risk
- ✅ Tests Docker-in-LXC configuration end-to-end
**Success Criteria**:
- Game server accessible and playable
- Docker containers running stable for 48+ hours
- Backup/restore tested successfully
- Rollback procedure validated
- Process documented for next waves
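
The "stable for 48+ hours" criterion can be checked from a container's start time. The sketch below assumes GNU `date` (for `-d`) and UTC timestamps in the format Docker reports for `.State.StartedAt`; the function names are hypothetical.

```bash
# Strip Docker's fractional seconds (e.g. 2025-01-12T00:00:00.123456789Z)
# so the timestamp parses cleanly.
normalize_ts() {
  case "$1" in
    *.*) printf '%sZ' "${1%%.*}" ;;
    *)   printf '%s' "$1" ;;
  esac
}

# Whole hours elapsed between two UTC timestamps. Requires GNU date.
hours_between() {
  local start end
  start=$(date -u -d "$(normalize_ts "$1")" +%s) || return 1
  end=$(date -u -d "$(normalize_ts "$2")" +%s) || return 1
  echo $(( (end - start) / 3600 ))
}

# True when a container started at $1 has been up at least $3 hours as of $2.
stable_for() {
  [ "$(hours_between "$1" "$2")" -ge "$3" ]
}
```

A start time can be read with `docker inspect -f '{{.State.StartedAt}}' <container>` and the current time with `date -u +%Y-%m-%dT%H:%M:%SZ`. Uptime alone does not prove stability, since a container that restarted an hour ago resets the clock, so restart counts and logs still need a look.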
#### **Wave 2: Docker Hosts - Regional/Isolated** (Week 1-2)
**Target**: Docker hosts with lower criticality and good isolation
2. **docker-pittsburgh (114)** - Regional services, lower criticality
3. **docker-vpn (105)** - VPN routing (isolated workload)
**Prerequisites**:
- Wave 1 successful (docker-7days stable)
- Process refined based on learnings
- Confidence in Docker-in-LXC configuration
**Validation Points**:
- VPN routing works correctly (105)
- Regional services accessible (114)
- No cross-service impact
#### **Wave 3: Additional Docker Hosts** (Week 2-3)
**Target**: More Docker infrastructure, increasing criticality
4. **docker-sba (115)** - Baseball services (defined maintenance windows)
5. **docker-unused (117)** - Migrate or decommission
6. **docker-home-servers (116)** - Home server infrastructure
**Critical Considerations**:
- SBA has known maintenance windows - use those
- docker-home-servers may have dependencies - validate carefully
- docker-unused can be decommissioned if no longer needed
#### **Wave 4: Application & Database Servers** (Week 3-4)
**Target**: Non-Docker services requiring extra care
7. **discord-bots (110)** - Python services, straightforward
8. **databases-bots (112)** - PostgreSQL/databases (highest care required)
**Critical Steps for Databases**:
- ⚠️ Full database backup before migration
- ⚠️ Validate connection strings from all dependent services
- ⚠️ Test database performance in LXC thoroughly
- ⚠️ Monitor for 48+ hours before decommissioning VM
- ⚠️ Have rollback plan ready and tested
#### **Wave 5: Media Services** ~~(Week 4-5)~~ **SKIPPED**
**Status**: ❌ SKIPPED - Services retired or decommissioned
~~9. **docker-tdarr (113)**~~ - **RETIRED**: Tdarr moved to dedicated GPU server (ubuntu-manticore)
~~10. **plex (107)**~~ - **DECOMMISSIONING**: Plex being retired, no migration needed
**Notes**:
- Tdarr now runs on ubuntu-manticore (10.10.0.226) with GPU transcoding
- Plex scheduled for decommission - Jellyfin is the replacement
#### **Wave 6: Final Critical Infrastructure** (Week 5-6)
**Target**: Most critical Docker infrastructure (save for last)
11. **docker-home (106)** - Critical home services (highest risk)
**Why Last**:
- Most critical infrastructure
- All other waves provide confidence
- Process fully refined and validated
- All potential issues already encountered and resolved
**Do NOT Migrate**:
- **hass-io (109)** - Keep as VM (HassOS requirement)
- **ubuntu-template (100)** - Keep as VM (strategic flexibility)
### Parallel Running Strategy
**For Each Migration**:
1. **Build LXC container** (new ID, temporary IP or offline)
2. **Configure and test** (validate all functionality)
3. **Sync data** from VM to LXC (while VM still running)
4. **Maintenance window**:
   - Stop services on the VM (keep the VM reachable for the final sync)
   - Final data sync
   - Shut down VM
   - Change LXC to production IP
   - Start LXC
   - Validate services
5. **Monitor for 24-48 hours** (VM kept in stopped state)
6. **Decommission VM** after confidence period
**Rollback Procedure**:
- Stop LXC
- Start VM (already has data up to cutover point)
- Resume production on VM
- Document what failed for retry
---
## Phase 4: Implementation Checklist
### Pre-Migration (Per Service)
- [ ] Document current VM configuration
  - [ ] CPU, memory, storage allocation
  - [ ] Network configuration (IP, gateway, DNS)
  - [ ] Installed packages and services
  - [ ] Docker compose files (if Docker host)
  - [ ] Volume mounts and storage locations
  - [ ] Environment variables and secrets
  - [ ] Cron jobs and systemd services
- [ ] Create LXC container
  - [ ] Select appropriate template (Ubuntu 22.04 LTS recommended)
  - [ ] Allocate resources (start conservative, can increase)
  - [ ] Configure networking (temporary IP for testing)
  - [ ] Set privileged mode if Docker host
  - [ ] Configure storage (bind mounts for data volumes)
- [ ] Prepare migration scripts
  - [ ] Data sync script (rsync-based)
  - [ ] Configuration export/import
  - [ ] Service validation tests
- [ ] Backup current VM
  - [ ] Full VM backup in Proxmox
  - [ ] Export critical data separately
  - [ ] Document backup location and restore procedure
### During Migration
- [ ] Announce maintenance window (if user-facing)
- [ ] Stop services on VM (or entire VM)
- [ ] Perform final data sync to LXC
- [ ] Update DNS/networking (if using new IP temporarily)
- [ ] Start services in LXC
- [ ] Run validation tests
  - [ ] Service responding?
  - [ ] Data accessible?
  - [ ] External connectivity working?
  - [ ] Dependent services connecting successfully?
  - [ ] Performance acceptable?
### Post-Migration
- [ ] Monitor for 24 hours
  - [ ] Check logs for errors
  - [ ] Monitor resource usage
  - [ ] Validate backups working
  - [ ] Test restore procedure
- [ ] Update documentation
  - [ ] Update VM inventory
  - [ ] Document new container configuration
  - [ ] Update monitoring configs
  - [ ] Update runbooks/procedures
- [ ] After 48-hour success period
  - [ ] Backup LXC container
  - [ ] Delete VM backup (or archive)
  - [ ] Destroy original VM
  - [ ] Update network documentation
---
## Phase 5: Technical Implementation Details
### Standard LXC Container Creation
```bash
# Create privileged LXC container for Docker host
# (ID 206 per Appendix A; temporary IP so it can run alongside VM 106 until cutover)
pct create 206 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname docker-home-lxc \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.0.206/24,gw=10.10.0.1 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --unprivileged 0 \
  --features nesting=1,keyctl=1

# Start container
pct start 206

# Enter container
pct enter 206
```
### Docker Installation in LXC
```bash
# Inside LXC container
# Update system
apt update && apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh
# Install Docker Compose
apt install docker-compose-plugin -y
# Verify
docker --version
docker compose version
```
### Data Migration Script Template
```bash
#!/bin/bash
# migrate-docker-host.sh
# Run inside the new LXC. rsync cannot copy between two remote hosts,
# so the LXC pulls each path from the VM over SSH.
set -euo pipefail

VM_IP="10.10.0.106"
# Per the storage strategy, copy only the volumes, not all of /var/lib/docker
VM_DATA="/var/lib/docker/volumes"
LXC_DATA="/var/lib/docker/volumes"

# Sync Docker volumes (while VM still running for initial sync)
rsync -avz --progress \
  "root@${VM_IP}:${VM_DATA}/" \
  "${LXC_DATA}/"

# Sync docker-compose files
rsync -avz --progress \
  "root@${VM_IP}:/opt/docker/" \
  /opt/docker/

# Sync environment files
rsync -avz --progress \
  "root@${VM_IP}:/root/.env" \
  /root/.env

echo "Initial sync complete. Ready for cutover."
```
### Service Validation Script
```bash
#!/bin/bash
# validate-migration.sh
# Usage: validate-migration.sh <container-ip> <docker|database|web>
CONTAINER_IP="${1:?container IP required}"
SERVICE_TYPE="${2:?service type required (docker, database or web)}"

echo "Validating migration for ${SERVICE_TYPE} at ${CONTAINER_IP}..."

case "$SERVICE_TYPE" in
  docker)
    # Check Docker is running
    ssh "root@${CONTAINER_IP}" "docker ps" || exit 1
    # Check compose services
    ssh "root@${CONTAINER_IP}" "cd /opt/docker && docker compose ps" || exit 1
    echo "✅ Docker services validated"
    ;;
  database)
    # Check PostgreSQL
    ssh "root@${CONTAINER_IP}" "systemctl status postgresql" || exit 1
    # Test connection
    ssh "root@${CONTAINER_IP}" "sudo -u postgres psql -c 'SELECT version();'" || exit 1
    echo "✅ Database validated"
    ;;
  web)
    # Check HTTP response
    curl -f "http://${CONTAINER_IP}" || exit 1
    echo "✅ Web service validated"
    ;;
  *)
    # Unknown types must fail rather than fall through to the success message
    echo "Unknown service type: ${SERVICE_TYPE}" >&2
    exit 1
    ;;
esac

echo "✅ All validation checks passed!"
```
---
## Phase 6: Risk Management
### Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Service downtime during migration | HIGH | MEDIUM | Off-hours migration, parallel running, fast rollback |
| Data loss during sync | LOW | HIGH | Multiple backups, checksums, validation |
| GPU passthrough failure | N/A | N/A | Not applicable: no GPU hardware in this host (see Confirmed Decisions) |
| Performance degradation | LOW | MEDIUM | Monitor closely, can revert easily |
| Networking issues | MEDIUM | HIGH | Keep VM stopped but intact for rollback |
| Forgotten dependencies | MEDIUM | HIGH | Document thoroughly, test before cutover |
### Rollback Procedures
#### Quick Rollback (During Cutover)
```bash
# If migration fails during cutover window
pct stop 206 # Stop new LXC
qm start 106 # Start original VM
# Service restored in <2 minutes
```
#### Rollback After Migration
```bash
# If issues discovered post-migration
pct stop 206 # Stop LXC
qm start 106 # Start original VM
# If the stopped VM itself is damaged, restore it from backup instead of starting it:
# qmrestore backup-file.vma.zst 106 --force
# May need to sync recent data from LXC to VM
```
### Success Metrics
**Per-Service Success Criteria**:
- Service uptime: 99.9% after 48 hours
- Response time: Same or better than VM
- Resource usage: 30-50% reduction in RAM usage
- No errors in logs
- Backups completing successfully
- Dependent services connecting properly
**Overall Migration Success**:
- 80%+ of suitable VMs migrated to LXC
- Zero data loss incidents
- Total downtime <4 hours across all migrations
- Documentation complete and validated
- Team confident in managing LXC infrastructure
---
## Phase 7: Resource Planning
### Expected Resource Gains
**Current VM Resource Usage** (estimated):
- 9 running VMs × 2GB average overhead = ~18GB RAM overhead
- 9 running VMs × 500MB average storage overhead = ~4.5GB storage
**Post-Migration LXC Resource Usage** (estimated):
- 7-8 LXC containers × 100MB average overhead = ~800MB RAM overhead
- 7-8 LXC containers × 100MB average storage overhead = ~800MB storage
**Net Gain**:
- ~17GB RAM freed (available for additional LXC containers or larger workloads)
- ~3.7GB storage freed
- Faster backup/restore times (5-10x improvement)
- Faster provisioning (minutes vs hours)
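
The arithmetic behind these estimates can be reproduced with a short helper. This is a sketch using the plan's own estimated figures, taking 8 containers (the upper end of 7-8) and treating 1GB as 1000MB to match the rounded numbers above; the function names are hypothetical.

```bash
# Overhead in MB for a number of guests at a given per-guest overhead.
overhead_mb() {
  echo $(( $1 * $2 ))
}

# Net MB freed: total VM overhead minus total LXC overhead.
net_gain_mb() {
  local vms="$1" vm_mb="$2" lxcs="$3" lxc_mb="$4"
  echo $(( $(overhead_mb "$vms" "$vm_mb") - $(overhead_mb "$lxcs" "$lxc_mb") ))
}
```

`net_gain_mb 9 2000 8 100` prints `17200` (about 17GB of RAM) and `net_gain_mb 9 500 8 100` prints `3700` (about 3.7GB of storage), matching the net gain listed above.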
### Resource Allocation Strategy
**Conservative Approach** (Recommended for initial migration):
- Allocate **same resources as VM** to LXC initially
- Monitor usage for 1-2 weeks
- Right-size after baseline established
- Iterate and optimize
**Example**: VM with 4GB RAM, 2 cores
- LXC Initial: 4GB RAM, 2 cores
- After monitoring: Adjust to 2GB RAM, 2 cores (if appropriate)
- Freed resources: 2GB RAM for other uses
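
To start each LXC with the same sizing as its VM, the values can be read from `qm config` output. The sketch below assumes that output contains plain `cores: N` and `memory: N` lines; `pct_sizing_flags` is a hypothetical helper, and the fallbacks (1 core, 512 MB) are assumed defaults for when a key is absent.

```bash
# Read `qm config <vmid>` output on stdin and print matching pct sizing flags.
# Falls back to assumed defaults (1 core, 512 MB) when a key is missing.
pct_sizing_flags() {
  awk -F': *' '
    $1 == "cores"  { cores = $2 }
    $1 == "memory" { memory = $2 }
    END {
      if (cores == "")  cores = 1
      if (memory == "") memory = 512
      printf "--memory %s --cores %s\n", memory, cores
    }
  '
}
```

On the Proxmox host, `qm config 106 | pct_sizing_flags` would print the flags to paste into the `pct create` command for the matching container.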
---
## Phase 8: Documentation & Knowledge Transfer
### Required Documentation Updates
- [ ] **VM Inventory** → **LXC Inventory**
  - Update VMID mappings
  - Update IP addresses (if changed)
  - Update resource allocations
- [ ] **Runbooks**
  - Update operational procedures for LXC
  - Document `pct` commands vs `qm` commands
  - Update backup/restore procedures
- [ ] **Monitoring**
  - Update monitoring configs for LXC IDs
  - Verify alerts still firing correctly
  - Update dashboards
- [ ] **Troubleshooting Guide**
  - Common LXC issues and solutions
  - Docker in LXC quirks
  - Performance tuning tips
  - Software transcoding optimization (Plex/Tdarr)
### Key Differences: VM vs LXC Operations
| Operation | VM Command | LXC Command |
|-----------|-----------|-------------|
| List | `qm list` | `pct list` |
| Start | `qm start 106` | `pct start 206` |
| Stop | `qm stop 106` | `pct stop 206` |
| Enter console | `qm terminal 106` | `pct enter 206` |
| Create | `qm create ...` | `pct create ...` |
| Backup | `vzdump 106` | `vzdump 206` |
| Restore | `qmrestore ...` | `pct restore ...` |
| Config | `/etc/pve/qemu-server/106.conf` | `/etc/pve/lxc/206.conf` |
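
While VMs and containers coexist, a small wrapper can pick the right tool from the guest ID. This sketch assumes the 200-series-for-LXC numbering from Appendix A holds for every guest; `guest_tool` and `guest_cmd` are hypothetical helpers that only print the command rather than run it.

```bash
# Pick the Proxmox CLI for a guest ID, assuming IDs 200-299 are LXC
# containers (the numbering convention from Appendix A).
guest_tool() {
  local id="$1"
  if [ "$id" -ge 200 ] && [ "$id" -le 299 ]; then
    echo pct
  else
    echo qm
  fi
}

# Print the command for an action on a guest, without executing it.
guest_cmd() {
  local action="$1" id="$2"
  echo "$(guest_tool "$id") $action $id"
}
```

`guest_cmd start 206` prints `pct start 206`, while `guest_cmd start 106` prints `qm start 106`. A more robust version would look the ID up in `pct list` and `qm list` instead of relying on the numbering convention.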
---
## Phase 9: Timeline & Milestones
### Proposed Timeline (4-6 Weeks - Likely to Accelerate)
**Week 1: Wave 1 - Lowest Risk**
- Day 1-2: Build and migrate docker-7days (111)
- Day 3-7: Monitor and validate - if stable, proceed immediately
**Week 1-2: Wave 2 - Regional/Isolated Docker Hosts**
- Day 5-6: Migrate docker-pittsburgh (114)
- Day 7-8: Migrate docker-vpn (105)
- Day 9-14: Monitor both services
**Week 2-3: Wave 3 - Additional Docker Hosts**
- Day 10-11: Migrate docker-sba (115)
- Day 12-13: Migrate docker-unused (117) or decommission
- Day 14-15: Migrate docker-home-servers (116)
- Day 16-21: Monitor all Wave 3 services
**Week 3-4: Wave 4 - Application & Database Servers**
- Day 17-18: Migrate discord-bots (110)
- Day 19-20: Migrate databases-bots (112) - EXTRA CARE
- Day 21-28: Extended monitoring for database migration
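**Week 4-5: Wave 5 - Media Services** (SKIPPED, see Phase 3)
- ~~Day 22-23: Migrate docker-tdarr (113)~~ Retired: Tdarr moved to ubuntu-manticore
- ~~Day 24-25: Migrate plex (107)~~ Decommissioning: Jellyfin is the replacement
- ~~Day 26-35: Monitor transcoding performance and CPU usage~~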
**Week 5-6: Wave 6 - Final Critical Infrastructure**
- Day 29-30: Migrate docker-home (106) - Most critical
- Day 31-42: Extended monitoring and final optimization
**Post-Migration: Cleanup & Optimization**
- Resource optimization (right-sizing containers)
- Documentation finalization
- Final VM decommissioning after confidence period
**Note**: Timeline likely to accelerate based on success and comfort level. Waves may overlap if previous waves are stable ahead of schedule.
### Decision Gates
**Gate 1 (After Wave 1)**: docker-7days Success
- Game server stable and playable → Proceed to Wave 2
- Issues encountered → Pause, troubleshoot, refine process
**Gate 2 (After Wave 2)**: Regional Docker Hosts Success
- VPN routing working, regional services stable → Proceed to Wave 3
- Critical issues → Pause and reassess approach
**Gate 3 (After Wave 3)**: Docker Infrastructure Success
- All Docker hosts stable → Proceed to Wave 4
- Issues → Pause, may need to adjust LXC configuration
**Gate 4 (After Wave 4)**: Database Migration Success
- Database performance acceptable, no data issues → Proceed to Wave 5
- Database performance issues → Investigate before proceeding
**Gate 5 (After Wave 5)**: Media Services Success
- Software transcoding performance acceptable → Proceed to Wave 6
- Transcoding too CPU-intensive → May need resource adjustment or keep as VMs
**Gate 6 (After Wave 6)**: Final Critical Service Success
- docker-home stable → Begin cleanup and decommissioning
- Issues → Rollback and reassess
---
## Phase 10: Post-Migration Operations
### Ongoing Management
**Monthly Tasks**:
- Review resource utilization and right-size containers
- Validate backup/restore procedures
- Check for LXC template updates
- Review and update documentation
**Quarterly Tasks**:
- Evaluate new services for LXC vs VM placement
- Performance benchmarking
- Disaster recovery drill
- Capacity planning review
### Continuous Improvement
**Optimization Opportunities**:
- Standardize LXC templates with common tooling
- Automate container provisioning (Terraform/Ansible)
- Implement infrastructure-as-code for configs
- Build CI/CD for container updates
**Future Considerations**:
- Evaluate Proxmox clustering for HA
- Consider container orchestration (Kubernetes) if container count grows
- Explore automated resource balancing
---
## Appendix A: LXC Container ID Mapping
**Proposed New Container IDs** (200-series for LXC):
| Wave | VM ID | VM Name | New LXC ID | LXC Name | Migration Priority |
|------|-------|---------|-----------|----------|-------------------|
| 1 | 111 | docker-7days | 211 | docker-7days-lxc | FIRST - Lowest risk validation |
| 2 | 114 | docker-pittsburgh | 214 | docker-pittsburgh-lxc | Regional/isolated |
| 2 | 121 | docker-vpn | 221 | arr-stack | COMPLETE - VPN eliminated, simplified to arr stack |
| 3 | 115 | docker-sba | 215 | docker-sba-lxc | Additional Docker hosts |
| 3 | 117 | docker-unused | 217 | docker-unused-lxc | Migrate or decommission |
| 3 | 116 | docker-home-servers | 216 | docker-home-servers-lxc | Additional Docker hosts |
| 4 | 110 | discord-bots | 210 | discord-bots-lxc | Application servers |
| 4 | 112 | databases-bots | 212 | databases-bots-lxc | Database (EXTRA CARE) |
| ~~5~~ | ~~113~~ | ~~docker-tdarr~~ | ~~213~~ | ~~docker-tdarr-lxc~~ | RETIRED - moved to GPU server |
| ~~5~~ | ~~107~~ | ~~plex~~ | ~~207~~ | ~~plex-lxc~~ | DECOMMISSIONING - replaced by Jellyfin |
| 6 | 106 | docker-home | 206 | docker-home-lxc | FINAL - Most critical |
**Keep as VM**:
- 109 (hass-io) - HassOS requirement
- 100 (ubuntu-template) - Strategic VM template
- 103 (docker-template) - Convert to LXC template eventually
---
## Appendix B: Quick Reference Commands
### Create Standard Docker LXC
```bash
pct create 2XX local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
--hostname NAME \
--memory 4096 \
--cores 2 \
--net0 name=eth0,bridge=vmbr0,ip=10.10.0.XX/24,gw=10.10.0.1 \
--storage local-lvm \
--rootfs local-lvm:32 \
--unprivileged 0 \
--features nesting=1,keyctl=1
```
### Data Sync During Migration
```bash
# Run inside the LXC: rsync cannot copy between two remote hosts
# Initial sync (while VM running)
rsync -avz --progress root@VM_IP:/data/ /data/
# Final sync (services on VM stopped, VM still reachable)
rsync -avz --progress --delete root@VM_IP:/data/ /data/
```
### Quick Validation
```bash
# Check LXC is running
pct status 2XX
# Check services inside (non-interactive)
pct exec 2XX -- systemctl status docker
pct exec 2XX -- docker ps
# Network connectivity
ping -c 3 10.10.0.2XX
curl -f http://10.10.0.2XX
```
---
## Appendix C: Contact & Escalation
**Migration Owner**: Cal Corum (cal.corum@gmail.com)
**Key Resources**:
- Proxmox skill: `~/.claude/skills/proxmox/`
- VM management docs: `/mnt/NV2/Development/claude-home/vm-management/`
- Proxmox API: `~/.claude/skills/proxmox/proxmox_client.py`
**Support Channels**:
- Proxmox forums: https://forum.proxmox.com/
- LXC documentation: https://linuxcontainers.org/
- Docker in LXC: https://forum.proxmox.com/threads/docker-in-lxc.38129/
---
**Next Steps**:
1. Migration plan approved with confirmed decisions
2. Schedule Wave 1 migration window for docker-7days (111)
3. Build first LXC container for docker-7days
4. Execute Wave 1 migration and validate process
**Document Version**: 2.0 (Approved)
**Last Updated**: 2025-12-05
**Status**: Wave 2 Complete - In Progress