CLAUDE: Add LXC migration guides and scripts
- Add LXC migration plan and quick-start guide
- Add wave 1 and wave 2 migration results
- Add lxc-docker-create.sh for container creation
- Add fix-docker-apparmor.sh for AppArmor issues
- Add comprehensive LXC migration guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent
66d2a4bda7
commit
11b96bce2c
821
vm-management/lxc-migration-plan.md
Normal file
@@ -0,0 +1,821 @@
# VM to LXC Migration Plan - Proxmox Infrastructure

**Created**: 2025-01-12
**Status**: ✅ Wave 2 Complete - In Progress
**Owner**: Cal Corum
**Last Updated**: 2025-12-05

## 🎯 Wave 1 Status: ✅ **COMPLETE**
- **VM 111 (docker-7days)** → **LXC 211** ✅ Successful
- **Migration Date**: 2025-01-12
- **Container Status**: Running and validated
- **Detailed Results**: See `wave1-migration-results.md`

## 🎯 Wave 2 Status: ✅ **COMPLETE**
- **VM 121 (docker-vpn)** → **LXC 221 (arr-stack)** ✅ Successful
- **Migration Date**: 2025-12-05
- **Container Status**: Running and validated
- **Key Changes**:
  - Eliminated Mullvad VPN (Usenet + SSL is sufficient, no torrents)
  - Replaced Overseerr with Jellyseerr (native Jellyfin support)
  - Simplified stack: Sonarr, Radarr, Readarr, Jellyseerr, SABnzbd
- **Detailed Results**: See `wave2-migration-results.md`

## ✅ Confirmed Decisions
- **Networking**: Reuse existing IP addresses (transparent migration)
- **Storage**: Fresh install + volume copy for all Docker hosts
- **Timeline**: 4-6 weeks (updated from initial 6-8 based on Wave 1 experience)
- **GPU Services**: No GPU hardware available - Plex (107) and Tdarr (113) can migrate without special considerations
- **AppArmor Fix**: ALL docker-compose files need `security_opt: [apparmor=unconfined]` ⚠️ CRITICAL

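Because the AppArmor fix has to be present in every compose file, a quick pre-flight scan can catch files that were missed. The sketch below is only an illustration and is not the contents of `fix-docker-apparmor.sh`: the function name `compose_missing_apparmor` is hypothetical, and it does a plain text match on `apparmor=unconfined`, so it checks each file as a whole rather than each service.

```bash
#!/bin/bash
# Hypothetical helper: list compose files under a directory that never
# mention apparmor=unconfined. This is a file-level text match, so a file
# with several services passes as soon as one of them carries the option.
compose_missing_apparmor() {
    local dir="$1"
    find "$dir" -type f \( -name 'docker-compose.yml' -o -name 'docker-compose.yaml' \
        -o -name 'compose.yml' -o -name 'compose.yaml' \) -print0 |
        while IFS= read -r -d '' file; do
            grep -q 'apparmor=unconfined' "$file" || printf '%s\n' "$file"
        done
}
```

For example, `compose_missing_apparmor /opt/docker` prints one path per file that still needs the option; empty output means every file mentions it.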
## Executive Summary

Services are being migrated from full VMs to LXC containers on Proxmox to:
- Reduce resource overhead (memory, CPU, storage)
- Improve density and efficiency
- Speed up provisioning and backup/restore
- Lower management complexity

**Current State**: 16 VMs (9 running, 7 stopped)
**Target State**: Strategic mix of LXC containers and VMs based on workload requirements

---

## Phase 1: Assessment & Categorization

### Current VM Inventory Analysis

#### Running Production VMs (9)
| VMID | Name | Service Type | Migration Candidate? | Priority | Notes |
|------|------|--------------|---------------------|----------|-------|
| 105 | docker-vpn | Docker Host | ✅ YES | HIGH | VPN routing considerations |
| 106 | docker-home | Docker Host | ✅ YES | HIGH | Critical home services |
| 107 | plex | Media Server | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 109 | hass-io | Home Assistant | ❌ NO | N/A | HassOS requires VM, not standard Linux |
| 110 | discord-bots | Application | ✅ YES | MEDIUM | Simple Python services |
| 111 | docker-7days | Game Server | ✅ YES | HIGHEST | Lowest risk - migrate first |
| 112 | databases-bots | Database | ✅ YES | HIGH | PostgreSQL/databases |
| 113 | docker-tdarr | Transcode | ✅ YES | MEDIUM | Software transcoding (no GPU hardware) |
| 114 | docker-pittsburgh | Docker Host | ✅ YES | MEDIUM | Regional services |
| 115 | docker-sba | Docker Host | ✅ YES | MEDIUM | SBA baseball services |
| 116 | docker-home-servers | Docker Host | ✅ YES | HIGH | Critical infrastructure |

#### Stopped/Template VMs (7)
| VMID | Name | Purpose | Action |
|------|------|---------|--------|
| 100 | ubuntu-template | Template | KEEP as VM for flexibility |
| 101 | 7d-solo | Game Server | EVALUATE when needed |
| 102 | 7d-staci | Game Server | EVALUATE when needed |
| 103 | docker-template | Template | CONVERT to LXC template |
| 104 | 7d-wotw | Game Server | EVALUATE when needed |
| 117 | docker-unused | Unused | DELETE or ARCHIVE |

### Migration Suitability Matrix

#### ✅ **IDEAL for LXC** (All Migrate)
- **Game server - docker-7days (111)**: LOWEST RISK - Migrate first to validate process
- **Docker hosts** (105, 106, 114, 115, 116): Standard Docker workloads without special hardware
- **Application servers** (110): Discord bots, Python services
- **Database servers** (112): PostgreSQL, Redis, standard databases
- **Media servers** (107, 113): Plex and Tdarr using software transcoding (no GPU available)
- **Stopped game servers** (101, 102, 104): Migrate when needed
- **Docker template** (103): Convert to LXC template for faster provisioning

**Why**: No GPU hardware in system - all services can run in LXC without special considerations. Pure Linux workloads benefit from reduced overhead.

#### ❌ **KEEP as VM** (Do Not Migrate)
- **Home Assistant (109)**: HassOS is VM-optimized, not standard Linux
- **Ubuntu template (100)**: Keep VM flexibility for future VM deployments

**Why**: Technical incompatibility or strategic value as VM

---

## Phase 2: Technical Planning

### Service Consolidation Decision Framework

When deciding whether to keep services in separate LXCs or consolidate into a single LXC:

#### **Keep Separate** (1 LXC per service) when:
| Factor | Reason |
|--------|--------|
| **Blast radius** | Failure of one shouldn't take down others |
| **Different update cycles** | Services need independent maintenance windows |
| **Resource contention** | CPU/memory-hungry services that compete |
| **Security boundaries** | Different trust levels or network access needs |
| **Different owners/teams** | Separate accountability |
| **Databases** | Always isolate for backup/restore simplicity |
| **Critical infrastructure** | VPN, DNS, reverse proxy - high availability needs |

#### **Consolidate** (multiple services in 1 LXC) when:
| Factor | Reason |
|--------|--------|
| **Related services** | Naturally belong together (e.g., all SBA services) |
| **Low resource usage** | Services that barely use resources individually |
| **Same lifecycle** | Updated/restarted together anyway |
| **Shared dependencies** | Same database, same configs |
| **Simplicity wins** | Fewer LXCs to manage, backup, monitor |
| **Same project** | Discord bots for same league, microservices for same app |

#### Practical Examples:

| Keep Separate | Why |
|---------------|-----|
| Databases (112) | Backup/restore, data integrity |
| VPN (105) | Security boundary, networking critical |
| Critical home services (106) | High availability |
| n8n (210) | Workflow automation, independent maintenance |

| Candidate for Consolidation | Why |
|-----------------------------|-----|
| Discord bots + related API services | Same project, low resources, same maintainer |
| Multiple low-traffic web apps | Minimal resource usage |
| Dev/test environments | Non-critical, shared lifecycle |

---

### LXC vs VM Decision Criteria

| Criteria | LXC Container | Full VM | Notes |
|----------|--------------|---------|-------|
| **OS Type** | Linux only | Any OS | LXC shares host kernel |
| **Resource Overhead** | Minimal (~50-200MB RAM) | High (full OS stack) | LXC 5-10x more efficient |
| **Boot Time** | 1-5 seconds | 30-90 seconds | Near-instant container start |
| **Kernel Modules** | Shared host kernel | Own kernel | LXC cannot load custom modules |
| **Hardware Passthrough** | Limited (requires privileges) | Full passthrough | GPU/USB may need testing |
| **Nested Virtualization** | Nested VMs not supported; Docker needs `nesting=1` | Supported | Docker in LXC requires extra container features |
| **Backup/Restore** | Very fast | Slower | Less data to back up than a full VM disk |
| **Disk Performance** | Native | Near-native | Both excellent on modern storage |

### Key Technical Decisions

#### 1. **Networking Strategy** ✅ CONFIRMED
**Decision**: Reuse existing IP addresses

**Implementation**:
- ✅ No DNS changes required
- ✅ Existing firewall rules work
- ✅ Monitoring continues without changes
- ✅ Transparent migration for users
- ⚠️ Requires careful IP conflict management during parallel running

**Migration Process**:
1. Build LXC with temporary IP (or offline)
2. Test and validate LXC functionality
3. Stop VM during maintenance window
4. Reconfigure LXC to production IP
5. Start LXC and validate
6. Keep VM stopped for 48hr rollback window

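The temporary-IP step can be made mechanical so that the test address never collides with the VM that still owns the production address. A minimal sketch, assuming a /24 network, gateway `10.10.0.1` and bridge `vmbr0`, and a temporary address formed by adding 100 to the last octet (so `10.10.0.106` becomes `10.10.0.206`); that offset and the helper names `temp_ip_for` and `net0_for` are illustrative assumptions, not an established convention.

```bash
#!/bin/bash
# Hypothetical helpers for the temporary-IP approach.
# Assumptions: /24 network, gateway 10.10.0.1, bridge vmbr0, and a temporary
# address formed by adding 100 to the last octet of the production address.

# Print the temporary test IP for a production IP.
temp_ip_for() {
    local ip="$1"
    local prefix="${ip%.*}"   # first three octets
    local last="${ip##*.}"    # last octet
    local candidate=$(( last + 100 ))
    if (( candidate > 254 )); then
        echo "no temporary address available for ${ip}" >&2
        return 1
    fi
    printf '%s.%s\n' "$prefix" "$candidate"
}

# Print the net0 value to pass to `pct set <id> --net0 ...` for a given IP.
net0_for() {
    printf 'name=eth0,bridge=vmbr0,ip=%s/24,gw=10.10.0.1\n' "$1"
}
```

At cutover, once the VM is stopped, the production address could be applied with `pct set 206 --net0 "$(net0_for 10.10.0.106)"`.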
#### 2. **Storage Strategy** ✅ CONFIRMED
**Decision**: Fresh install + volume copy for all Docker hosts

**Implementation for Docker Hosts**:
1. **Fresh LXC installation**:
   - Clean Ubuntu 22.04 LTS base
   - Install Docker via standard script
   - Install docker-compose plugin
   - No migration of system configs

2. **Volume migration**:
   - Copy `/var/lib/docker/volumes/` from VM to LXC
   - Copy docker-compose files from VM to LXC
   - Copy environment files (.env) if applicable
   - Validate volume data integrity

**Benefits**:
- ✅ Clean configuration, no cruft
- ✅ Opportunity to update/standardize configs
- ✅ Smaller container images
- ✅ Document infrastructure-as-code
- ✅ Latest Docker version on fresh install

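One way to carry out the "validate volume data integrity" step is to compare checksum manifests taken on both sides after the copy. A minimal sketch, assuming GNU `find`, `sort`, `xargs` and `sha256sum` are available on both hosts; the function name `volume_manifest` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print a sorted SHA-256 manifest for every file under a
# directory. Paths are relative to that directory, so manifests produced on
# the VM and on the LXC can be compared line for line.
volume_manifest() {
    local dir="$1"
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}
```

Run it against `/var/lib/docker/volumes` on each host with the containers stopped, then `diff` the two outputs; any differing line names a file that did not copy cleanly.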
#### 3. **Docker in LXC** ✅ CONFIRMED
**Decision**: Privileged LXC containers for all Docker hosts

**Configuration**:
- Set `--unprivileged 0` (privileged mode)
- Enable nesting: `--features nesting=1,keyctl=1`
- Docker works once the AppArmor fix from Confirmed Decisions is applied
- All Docker features supported
- No complex UID mapping required

**Rationale**:
- ✅ Best Docker compatibility
- ✅ Simpler configuration and troubleshooting
- ✅ Balanced approach for home lab environment
- ⚠️ Acceptable security trade-off for isolated home network

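The configuration above can be checked against a container's config file before Docker is installed, for example with `lxc_conf_ready_for_docker /etc/pve/lxc/206.conf`. This is a minimal sketch: it assumes the config file carries a `features:` line and that unprivileged containers are marked with an `unprivileged: 1` line, and the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical check: succeed only if the given LXC config file enables
# nesting and does not mark the container as unprivileged.
lxc_conf_ready_for_docker() {
    local conf="$1"
    if ! grep -Eq '^features:.*nesting=1' "$conf"; then
        echo "nesting=1 not set in ${conf}" >&2
        return 1
    fi
    if grep -Eq '^unprivileged:[[:space:]]*1' "$conf"; then
        echo "${conf} describes an unprivileged container" >&2
        return 1
    fi
    echo "ok"
}
```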
---

## Phase 3: Migration Strategy

### Phased Rollout Approach (Risk-Based Ordering)

#### **Wave 1: Lowest Risk - Game Server** (Week 1)
**Target**: Lowest-risk service to validate entire migration process

1. **docker-7days (111)** - Game server via Docker, lowest impact if issues occur

**Why This First**:
- ✅ Non-critical service (gaming only)
- ✅ Can migrate during off-hours when not in use
- ✅ Clear validation criteria (game server starts and runs)
- ✅ Builds confidence in process with minimal risk
- ✅ Tests Docker-in-LXC configuration end-to-end

**Success Criteria**:
- Game server accessible and playable
- Docker containers running stable for 48+ hours
- Backup/restore tested successfully
- Rollback procedure validated
- Process documented for next waves

#### **Wave 2: Docker Hosts - Regional/Isolated** (Week 1-2)
**Target**: Docker hosts with lower criticality and good isolation

2. **docker-pittsburgh (114)** - Regional services, lower criticality
3. **docker-vpn (105)** - VPN routing (isolated workload)

**Prerequisites**:
- Wave 1 successful (docker-7days stable)
- Process refined based on learnings
- Confidence in Docker-in-LXC configuration

**Validation Points**:
- VPN routing works correctly (105)
- Regional services accessible (114)
- No cross-service impact

#### **Wave 3: Additional Docker Hosts** (Week 2-3)
**Target**: More Docker infrastructure, increasing criticality

4. **docker-sba (115)** - Baseball services (defined maintenance windows)
5. **docker-unused (117)** - Migrate or decommission
6. **docker-home-servers (116)** - Home server infrastructure

**Critical Considerations**:
- SBA has known maintenance windows - use those
- docker-home-servers may have dependencies - validate carefully
- docker-unused can be decommissioned if no longer needed

#### **Wave 4: Application & Database Servers** (Week 3-4)
**Target**: Non-Docker services requiring extra care

7. **discord-bots (110)** - Python services, straightforward
8. **databases-bots (112)** - PostgreSQL/databases (highest care required)

**Critical Steps for Databases**:
- ⚠️ Full database backup before migration
- ⚠️ Validate connection strings from all dependent services
- ⚠️ Test database performance in LXC thoroughly
- ⚠️ Monitor for 48+ hours before decommissioning VM
- ⚠️ Have rollback plan ready and tested

#### **Wave 5: Media Services** ~~(Week 4-5)~~ **SKIPPED**
**Status**: ❌ SKIPPED - Services retired or decommissioned

~~9. **docker-tdarr (113)**~~ - **RETIRED**: Tdarr moved to dedicated GPU server (ubuntu-manticore)
~~10. **plex (107)**~~ - **DECOMMISSIONING**: Plex being retired, no migration needed

**Notes**:
- Tdarr now runs on ubuntu-manticore (10.10.0.226) with GPU transcoding
- Plex scheduled for decommission - Jellyfin is the replacement

#### **Wave 6: Final Critical Infrastructure** (Week 5-6)
**Target**: Most critical Docker infrastructure (save for last)

11. **docker-home (106)** - Critical home services (highest risk)

**Why Last**:
- Most critical infrastructure
- All other waves provide confidence
- Process fully refined and validated
- All potential issues already encountered and resolved

**Do NOT Migrate**:
- **hass-io (109)** - Keep as VM (HassOS requirement)
- **ubuntu-template (100)** - Keep as VM (strategic flexibility)

### Parallel Running Strategy

**For Each Migration**:

1. **Build LXC container** (new ID, temporary IP or offline)
2. **Configure and test** (validate all functionality)
3. **Sync data** from VM to LXC (while VM still running)
4. **Maintenance window**:
   - Stop VM
   - Final data sync
   - Change LXC to production IP
   - Start LXC
   - Validate services
5. **Monitor for 24-48 hours** (VM kept in stopped state)
6. **Decommission VM** after confidence period

**Rollback Procedure**:
- Stop LXC
- Start VM (already has data up to cutover point)
- Resume production on VM
- Document what failed for retry

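The maintenance-window and rollback steps above can be printed as a reviewable command list before anything is run. This is a dry-run sketch only: the function names are hypothetical, the bridge, gateway and /24 mask are assumptions taken from this plan's `pct create` example, and it assumes the final sync happens with services stopped but the VM still reachable over SSH, so the VM is shut down after that sync.

```bash
#!/bin/bash
# Hypothetical dry-run helpers: print the commands for one VM/LXC pair.
# Nothing is executed; review the output and run the commands by hand.

# Arguments: VM ID, LXC ID, production IP.
print_cutover_plan() {
    local vmid="$1" ctid="$2" prod_ip="$3"
    cat <<EOF
# stop services on the VM, then run the final data sync before continuing
qm shutdown ${vmid}
pct shutdown ${ctid}
pct set ${ctid} --net0 name=eth0,bridge=vmbr0,ip=${prod_ip}/24,gw=10.10.0.1
pct start ${ctid}
EOF
}

# Arguments: VM ID, LXC ID.
print_rollback_plan() {
    local vmid="$1" ctid="$2"
    cat <<EOF
pct stop ${ctid}
qm start ${vmid}
EOF
}
```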
---

## Phase 4: Implementation Checklist

### Pre-Migration (Per Service)

- [ ] Document current VM configuration
  - [ ] CPU, memory, storage allocation
  - [ ] Network configuration (IP, gateway, DNS)
  - [ ] Installed packages and services
  - [ ] Docker compose files (if Docker host)
  - [ ] Volume mounts and storage locations
  - [ ] Environment variables and secrets
  - [ ] Cron jobs and systemd services

- [ ] Create LXC container
  - [ ] Select appropriate template (Ubuntu 22.04 LTS recommended)
  - [ ] Allocate resources (start conservative, can increase)
  - [ ] Configure networking (temporary IP for testing)
  - [ ] Set privileged mode if Docker host
  - [ ] Configure storage (bind mounts for data volumes)

- [ ] Prepare migration scripts
  - [ ] Data sync script (rsync-based)
  - [ ] Configuration export/import
  - [ ] Service validation tests

- [ ] Backup current VM
  - [ ] Full VM backup in Proxmox
  - [ ] Export critical data separately
  - [ ] Document backup location and restore procedure

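The "document current VM configuration" item can start from what Proxmox already records. A minimal sketch, assuming `qm config <vmid>` prints `key: value` lines; the function name `vm_allocation_summary` is hypothetical and it keeps only the CPU, memory, disk and NIC lines.

```bash
#!/bin/bash
# Hypothetical helper: read `qm config <vmid>` output on stdin and print the
# allocation lines worth recording before migration.
vm_allocation_summary() {
    awk -F': ' '
        $1 == "cores" || $1 == "memory" || $1 == "sockets" { print $1 "=" $2 }
        $1 ~ /^(scsi|virtio|sata|ide)[0-9]+$/ { print "disk " $1 "=" $2 }
        $1 ~ /^net[0-9]+$/ { print "nic " $1 "=" $2 }
    '
}
```

On the Proxmox host this could be used as `qm config 106 | vm_allocation_summary > vm-106-allocation.txt`, where the output file name is only an example.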
### During Migration

- [ ] Announce maintenance window (if user-facing)
- [ ] Stop services on VM (or entire VM)
- [ ] Perform final data sync to LXC
- [ ] Update DNS/networking (if using new IP temporarily)
- [ ] Start services in LXC
- [ ] Run validation tests
  - [ ] Service responding?
  - [ ] Data accessible?
  - [ ] External connectivity working?
  - [ ] Dependent services connecting successfully?
  - [ ] Performance acceptable?

### Post-Migration

- [ ] Monitor for 24 hours
  - [ ] Check logs for errors
  - [ ] Monitor resource usage
  - [ ] Validate backups working
  - [ ] Test restore procedure

- [ ] Update documentation
  - [ ] Update VM inventory
  - [ ] Document new container configuration
  - [ ] Update monitoring configs
  - [ ] Update runbooks/procedures

- [ ] After 48-hour success period
  - [ ] Backup LXC container
  - [ ] Delete VM backup (or archive)
  - [ ] Destroy original VM
  - [ ] Update network documentation

---

## Phase 5: Technical Implementation Details

### Standard LXC Container Creation

```bash
# Create privileged LXC container for Docker host
# ID 206 follows the Appendix A mapping for docker-home (106).
# 10.10.0.206 is the temporary test IP; switch to 10.10.0.106 at cutover.
pct create 206 local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
    --hostname docker-home-lxc \
    --memory 4096 \
    --cores 2 \
    --net0 name=eth0,bridge=vmbr0,ip=10.10.0.206/24,gw=10.10.0.1 \
    --storage local-lvm \
    --rootfs local-lvm:32 \
    --unprivileged 0 \
    --features nesting=1,keyctl=1

# Start container
pct start 206

# Enter container
pct enter 206
```

### Docker Installation in LXC

```bash
# Inside LXC container
# Update system
apt update && apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh

# Install Docker Compose
apt install docker-compose-plugin -y

# Verify
docker --version
docker compose version
```

### Data Migration Script Template

```bash
#!/bin/bash
# migrate-docker-host.sh
# Run from a machine with root SSH access to the VM.
# The VM must in turn be able to SSH to the LXC as root.
set -euo pipefail

VM_IP="10.10.0.106"
LXC_IP="10.10.0.206" # Temporary during migration
# Only the volumes are copied; the LXC gets a fresh Docker install.
VM_DATA="/var/lib/docker/volumes"
LXC_DATA="/var/lib/docker/volumes"

# rsync cannot copy between two remote hosts, so each rsync runs on the VM
# and pushes to the LXC.

# Sync Docker volumes (while VM still running for initial sync)
ssh "root@${VM_IP}" rsync -avz --progress \
    "${VM_DATA}/" \
    "root@${LXC_IP}:${LXC_DATA}/"

# Sync docker-compose files
ssh "root@${VM_IP}" rsync -avz --progress \
    /opt/docker/ \
    "root@${LXC_IP}:/opt/docker/"

# Sync environment files
ssh "root@${VM_IP}" rsync -avz --progress \
    /root/.env \
    "root@${LXC_IP}:/root/.env"

echo "Initial sync complete. Ready for cutover."
```

### Service Validation Script

```bash
#!/bin/bash
# validate-migration.sh
# Usage: validate-migration.sh <container-ip> <docker|database|web>

CONTAINER_IP="${1:?usage: $0 <container-ip> <docker|database|web>}"
SERVICE_TYPE="${2:?usage: $0 <container-ip> <docker|database|web>}"

echo "Validating migration for ${SERVICE_TYPE} at ${CONTAINER_IP}..."

case "$SERVICE_TYPE" in
    docker)
        # Check Docker is running
        ssh "root@${CONTAINER_IP}" "docker ps" || exit 1

        # Check compose services
        ssh "root@${CONTAINER_IP}" "cd /opt/docker && docker compose ps" || exit 1

        echo "✅ Docker services validated"
        ;;

    database)
        # Check PostgreSQL
        ssh "root@${CONTAINER_IP}" "systemctl is-active --quiet postgresql" || exit 1

        # Test connection
        ssh "root@${CONTAINER_IP}" "sudo -u postgres psql -c 'SELECT version();'" || exit 1

        echo "✅ Database validated"
        ;;

    web)
        # Check HTTP response (fails on HTTP errors, discards the body)
        curl -fsS -o /dev/null "http://${CONTAINER_IP}" || exit 1

        echo "✅ Web service validated"
        ;;

    *)
        # Unknown types must not fall through to the success message
        echo "❌ Unknown service type: ${SERVICE_TYPE}" >&2
        exit 2
        ;;
esac

echo "✅ All validation checks passed!"
```

---

## Phase 6: Risk Management

### Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| Service downtime during migration | HIGH | MEDIUM | Off-hours migration, parallel running, fast rollback |
| Data loss during sync | LOW | HIGH | Multiple backups, checksums, validation |
| GPU passthrough failure | MEDIUM | MEDIUM | Test first, keep VMs as fallback |
| Performance degradation | LOW | MEDIUM | Monitor closely, can revert easily |
| Networking issues | MEDIUM | HIGH | Keep VM stopped but intact for rollback |
| Forgotten dependencies | MEDIUM | HIGH | Document thoroughly, test before cutover |

### Rollback Procedures

#### Quick Rollback (During Cutover)
```bash
# If migration fails during cutover window
pct stop 206 # Stop new LXC (206 per Appendix A)
qm start 106 # Start original VM
# Service restored in <2 minutes
```

#### Rollback After Migration
```bash
# If issues discovered post-migration
pct stop 206 # Stop LXC (206 per Appendix A)
qm start 106 # Start original VM

# Only if the VM itself is damaged: restore it from backup instead of
# starting it. The restore tool is qmrestore (archive first, then VMID).
# qmrestore backup-file.vma.zst 106 --force

# May need to sync recent data from LXC to VM
```

### Success Metrics

**Per-Service Success Criteria**:
- Service uptime: 99.9% over the first 48 hours
- Response time: Same or better than VM
- Resource usage: 30-50% reduction in RAM usage
- No errors in logs
- Backups completing successfully
- Dependent services connecting properly

**Overall Migration Success**:
- 80%+ of suitable VMs migrated to LXC
- Zero data loss incidents
- Total downtime <4 hours across all migrations
- Documentation complete and validated
- Team confident in managing LXC infrastructure

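It helps to translate the uptime target into a concrete downtime budget: 99.9% over 48 hours leaves 48 × 3600 × 0.001 = 172.8 seconds, a little under three minutes. The sketch below reproduces that arithmetic for any target and window; the function name `allowed_downtime_seconds` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: seconds of downtime allowed by an uptime target over a
# monitoring window. Arguments: target percentage, window length in hours.
allowed_downtime_seconds() {
    awk -v target="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", hours * 3600 * (100 - target) / 100 }'
}
```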
---

## Phase 7: Resource Planning

### Expected Resource Gains

**Current VM Resource Usage** (estimated):
- 9 running VMs × 2GB average overhead = ~18GB RAM overhead
- 9 running VMs × 500MB average storage overhead = ~4.5GB storage

**Post-Migration LXC Resource Usage** (estimated):
- 7-8 LXC containers × 100MB average overhead = ~800MB RAM overhead
- 7-8 LXC containers × 100MB average storage overhead = ~800MB storage

**Net Gain**:
- ~17GB RAM freed (roughly 17 more containers at ~1GB each, or larger workloads)
- ~3.7GB storage freed
- Faster backup/restore times (5-10x improvement)
- Faster provisioning (minutes vs hours)

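The estimate above is simple enough to recompute when the VM or container counts change. A minimal sketch using the plan's own round numbers (8 containers is the upper end of the "7-8" range, and 2GB is taken as 2048MB); the function name `overhead_freed_mb` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: overhead freed (in MB) when VMs are replaced by LXC
# containers. Arguments: VM count, per-VM overhead in MB, LXC count,
# per-LXC overhead in MB.
overhead_freed_mb() {
    local vm_count="$1" vm_mb="$2" lxc_count="$3" lxc_mb="$4"
    echo $(( vm_count * vm_mb - lxc_count * lxc_mb ))
}
```

With the figures above, `overhead_freed_mb 9 2048 8 100` gives 17632 (about 17GB of RAM) and `overhead_freed_mb 9 500 8 100` gives 3700 (about 3.7GB of storage).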
### Resource Allocation Strategy

**Conservative Approach** (Recommended for initial migration):
- Allocate **same resources as VM** to LXC initially
- Monitor usage for 1-2 weeks
- Right-size after baseline established
- Iterate and optimize

**Example**: VM with 4GB RAM, 2 cores
- LXC Initial: 4GB RAM, 2 cores
- After monitoring: Adjust to 2GB RAM, 2 cores (if appropriate)
- Freed resources: 2GB RAM for other uses

---

## Phase 8: Documentation & Knowledge Transfer

### Required Documentation Updates

- [ ] **VM Inventory** → **LXC Inventory**
  - Update VMID mappings
  - Update IP addresses (if changed)
  - Update resource allocations

- [ ] **Runbooks**
  - Update operational procedures for LXC
  - Document `pct` commands vs `qm` commands
  - Update backup/restore procedures

- [ ] **Monitoring**
  - Update monitoring configs for LXC IDs
  - Verify alerts still fire correctly
  - Update dashboards

- [ ] **Troubleshooting Guide**
  - Common LXC issues and solutions
  - Docker in LXC quirks
  - Performance tuning tips
  - Software transcoding optimization (Plex/Tdarr)

### Key Differences: VM vs LXC Operations

| Operation | VM Command | LXC Command |
|-----------|-----------|-------------|
| List | `qm list` | `pct list` |
| Start | `qm start 106` | `pct start 206` |
| Stop | `qm stop 106` | `pct stop 206` |
| Enter console | `qm terminal 106` | `pct enter 206` |
| Create | `qm create ...` | `pct create ...` |
| Backup | `vzdump 106` | `vzdump 206` |
| Restore | `qmrestore ...` | `pct restore ...` |
| Config | `/etc/pve/qemu-server/106.conf` | `/etc/pve/lxc/206.conf` |

---

## Phase 9: Timeline & Milestones

### Proposed Timeline (4-6 Weeks - Likely to Accelerate)

**Week 1: Wave 1 - Lowest Risk**
- Day 1-2: Build and migrate docker-7days (111)
- Day 3-7: Monitor and validate - if stable, proceed immediately

**Week 1-2: Wave 2 - Regional/Isolated Docker Hosts**
- Day 5-6: Migrate docker-pittsburgh (114)
- Day 7-8: Migrate docker-vpn (105)
- Day 9-14: Monitor both services

**Week 2-3: Wave 3 - Additional Docker Hosts**
- Day 10-11: Migrate docker-sba (115)
- Day 12-13: Migrate docker-unused (117) or decommission
- Day 14-15: Migrate docker-home-servers (116)
- Day 16-21: Monitor all Wave 3 services

**Week 3-4: Wave 4 - Application & Database Servers**
- Day 17-18: Migrate discord-bots (110)
- Day 19-20: Migrate databases-bots (112) - EXTRA CARE
- Day 21-28: Extended monitoring for database migration

**Week 4-5: Wave 5 - Media Services** **SKIPPED** (services retired or decommissioned, see Phase 3)
- ~~Day 22-23: Migrate docker-tdarr (113)~~
- ~~Day 24-25: Migrate plex (107)~~
- ~~Day 26-35: Monitor transcoding performance and CPU usage~~

**Week 5-6: Wave 6 - Final Critical Infrastructure**
- Day 29-30: Migrate docker-home (106) - Most critical
- Day 31-42: Extended monitoring and final optimization

**Post-Migration: Cleanup & Optimization**
- Resource optimization (right-sizing containers)
- Documentation finalization
- Final VM decommissioning after confidence period

**Note**: Timeline likely to accelerate based on success and comfort level. Waves may overlap if previous waves are stable ahead of schedule.

### Decision Gates

**Gate 1 (After Wave 1)**: docker-7days Success
- ✅ Game server stable and playable → Proceed to Wave 2
- ❌ Issues encountered → Pause, troubleshoot, refine process

**Gate 2 (After Wave 2)**: Regional Docker Hosts Success
- ✅ VPN routing working, regional services stable → Proceed to Wave 3
- ❌ Critical issues → Pause and reassess approach

**Gate 3 (After Wave 3)**: Docker Infrastructure Success
- ✅ All Docker hosts stable → Proceed to Wave 4
- ❌ Issues → Pause, may need to adjust LXC configuration

**Gate 4 (After Wave 4)**: Database Migration Success
- ✅ Database performance acceptable, no data issues → Proceed to Wave 6 (Wave 5 is skipped)
- ❌ Database performance issues → Investigate before proceeding

**Gate 5 (After Wave 5)**: ~~Media Services Success~~ **SKIPPED** (Wave 5 not performed)
- ~~✅ Software transcoding performance acceptable → Proceed to Wave 6~~
- ~~❌ Transcoding too CPU-intensive → May need resource adjustment or keep as VMs~~

**Gate 6 (After Wave 6)**: Final Critical Service Success
- ✅ docker-home stable → Begin cleanup and decommissioning
- ❌ Issues → Rollback and reassess

---

## Phase 10: Post-Migration Operations

### Ongoing Management

**Monthly Tasks**:
- Review resource utilization and right-size containers
- Validate backup/restore procedures
- Check for LXC template updates
- Review and update documentation

**Quarterly Tasks**:
- Evaluate new services for LXC vs VM placement
- Performance benchmarking
- Disaster recovery drill
- Capacity planning review

### Continuous Improvement

**Optimization Opportunities**:
- Standardize LXC templates with common tooling
- Automate container provisioning (Terraform/Ansible)
- Implement infrastructure-as-code for configs
- Build CI/CD for container updates

**Future Considerations**:
- Evaluate Proxmox clustering for HA
- Consider container orchestration (Kubernetes) if container count grows
- Explore automated resource balancing

---

## Appendix A: LXC Container ID Mapping
|
||||||
|
|
||||||
|
**Proposed New Container IDs** (200-series for LXC):
|
||||||
|
|
||||||
|
| Wave | VM ID | VM Name | New LXC ID | LXC Name | Migration Priority |
|------|-------|---------|-----------|----------|-------------------|
| 1 | 111 | docker-7days | 211 | docker-7days-lxc | FIRST - Lowest risk validation |
| 2 | 114 | docker-pittsburgh | 214 | docker-pittsburgh-lxc | Regional/isolated |
| 2 | 121 | docker-vpn | 221 | arr-stack | ✅ COMPLETE - VPN eliminated, simplified to arr stack |
| 3 | 115 | docker-sba | 215 | docker-sba-lxc | Additional Docker hosts |
| 3 | 117 | docker-unused | 217 | docker-unused-lxc | Migrate or decommission |
| 3 | 116 | docker-home-servers | 216 | docker-home-servers-lxc | Additional Docker hosts |
| 4 | 110 | discord-bots | 210 | discord-bots-lxc | Application servers |
| 4 | 112 | databases-bots | 212 | databases-bots-lxc | Database (EXTRA CARE) |
| ~~5~~ | ~~113~~ | ~~docker-tdarr~~ | ~~213~~ | ~~docker-tdarr-lxc~~ | ❌ RETIRED - moved to GPU server |
| ~~5~~ | ~~107~~ | ~~plex~~ | ~~207~~ | ~~plex-lxc~~ | ❌ DECOMMISSIONING - replaced by Jellyfin |
| 6 | 106 | docker-home | 206 | docker-home-lxc | FINAL - Most critical |
|
||||||
|
|
||||||
|
**Keep as VM**:
|
||||||
|
- 109 (hass-io) - HassOS requirement
|
||||||
|
- 100 (ubuntu-template) - Strategic VM template
|
||||||
|
- 103 (docker-template) - Convert to LXC template eventually
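
The 200-series numbering in the table follows one rule: the LXC ID is the VM ID plus 100. A minimal Python sketch of that rule, assuming the keep-as-VM list above is authoritative; the helper name `lxc_id_for` is hypothetical and not part of the repository.

```python
# Hypothetical helper: derive the proposed LXC ID from a 100-series VM ID.
KEEP_AS_VM = {100, 103, 109}  # ubuntu-template, docker-template, hass-io


def lxc_id_for(vm_id: int) -> int:
    """Return the 200-series LXC ID proposed for a 100-series VM ID."""
    if vm_id in KEEP_AS_VM:
        raise ValueError(f"VM {vm_id} is planned to stay a VM")
    if not 100 <= vm_id <= 199:
        raise ValueError(f"VM {vm_id} is outside the 100-series")
    return vm_id + 100


if __name__ == "__main__":
    for vm in (111, 121, 106):
        print(vm, "->", lxc_id_for(vm))
```

Rejecting the keep-as-VM IDs up front is a design choice for this sketch; it keeps a typo from silently producing a container ID for a machine that is meant to stay virtualized.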
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix B: Quick Reference Commands
|
||||||
|
|
||||||
|
### Create Standard Docker LXC
|
||||||
|
```bash
pct create 2XX local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname NAME \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.0.XX/24,gw=10.10.0.1 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --unprivileged 0 \
  --features nesting=1,keyctl=1
```
|
||||||
|
|
||||||
|
### Data Sync During Migration
|
||||||
|
```bash
# Run these on the new LXC. rsync cannot copy between two remote hosts,
# so pull from the VM into the local filesystem.

# Initial sync (while VM running)
rsync -avz --progress root@VM_IP:/data/ /data/

# Final sync (services on the VM stopped; the VM itself must stay reachable over SSH)
rsync -avz --progress --delete root@VM_IP:/data/ /data/
```
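
The two-pass sync differs only in the `--delete` flag on the final pass. A minimal Python sketch of building both command lines, assuming the sync runs on the LXC so the destination is a local path; `rsync_command` is a hypothetical helper and `VM_IP` is the same placeholder used above.

```python
import shlex


def rsync_command(src: str, dst: str, final: bool = False) -> list[str]:
    """Build the rsync argument list for the initial or the final pass."""
    cmd = ["rsync", "-avz", "--progress"]
    if final:
        # --delete removes files on the target that no longer exist on the source.
        cmd.append("--delete")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    src, dst = "root@VM_IP:/data/", "/data/"
    print(shlex.join(rsync_command(src, dst)))
    print(shlex.join(rsync_command(src, dst, final=True)))
```

Returning an argument list rather than a string means the result can be passed to `subprocess.run` without shell quoting concerns.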
|
||||||
|
|
||||||
|
### Quick Validation
|
||||||
|
```bash
|
||||||
|
# Check LXC is running
|
||||||
|
pct status 2XX
|
||||||
|
|
||||||
|
# Check services inside
|
||||||
|
pct enter 2XX
|
||||||
|
systemctl status docker
|
||||||
|
docker ps
|
||||||
|
exit
|
||||||
|
|
||||||
|
# Network connectivity
|
||||||
|
ping -c 3 10.10.0.2XX
|
||||||
|
curl -f http://10.10.0.2XX
|
||||||
|
```
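
The connectivity checks can also be scripted. A minimal Python sketch, assuming a plain TCP connect is an acceptable stand-in for `ping` and `curl`; the helper name `port_open` is hypothetical.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, `port_open("10.10.0.211", 80)` would report whether a web service on the migrated container accepts connections; the address and port are illustrative only.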
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix C: Contact & Escalation
|
||||||
|
|
||||||
|
**Migration Owner**: Cal Corum (cal.corum@gmail.com)
|
||||||
|
|
||||||
|
**Key Resources**:
|
||||||
|
- Proxmox skill: `~/.claude/skills/proxmox/`
|
||||||
|
- VM management docs: `/mnt/NV2/Development/claude-home/vm-management/`
|
||||||
|
- Proxmox API: `~/.claude/skills/proxmox/proxmox_client.py`
|
||||||
|
|
||||||
|
**Support Channels**:
|
||||||
|
- Proxmox forums: https://forum.proxmox.com/
|
||||||
|
- LXC documentation: https://linuxcontainers.org/
|
||||||
|
- Docker in LXC: https://forum.proxmox.com/threads/docker-in-lxc.38129/
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**Next Steps**:
|
||||||
|
1. ✅ Migration plan approved with confirmed decisions
|
||||||
|
2. Schedule Wave 1 migration window for docker-7days (111)
|
||||||
|
3. Build first LXC container for docker-7days
|
||||||
|
4. Execute Wave 1 migration and validate process
|
||||||
|
|
||||||
|
**Document Version**: 2.0 (Approved)
|
||||||
|
**Last Updated**: 2025-01-12
|
||||||
|
**Status**: Approved & Ready for Execution
|
||||||
129
vm-management/migration-quick-start.md
Normal file
@ -0,0 +1,129 @@
|
|||||||
|
# VM to LXC Migration - Quick Start Guide
|
||||||
|
|
||||||
|
**Status**: Approved & Ready for Execution
|
||||||
|
**Last Updated**: 2025-01-12
|
||||||
|
|
||||||
|
## ✅ Confirmed Decisions
|
||||||
|
- **Networking**: Reuse existing IP addresses
|
||||||
|
- **Storage**: Fresh install + volume copy for Docker hosts
|
||||||
|
- **Timeline**: 4-6 weeks (expected to accelerate)
|
||||||
|
- **GPU**: No GPU hardware - all services can migrate
|
||||||
|
|
||||||
|
## Migration Order (Risk-Based)
|
||||||
|
|
||||||
|
### Wave 1: docker-7days (111) - LOWEST RISK
|
||||||
|
**Goal**: Validate entire migration process
|
||||||
|
- Non-critical game server
|
||||||
|
- Docker-in-LXC test
|
||||||
|
- Build confidence
|
||||||
|
|
||||||
|
### Wave 2: docker-pittsburgh (114) + docker-vpn (121)
|
||||||
|
**Goal**: Regional/isolated Docker hosts
|
||||||
|
- Test VPN routing
|
||||||
|
- Regional services validation
|
||||||
|
|
||||||
|
### Wave 3: docker-sba (115) + docker-unused (117) + docker-home-servers (116)
|
||||||
|
**Goal**: Additional Docker infrastructure
|
||||||
|
- Use SBA maintenance windows
|
||||||
|
- Decommission unused if appropriate
|
||||||
|
|
||||||
|
### Wave 4: discord-bots (110) + databases-bots (112)
|
||||||
|
**Goal**: Application & database servers
|
||||||
|
- ⚠️ EXTRA CARE for database migration
|
||||||
|
- Full backups required
|
||||||
|
|
||||||
|
### Wave 5: docker-tdarr (113) + plex (107)
|
||||||
|
**Goal**: Media services (software transcoding)
|
||||||
|
- Monitor CPU usage
|
||||||
|
- Validate transcode performance
|
||||||
|
|
||||||
|
### Wave 6: docker-home (106) - MOST CRITICAL
|
||||||
|
**Goal**: Final critical infrastructure
|
||||||
|
- Migrate last after all confidence built
|
||||||
|
- Most important home services
|
||||||
|
|
||||||
|
## Keep as VMs
|
||||||
|
- **hass-io (109)**: HassOS requirement
|
||||||
|
- **ubuntu-template (100)**: Strategic flexibility
|
||||||
|
|
||||||
|
## LXC Container IDs (200-series)
|
||||||
|
|
||||||
|
| VM → LXC | Service | Wave |
|----------|---------|------|
| 111 → 211 | docker-7days | 1 |
| 114 → 214 | docker-pittsburgh | 2 |
| 121 → 221 | docker-vpn | 2 |
| 115 → 215 | docker-sba | 3 |
| 117 → 217 | docker-unused | 3 |
| 116 → 216 | docker-home-servers | 3 |
| 110 → 210 | discord-bots | 4 |
| 112 → 212 | databases-bots | 4 |
| 113 → 213 | docker-tdarr | 5 |
| 107 → 207 | plex | 5 |
| 106 → 206 | docker-home | 6 |
|
||||||
|
|
||||||
|
## Quick Commands
|
||||||
|
|
||||||
|
### Create LXC for Docker
|
||||||
|
```bash
pct create 2XX local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
  --hostname docker-7days-lxc \
  --memory 4096 \
  --cores 2 \
  --net0 name=eth0,bridge=vmbr0,ip=10.10.0.TMP/24,gw=10.10.0.1 \
  --storage local-lvm \
  --rootfs local-lvm:32 \
  --unprivileged 0 \
  --features nesting=1,keyctl=1

pct start 2XX
pct enter 2XX
```
|
||||||
|
|
||||||
|
### Install Docker in LXC
|
||||||
|
```bash
|
||||||
|
apt update && apt upgrade -y
|
||||||
|
curl -fsSL https://get.docker.com -o get-docker.sh
|
||||||
|
sh get-docker.sh
|
||||||
|
apt install docker-compose-plugin -y
|
||||||
|
```
|
||||||
|
|
||||||
|
### Migrate Docker Volumes
|
||||||
|
```bash
# Run on the new LXC (rsync cannot copy between two remote hosts)

# While VM running - initial sync
rsync -avz --progress root@VM_IP:/var/lib/docker/volumes/ /var/lib/docker/volumes/
rsync -avz --progress root@VM_IP:/opt/docker/ /opt/docker/

# During cutover - final sync with containers stopped on the VM (VM still reachable over SSH)
rsync -avz --progress --delete root@VM_IP:/var/lib/docker/volumes/ /var/lib/docker/volumes/
```
|
||||||
|
|
||||||
|
### Cutover Process
|
||||||
|
1. Stop VM: `qm stop 111`
|
||||||
|
2. Reconfigure LXC to production IP
|
||||||
|
3. Start LXC: `pct start 211`
|
||||||
|
4. Validate services
|
||||||
|
5. Monitor for 48 hours
|
||||||
|
6. Keep VM stopped for rollback capability
|
||||||
|
|
||||||
|
### Rollback (if needed)
|
||||||
|
```bash
|
||||||
|
pct stop 211
|
||||||
|
qm start 111
|
||||||
|
```
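
The cutover and rollback are mirror images of each other. A minimal Python sketch that derives both command sequences for one VM, assuming the VM ID + 100 convention from the ID table; `cutover_plan` is a hypothetical helper, and the IP reconfiguration step is left as a manual action because its exact command depends on the container's network settings.

```python
def cutover_plan(vm_id: int) -> dict[str, list[str]]:
    """Return the cutover and rollback commands for one VM -> LXC pair."""
    lxc_id = vm_id + 100  # 200-series convention from the ID table
    return {
        # Reconfiguring the LXC to the production IP happens between these two steps.
        "cutover": [f"qm stop {vm_id}", f"pct start {lxc_id}"],
        "rollback": [f"pct stop {lxc_id}", f"qm start {vm_id}"],
    }


if __name__ == "__main__":
    plan = cutover_plan(111)
    for phase, commands in plan.items():
        print(phase, commands)
```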
|
||||||
|
|
||||||
|
## Next Immediate Steps
|
||||||
|
|
||||||
|
1. **Schedule Wave 1**: Pick maintenance window for docker-7days
|
||||||
|
2. **Build LXC 211**: Create first container
|
||||||
|
3. **Test & Migrate**: Execute Wave 1
|
||||||
|
4. **Document Learnings**: Refine process for Wave 2
|
||||||
|
|
||||||
|
## Full Documentation
|
||||||
|
See `/mnt/NV2/Development/claude-home/vm-management/lxc-migration-plan.md` for comprehensive details.
|
||||||
|
|
||||||
|
## Expected Benefits
|
||||||
|
- **~17GB RAM freed** (87% reduction in overhead)
|
||||||
|
- **5-10x faster backups/restores**
|
||||||
|
- **Near-instant container starts** (1-5 seconds)
|
||||||
|
- **Improved resource density**
|
||||||
242
vm-management/scripts/LXC-MIGRATION-GUIDE.md
Normal file
@ -0,0 +1,242 @@
|
|||||||
|
# LXC Migration Automation Scripts
|
||||||
|
|
||||||
|
This guide covers automation scripts for migrating VM-based Docker containers to LXC containers.
|
||||||
|
|
||||||
|
## Scripts Overview
|
||||||
|
|
||||||
|
### 1. `lxc-docker-create.sh` - LXC Container Creation
|
||||||
|
|
||||||
|
Automates the creation of LXC containers with Docker pre-installed and configured for container workloads.
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
./lxc-docker-create.sh <VMID> <HOSTNAME> <IP> <DISK_SIZE> <MEMORY> <CORES> [PROXMOX_HOST]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example (local Proxmox):**
|
||||||
|
```bash
|
||||||
|
./lxc-docker-create.sh 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example (remote Proxmox):**
|
||||||
|
```bash
|
||||||
|
./lxc-docker-create.sh 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4 root@10.10.0.11
|
||||||
|
```
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- Creates LXC container with specified resources
|
||||||
|
- Configures AppArmor for Docker compatibility
|
||||||
|
- Enables nesting and keyctl features
|
||||||
|
- Installs Docker and docker-compose-plugin
|
||||||
|
- Sets container to start on boot
|
||||||
|
|
||||||
|
**Time:** ~10 minutes (includes Docker installation)
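
Because the script takes six positional arguments, a mistyped value is easy to miss until `pct create` fails. A minimal Python sketch of pre-flight validation, assuming disk sizes are always given in whole gigabytes with a `G` suffix as in the examples; `validate_args` is a hypothetical helper and is not part of `lxc-docker-create.sh`.

```python
import ipaddress
import re


def validate_args(vmid, hostname, ip, disk_size, memory, cores):
    """Check the six required arguments and return them as typed values."""
    # Proxmox guest IDs start at 100.
    if not vmid.isdigit() or int(vmid) < 100:
        raise ValueError(f"VMID must be an integer >= 100, got {vmid!r}")
    if not re.fullmatch(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?", hostname):
        raise ValueError(f"invalid hostname: {hostname!r}")
    # Raises ValueError for a malformed address or one that carries a CIDR suffix.
    address = ipaddress.IPv4Address(ip)
    # Assumption: sizes look like "128G".
    if not re.fullmatch(r"\d+G", disk_size):
        raise ValueError(f"disk size must look like 128G, got {disk_size!r}")
    if not memory.isdigit() or not cores.isdigit():
        raise ValueError("memory (MB) and cores must be positive integers")
    return {
        "vmid": int(vmid),
        "hostname": hostname,
        "ip": str(address),
        "disk_size": disk_size,
        "memory_mb": int(memory),
        "cores": int(cores),
    }
```

Passing the values from the example invocation, `validate_args("214", "docker-pittsburgh-lxc", "10.10.0.214", "128G", "16384", "4")`, returns a dictionary of typed values.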
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. `fix-docker-apparmor.sh` - Docker Compose AppArmor Fix
|
||||||
|
|
||||||
|
Adds AppArmor unconfined security options to all services in docker-compose.yml files. Required for Docker containers running inside LXC.
|
||||||
|
|
||||||
|
**Usage:**
|
||||||
|
```bash
|
||||||
|
./fix-docker-apparmor.sh <LXC_IP> [COMPOSE_DIR]
|
||||||
|
```
|
||||||
|
|
||||||
|
**Example:**
|
||||||
|
```bash
|
||||||
|
./fix-docker-apparmor.sh 10.10.0.214
|
||||||
|
./fix-docker-apparmor.sh 10.10.0.214 /home/cal/container-data
|
||||||
|
```
|
||||||
|
|
||||||
|
**What it does:**
|
||||||
|
- SSHs into the LXC container
|
||||||
|
- Finds all docker-compose.yml files
|
||||||
|
- Adds `security_opt: ["apparmor=unconfined"]` to each service
|
||||||
|
- Creates backups of original files
|
||||||
|
|
||||||
|
**Time:** ~1-2 minutes
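
The core of the fix is a small transformation on the parsed compose data. A minimal Python sketch of that transformation on an in-memory dictionary, assuming the file has already been loaded (for example with PyYAML); `add_apparmor_unconfined` is a hypothetical name, and this is a simplified illustration rather than the script's exact code.

```python
def add_apparmor_unconfined(compose: dict) -> int:
    """Add apparmor=unconfined to every service; return how many were changed."""
    changed = 0
    services = compose.get("services") or {}
    for name, config in services.items():
        if config is None:
            # A service declared with no body parses as None.
            config = services[name] = {}
        options = config.setdefault("security_opt", [])
        if "apparmor=unconfined" in options or "apparmor:unconfined" in options:
            continue  # already fixed, leave untouched
        options.append("apparmor=unconfined")
        changed += 1
    return changed


if __name__ == "__main__":
    data = {"services": {"web": {"image": "nginx"}, "db": None}}
    print(add_apparmor_unconfined(data), data)
```

The function is idempotent: running it a second time on the same data changes nothing and returns 0, which makes it safe to re-run after a partial migration.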
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Complete Migration Workflow
|
||||||
|
|
||||||
|
### Wave 2 Example: VM 114 (Pittsburgh) → LXC 214
|
||||||
|
|
||||||
|
#### Step 1: Create LXC Container
|
||||||
|
```bash
|
||||||
|
cd /mnt/NV2/Development/claude-home/vm-management/scripts
|
||||||
|
|
||||||
|
./lxc-docker-create.sh 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4 root@10.10.0.11
|
||||||
|
```
|
||||||
|
|
||||||
|
**Wait ~10 minutes for Docker installation**
|
||||||
|
|
||||||
|
#### Step 2: Copy SSH Key (if needed)
|
||||||
|
```bash
|
||||||
|
ssh root@10.10.0.11 "cat ~/.ssh/id_rsa.pub | pct exec 214 -- tee /root/.ssh/authorized_keys"
|
||||||
|
```
|
||||||
|
|
||||||
|
Or set up passwordless SSH to the LXC:
|
||||||
|
```bash
|
||||||
|
ssh root@10.10.0.11 "
|
||||||
|
ssh-keyscan -H 10.10.0.214 >> ~/.ssh/known_hosts 2>/dev/null
|
||||||
|
pct exec 214 -- mkdir -p /root/.ssh
|
||||||
|
cat ~/.ssh/id_rsa.pub | pct exec 214 -- tee -a /root/.ssh/authorized_keys
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 3: Migrate Data
|
||||||
|
|
||||||
|
**Option A: rsync from VM (recommended)**
|
||||||
|
```bash
|
||||||
|
# From Proxmox host - direct rsync from old VM to new LXC
|
||||||
|
ssh root@10.10.0.11 "
|
||||||
|
rsync -avz --info=progress2 \
|
||||||
|
/mnt/vm114/home/cal/container-data/ \
|
||||||
|
root@10.10.0.214:/home/cal/container-data/
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
**Option B: Mount and copy from backup**
|
||||||
|
```bash
|
||||||
|
# If VM is already shut down and you have backups/snapshots
|
||||||
|
ssh root@10.10.0.11 "
|
||||||
|
mount /dev/vm114-vg/vm114-data /mnt/vm114
|
||||||
|
rsync -avz /mnt/vm114/home/cal/container-data/ root@10.10.0.214:/home/cal/container-data/
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 4: Fix AppArmor in Docker Compose Files
|
||||||
|
```bash
|
||||||
|
./fix-docker-apparmor.sh 10.10.0.214
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 5: Start Containers
|
||||||
|
```bash
|
||||||
|
ssh root@10.10.0.214
|
||||||
|
|
||||||
|
# Navigate to service directory
|
||||||
|
cd /home/cal/container-data/[service-name]
|
||||||
|
|
||||||
|
# Start containers
|
||||||
|
docker compose up -d
|
||||||
|
|
||||||
|
# Check status
|
||||||
|
docker compose ps
|
||||||
|
docker compose logs -f
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Step 6: Verify and Test
|
||||||
|
- Test service functionality
|
||||||
|
- Check container logs for errors
|
||||||
|
- Verify network connectivity
|
||||||
|
- Test external access (if applicable)
|
||||||
|
|
||||||
|
#### Step 7: Update Documentation
|
||||||
|
- Mark VM as migrated in wave plan
|
||||||
|
- Document any issues encountered
|
||||||
|
- Update service inventory with new IP
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Time Estimates
|
||||||
|
|
||||||
|
| Step | Duration | Notes |
|------|----------|-------|
| Create LXC | 10 min | Includes Docker installation |
| Setup SSH | 1 min | One-time setup |
| Data migration | Varies | Depends on data size (58GB ≈ 60 min @ 16MB/s) |
| Fix AppArmor | 2 min | Automated script |
| Start containers | 5 min | Per service stack |
| **Total (small data)** | **~20 min** | For <10GB data |
| **Total (large data)** | **~80 min** | For ~60GB data like Wave 1 |
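
The data migration row can be estimated for other data sizes with simple arithmetic. A minimal Python sketch, assuming a sustained transfer rate and 1 GB = 1024 MB; `transfer_minutes` is a hypothetical helper.

```python
def transfer_minutes(size_gb: float, rate_mb_per_s: float) -> float:
    """Estimate transfer time in minutes for a given size and sustained rate."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    seconds = size_gb * 1024 / rate_mb_per_s
    return seconds / 60


if __name__ == "__main__":
    # Figures from the table: 58 GB at 16 MB/s
    print(f"{transfer_minutes(58, 16):.0f} min")
```

For 58 GB at 16 MB/s this gives about 62 minutes, in line with the table's estimate of roughly an hour. Real transfers with many small files may run slower than the sustained rate suggests.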
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration Waves Reference
|
||||||
|
|
||||||
|
Based on `vm-management/wave1-migration-results.md`:
|
||||||
|
|
||||||
|
### Remaining Migrations
|
||||||
|
|
||||||
|
| Wave | VM | Hostname | Services | Data Size | LXC ID | Priority |
|------|-----|----------|----------|-----------|--------|----------|
| 2 | 114 | Pittsburgh | Docker services | ~30GB | 214 | High |
| 3 | 112 | Louisville | Docker services | ~20GB | 212 | High |
| 4 | 115 | Denver | Docker services | ~25GB | 215 | Medium |
| 5 | 113 | Fresno | Docker services | ~15GB | 213 | Medium |
| 6 | Multiple | Misc services | Varies | Varies | TBD | Low |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
### Container Creation Fails
|
||||||
|
- **Check:** Template exists: `pveam list local` (`pct list` shows containers, not templates)
|
||||||
|
- **Check:** VMID not in use: `pct list | grep <VMID>`
|
||||||
|
- **Check:** Sufficient storage: `pvesm status`
|
||||||
|
|
||||||
|
### SSH Connection Failed
|
||||||
|
- **Check:** Container is running: `pct status <VMID>`
|
||||||
|
- **Check:** SSH key copied: `pct exec <VMID> -- cat /root/.ssh/authorized_keys`
|
||||||
|
- **Check:** Network connectivity: `ping <IP>`
|
||||||
|
|
||||||
|
### Docker Containers Won't Start
|
||||||
|
- **Check:** AppArmor fix applied: `grep security_opt docker-compose.yml`
|
||||||
|
- **Check:** Docker service running: `systemctl status docker`
|
||||||
|
- **Check:** Container logs: `docker compose logs`
|
||||||
|
|
||||||
|
### Data Migration Slow
|
||||||
|
- **Use:** rsync with compression: `rsync -avz`
|
||||||
|
- **Use:** Direct VM-to-LXC transfer (avoid copying to intermediate location)
|
||||||
|
- **Monitor:** Network usage: `iftop` or `nload`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Best Practices
|
||||||
|
|
||||||
|
1. **Always create backups** before destroying source VMs
|
||||||
|
2. **Test services** thoroughly before decommissioning VMs
|
||||||
|
3. **Document changes** in wave results files
|
||||||
|
4. **Monitor resources** during migration (CPU, RAM, disk I/O)
|
||||||
|
5. **Schedule migrations** during low-usage periods
|
||||||
|
6. **Keep scripts updated** with learnings from each wave
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Script Maintenance
|
||||||
|
|
||||||
|
### Adding New Features
|
||||||
|
|
||||||
|
Both scripts are designed to be easily extended:
|
||||||
|
|
||||||
|
- **lxc-docker-create.sh**: Add additional package installations, configuration steps, or validation checks
|
||||||
|
- **fix-docker-apparmor.sh**: Modify the Python script to add additional docker-compose fixes
|
||||||
|
|
||||||
|
### Testing Changes
|
||||||
|
|
||||||
|
Before using updated scripts in production:
|
||||||
|
1. Test on a non-critical VM migration
|
||||||
|
2. Verify all steps complete successfully
|
||||||
|
3. Check container functionality post-migration
|
||||||
|
4. Document any new issues or improvements
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Related Documentation
|
||||||
|
|
||||||
|
- [Wave 1 Migration Results](../wave1-migration-results.md) - Lessons learned from first migration
|
||||||
|
- [Migration Quick Start](../migration-quick-start.md) - Fast reference guide
|
||||||
|
- [LXC Migration Plan](../lxc-migration-plan.md) - Overall migration strategy
|
||||||
|
- [VM Management Context](../CONTEXT.md) - VM infrastructure overview
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Support
|
||||||
|
|
||||||
|
For issues or questions:
|
||||||
|
1. Check troubleshooting section above
|
||||||
|
2. Review wave results for similar issues
|
||||||
|
3. Check Proxmox logs: `journalctl -u pve-container@<VMID>`
|
||||||
|
4. Review Docker logs: `docker compose logs`
|
||||||
275
vm-management/scripts/fix-docker-apparmor.sh
Executable file
@ -0,0 +1,275 @@
|
|||||||
|
#!/bin/bash
|
||||||
|
#
|
||||||
|
# Docker Compose AppArmor Fix Script
|
||||||
|
#
|
||||||
|
# Adds 'security_opt: ["apparmor=unconfined"]' to all services in docker-compose.yml files
|
||||||
|
# This is required for Docker containers running inside LXC containers.
|
||||||
|
#
|
||||||
|
# Usage: ./fix-docker-apparmor.sh <LXC_IP> [COMPOSE_DIR]
|
||||||
|
#
|
||||||
|
# Example: ./fix-docker-apparmor.sh 10.10.0.214
|
||||||
|
# Example: ./fix-docker-apparmor.sh 10.10.0.214 /home/cal/container-data
|
||||||
|
#
|
||||||
|
# Arguments:
|
||||||
|
# LXC_IP - IP address of the LXC container to SSH into
|
||||||
|
# COMPOSE_DIR - Optional directory containing docker-compose files (default: /home/cal/container-data)
|
||||||
|
#
|
||||||
|
# What this script does:
|
||||||
|
# 1. SSHs into the LXC container
|
||||||
|
# 2. Finds all docker-compose.yml files
|
||||||
|
# 3. Adds security_opt configuration to each service
|
||||||
|
# 4. Creates backups of original files
|
||||||
|
#
|
||||||
|
# Why this is needed:
|
||||||
|
# Docker containers in LXC need AppArmor disabled to function properly.
|
||||||
|
# Without this fix, containers may fail to start or have permission issues.
|
||||||
|
#
|
||||||
|
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
# Color codes for output
|
||||||
|
RED='\033[0;31m'
|
||||||
|
GREEN='\033[0;32m'
|
||||||
|
YELLOW='\033[1;33m'
|
||||||
|
BLUE='\033[0;34m'
|
||||||
|
NC='\033[0m' # No Color
|
||||||
|
|
||||||
|
# Function to print colored messages
|
||||||
|
log_info() {
|
||||||
|
echo -e "${GREEN}[INFO]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_warn() {
|
||||||
|
echo -e "${YELLOW}[WARN]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_error() {
|
||||||
|
echo -e "${RED}[ERROR]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
log_debug() {
|
||||||
|
echo -e "${BLUE}[DEBUG]${NC} $1"
|
||||||
|
}
|
||||||
|
|
||||||
|
# Parse arguments
|
||||||
|
if [[ $# -lt 1 ]]; then
|
||||||
|
log_error "Insufficient arguments"
|
||||||
|
echo "Usage: $0 <LXC_IP> [COMPOSE_DIR]"
|
||||||
|
echo ""
|
||||||
|
echo "Example: $0 10.10.0.214"
|
||||||
|
echo "Example: $0 10.10.0.214 /home/cal/container-data"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
LXC_IP=$1
|
||||||
|
COMPOSE_DIR=${2:-/home/cal/container-data}
|
||||||
|
|
||||||
|
log_info "Starting AppArmor fix for Docker Compose files"
|
||||||
|
log_info "Target: root@$LXC_IP"
|
||||||
|
log_info "Directory: $COMPOSE_DIR"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Check SSH connectivity
|
||||||
|
log_info "Testing SSH connection to $LXC_IP..."
|
||||||
|
if ! ssh -o ConnectTimeout=5 -o BatchMode=yes root@"$LXC_IP" "echo 'SSH OK'" &>/dev/null; then
|
||||||
|
log_error "Cannot connect to root@$LXC_IP via SSH"
|
||||||
|
log_error "Please ensure:"
|
||||||
|
echo " 1. SSH key is copied to the LXC container"
|
||||||
|
echo " 2. Container is running"
|
||||||
|
echo " 3. IP address is correct"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
log_info "✅ SSH connection successful"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Create Python script on remote host
|
||||||
|
log_info "Creating AppArmor fix script on remote host..."
|
||||||
|
ssh root@"$LXC_IP" "cat > /tmp/fix_apparmor.py" <<'PYTHON_SCRIPT'
|
||||||
|
#!/usr/bin/env python3
"""
Fix Docker Compose files to work in LXC by adding AppArmor unconfined security option.
"""
import yaml
import glob
import sys
import os
from pathlib import Path

def add_apparmor_fix(compose_file):
    """Add security_opt to all services in a docker-compose file."""
    print(f"\n📄 Processing: {compose_file}")

    # Create backup
    backup_file = f"{compose_file}.backup"
    if not os.path.exists(backup_file):
        os.system(f"cp '{compose_file}' '{backup_file}'")
        print(f" ✅ Backup created: {backup_file}")
    else:
        print(f" ⏭️ Backup already exists: {backup_file}")

    # Load compose file
    try:
        with open(compose_file, 'r') as f:
            compose_data = yaml.safe_load(f)
    except yaml.YAMLError as e:
        print(f" ❌ Error parsing YAML: {e}")
        return False
    except Exception as e:
        print(f" ❌ Error reading file: {e}")
        return False

    if not compose_data or 'services' not in compose_data:
        print(f" ⚠️ No services found in compose file")
        return False

    # Track changes
    services_modified = 0
    services_skipped = 0

    # Add security_opt to each service
    for service_name, service_config in compose_data['services'].items():
        if service_config is None:
            service_config = {}
            compose_data['services'][service_name] = service_config

        # Check if security_opt already exists
        existing_security = service_config.get('security_opt', [])

        if 'apparmor=unconfined' in existing_security or 'apparmor:unconfined' in existing_security:
            print(f" ⏭️ {service_name}: Already has AppArmor unconfined")
            services_skipped += 1
        else:
            # Add apparmor=unconfined
            if not existing_security:
                service_config['security_opt'] = ['apparmor=unconfined']
            else:
                if 'apparmor=unconfined' not in existing_security:
                    existing_security.append('apparmor=unconfined')
                    service_config['security_opt'] = existing_security

            print(f" ✅ {service_name}: Added AppArmor unconfined")
            services_modified += 1

    # Write updated compose file
    try:
        with open(compose_file, 'w') as f:
            yaml.dump(compose_data, f, default_flow_style=False, sort_keys=False, indent=2)

        if services_modified > 0:
            print(f" 💾 Saved changes ({services_modified} services modified)")

        return services_modified > 0
    except Exception as e:
        print(f" ❌ Error writing file: {e}")
        # Restore backup
        os.system(f"cp '{backup_file}' '{compose_file}'")
        print(f" 🔄 Restored from backup")
        return False

def main():
    """Main function to process all docker-compose files."""
    compose_dir = sys.argv[1] if len(sys.argv) > 1 else "/home/cal/container-data"

    print(f"🔍 Searching for docker-compose.yml files in {compose_dir}")

    # Find all docker-compose files
    patterns = [
        f"{compose_dir}/**/docker-compose.yml",
        f"{compose_dir}/**/docker-compose.yaml",
    ]

    compose_files = []
    for pattern in patterns:
        compose_files.extend(glob.glob(pattern, recursive=True))

    # Remove duplicates and sort
    compose_files = sorted(set(compose_files))

    if not compose_files:
        print(f"⚠️ No docker-compose files found in {compose_dir}")
        return 1

    print(f"📋 Found {len(compose_files)} docker-compose file(s)")

    # Process each file
    total_modified = 0
    total_errors = 0

    for compose_file in compose_files:
        try:
            if add_apparmor_fix(compose_file):
                total_modified += 1
        except Exception as e:
            print(f" ❌ Unexpected error: {e}")
            total_errors += 1

    # Summary
    print("\n" + "="*60)
    print("📊 SUMMARY")
    print("="*60)
    print(f"Total files found: {len(compose_files)}")
    print(f"Files modified: {total_modified}")
    print(f"Files with errors: {total_errors}")
    print(f"Files unchanged: {len(compose_files) - total_modified - total_errors}")
    print("="*60)

    if total_modified > 0:
        print("\n✅ AppArmor fix applied successfully!")
        print("\n💡 Next steps:")
        print(" 1. Review changes in modified files")
        print(" 2. Start containers: docker compose up -d")
        print(" 3. Check container status: docker compose ps")
        print("\n📝 Note: Backups created with .backup extension")

    return 0 if total_errors == 0 else 1

if __name__ == "__main__":
    sys.exit(main())
|
||||||
|
PYTHON_SCRIPT
|
||||||
|
|
||||||
|
log_info "✅ Script uploaded to LXC container"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Install PyYAML if needed
|
||||||
|
log_info "Ensuring Python and PyYAML are installed..."
|
||||||
|
ssh root@"$LXC_IP" "apt-get update -qq && apt-get install -y -qq python3 python3-yaml > /dev/null 2>&1" || true
|
||||||
|
log_info "✅ Dependencies ready"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Run the fix script
|
||||||
|
log_info "Running AppArmor fix script..."
|
||||||
|
echo ""
|
||||||
|
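# Capture the exit code explicitly: under 'set -e' a failing ssh would
# otherwise abort the script before EXIT_CODE is set and cleanup runs.
EXIT_CODE=0
ssh root@"$LXC_IP" "python3 /tmp/fix_apparmor.py '$COMPOSE_DIR'" || EXIT_CODE=$?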
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
# Cleanup
|
||||||
|
log_info "Cleaning up temporary files..."
|
||||||
|
ssh root@"$LXC_IP" "rm /tmp/fix_apparmor.py"
|
||||||
|
log_info "✅ Cleanup complete"
|
||||||
|
echo ""
|
||||||
|
|
||||||
|
if [[ $EXIT_CODE -eq 0 ]]; then
|
||||||
|
log_info "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||||
|
log_info "🎉 AppArmor Fix Complete!"
|
||||||
|
log_info "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
|
||||||
|
echo ""
|
||||||
|
echo "Your docker-compose files have been updated to work in LXC."
|
||||||
|
echo ""
|
||||||
|
echo "Next steps:"
|
||||||
|
echo " 1. SSH into container:"
|
||||||
|
echo " ssh root@$LXC_IP"
|
||||||
|
echo ""
|
||||||
|
echo " 2. Navigate to a service directory:"
|
||||||
|
echo " cd $COMPOSE_DIR/[service-name]"
|
||||||
|
echo ""
|
||||||
|
echo " 3. Start containers:"
|
||||||
|
echo " docker compose up -d"
|
||||||
|
echo ""
|
||||||
|
echo " 4. Check status:"
|
||||||
|
echo " docker compose ps"
|
||||||
|
echo ""
|
||||||
|
else
|
||||||
|
log_error "AppArmor fix encountered errors. Please review output above."
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
214
vm-management/scripts/lxc-docker-create.sh
Executable file
@ -0,0 +1,214 @@
|
#!/bin/bash
#
# LXC Docker Container Creation Script
#
# Creates a new LXC container with Docker pre-installed and configured
# for running containerized services.
#
# Usage: ./lxc-docker-create.sh <VMID> <HOSTNAME> <IP> <DISK_SIZE> <MEMORY> <CORES> [PROXMOX_HOST]
#
# Example: ./lxc-docker-create.sh 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4
# Example with remote host: ./lxc-docker-create.sh 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4 root@10.10.0.11
#
# Arguments:
#   VMID         - Proxmox container ID (e.g., 214)
#   HOSTNAME     - Container hostname (e.g., docker-pittsburgh-lxc)
#   IP           - Static IP address without CIDR (e.g., 10.10.0.214)
#   DISK_SIZE    - Root filesystem size (e.g., 128G)
#   MEMORY       - RAM in MB (e.g., 16384)
#   CORES        - CPU cores (e.g., 4)
#   PROXMOX_HOST - Optional SSH host for remote Proxmox (e.g., root@10.10.0.11)
#
# What this script does:
#   1. Creates LXC container with specified resources
#   2. Configures AppArmor for Docker compatibility
#   3. Enables nesting and keyctl features
#   4. Installs Docker and docker-compose-plugin
#   5. Sets up container to start on boot
#
# Prerequisites:
#   - Ubuntu 20.04 template downloaded on Proxmox host
#   - Sufficient storage on local-lvm
#   - Network bridge vmbr0 configured
#   - Gateway at 10.10.0.1
#

set -euo pipefail

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Function to print colored messages
log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Function to execute commands on Proxmox host
execute_on_proxmox() {
    if [[ -n "${PROXMOX_HOST:-}" ]]; then
        ssh "$PROXMOX_HOST" "$@"
    else
        # "$*" passes the whole command as the single -c argument; with "$@",
        # any extra arguments would become $0, $1, ... instead of command text.
        bash -c "$*"
    fi
}

# Parse arguments
if [[ $# -lt 6 ]]; then
    log_error "Insufficient arguments"
    echo "Usage: $0 <VMID> <HOSTNAME> <IP> <DISK_SIZE> <MEMORY> <CORES> [PROXMOX_HOST]"
    echo ""
    echo "Example: $0 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4"
    echo "Example: $0 214 docker-pittsburgh-lxc 10.10.0.214 128G 16384 4 root@10.10.0.11"
    exit 1
fi

VMID=$1
HOSTNAME=$2
IP=$3
DISK_SIZE=$4
MEMORY=$5
CORES=$6
PROXMOX_HOST=${7:-}

# Configuration
TEMPLATE="local:vztmpl/ubuntu-20.04-standard_20.04-1_amd64.tar.gz"
GATEWAY="10.10.0.1"
NAMESERVER="8.8.8.8"
CIDR="24"

log_info "Starting LXC container creation"
log_info "Configuration:"
echo "  VMID:     $VMID"
echo "  Hostname: $HOSTNAME"
echo "  IP:       $IP/$CIDR"
echo "  Disk:     $DISK_SIZE"
echo "  Memory:   $MEMORY MB"
echo "  Cores:    $CORES"
[[ -n "${PROXMOX_HOST:-}" ]] && echo "  Proxmox:  $PROXMOX_HOST" || echo "  Proxmox:  local"
echo ""

# Check if container already exists
log_info "Checking if container $VMID already exists..."
if execute_on_proxmox "pct status $VMID 2>/dev/null"; then
    log_error "Container $VMID already exists!"
    read -p "Do you want to destroy and recreate it? (yes/no): " -r
    if [[ $REPLY == "yes" ]]; then
        log_warn "Stopping and destroying container $VMID..."
        execute_on_proxmox "pct stop $VMID 2>/dev/null || true"
        execute_on_proxmox "pct destroy $VMID"
        log_info "Container $VMID destroyed"
    else
        log_error "Aborted by user"
        exit 1
    fi
fi

# Create the LXC container
log_info "Creating LXC container $VMID..."
# pct's STORAGE:SIZE allocation syntax takes the size in GiB as a bare number,
# so a trailing "G" in DISK_SIZE (e.g., 128G) is stripped before use.
execute_on_proxmox "pct create $VMID $TEMPLATE \
    --hostname $HOSTNAME \
    --memory $MEMORY \
    --cores $CORES \
    --rootfs local-lvm:${DISK_SIZE%G} \
    --net0 name=eth0,bridge=vmbr0,ip=$IP/$CIDR,gw=$GATEWAY \
    --unprivileged 0 \
    --onboot 1 \
    --nameserver $NAMESERVER"

log_info "✅ Container created"

# Configure AppArmor and features
log_info "Configuring AppArmor profile and container features..."
execute_on_proxmox "cat >> /etc/pve/lxc/$VMID.conf << 'EOF'
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
EOF"

# Set the features line. The pct create call above passes no --features, so the
# new config has no "features:" entry for sed to rewrite; pct set adds or updates it.
execute_on_proxmox "pct set $VMID --features nesting=1,keyctl=1"

log_info "✅ AppArmor and features configured"

# Start the container
log_info "Starting container $VMID..."
execute_on_proxmox "pct start $VMID"

log_info "Waiting 10 seconds for container to boot..."
sleep 10

# Install Docker
log_info "Installing Docker and dependencies..."
execute_on_proxmox "pct exec $VMID -- bash <<'DOCKER_INSTALL'
set -e

# Update package list
apt-get update

# Install prerequisites
apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Download and run Docker installation script
curl -fsSL https://get.docker.com -o /tmp/get-docker.sh
sh /tmp/get-docker.sh

# Install docker-compose-plugin
apt-get install -y docker-compose-plugin

# Enable Docker service
systemctl enable docker
systemctl start docker

# Verify installation
docker --version
docker compose version

echo '✅ Docker installation complete'
DOCKER_INSTALL"

log_info "✅ Docker installed successfully"

# Display completion message
echo ""
log_info "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
log_info "🎉 LXC Container $VMID Ready!"
log_info "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "Container Details:"
echo "  ID:       $VMID"
echo "  Hostname: $HOSTNAME"
echo "  IP:       $IP"
echo "  Status:   Running"
echo ""
echo "Next Steps:"
echo "  1. Copy SSH key (if needed):"
if [[ -n "${PROXMOX_HOST:-}" ]]; then
    echo "     ssh $PROXMOX_HOST \"cat ~/.ssh/id_rsa.pub | pct exec $VMID -- tee /root/.ssh/authorized_keys\""
else
    echo "     cat ~/.ssh/id_rsa.pub | pct exec $VMID -- tee /root/.ssh/authorized_keys"
fi
echo ""
echo "  2. Migrate data from source VM"
echo ""
echo "  3. Fix AppArmor in docker-compose files:"
echo "     ./fix-docker-apparmor.sh $IP"
echo ""
echo "  4. Start containers:"
echo "     ssh root@$IP 'cd /home/cal/container-data/[service] && docker compose up -d'"
echo ""
log_info "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
369
vm-management/wave1-migration-results.md
Normal file
@ -0,0 +1,369 @@
# Wave 1 Migration Results - docker-7days (VM 111 → LXC 211)

**Date**: 2025-01-12
**Status**: ✅ **SUCCESSFUL**
**Migration Time**: ~4 hours (including troubleshooting)

---

## Summary

Successfully migrated docker-7days game server from VM 111 to LXC 211. Container is running with all data intact. AppArmor configuration issue was resolved, and the migration process has been validated for future waves.

---

## Migration Details

### Source (VM 111)
- **OS**: Ubuntu (in VM)
- **Resources**: 32GB RAM, 4 cores, 256GB disk
- **Uptime before migration**: 307.4 hours
- **Services**: 3 docker-compose projects (7 Days to Die game servers)
- **Data size**: 62GB

### Destination (LXC 211)
- **OS**: Ubuntu 20.04 LTS (in privileged LXC)
- **Resources**: 32GB RAM, 4 cores, 128GB disk (expanded from initial 64GB)
- **IP**: 10.10.0.250 (temporary)
- **Services**: 1 game server running (7dtd-solo-game)
- **Container ID**: d87df36c2dcd

---

## Timeline

| Time | Action | Status |
|------|--------|--------|
| Start | Gathered VM configuration | ✅ Complete |
| +15min | Created LXC 211 with Docker | ✅ Complete |
| +30min | Stopped VM 111 | ✅ Complete |
| +45min | Mounted VM disk and started rsync (62GB) | ✅ Complete |
| +2h 30min | Rsync completed | ✅ Complete |
| +2h 35min | **Disk full** - expanded from 64GB to 128GB | ✅ Resolved |
| +3h 00min | AppArmor blocking Docker containers | ⚠️ Issue |
| +3h 45min | Fixed AppArmor in docker-compose files | ✅ Resolved |
| +4h 00min | Container started successfully | ✅ Complete |

---

## Issues Encountered & Solutions

### Issue 1: Disk Space Insufficient
**Problem**: 64GB disk filled to 100% with only 62GB of data
**Cause**: Thin provisioning saves space on the storage pool but does not enlarge the container's filesystem. The 64GB root filesystem also has to hold the OS, Docker, and filesystem overhead, so 62GB of data did not fit
**Solution**: Expanded LXC disk from 64GB to 128GB
**Command**:
```bash
pct resize 211 rootfs +64G
```
**Learning**: Allocate 2x data size for LXC root filesystem to account for overhead

---

### Issue 2: AppArmor Prevents Docker Container Start
**Problem**: Containers fail to start with error:
```
AppArmor enabled on system but the docker-default profile could not be loaded:
Permission denied; attempted to load a profile while confined?
error: exit status 243
```

**Root Cause**: LXC containers run "confined" by AppArmor, preventing Docker from loading its own AppArmor profiles

**Solutions Attempted**:
1. ❌ Disabled AppArmor at LXC level (`lxc.apparmor.profile: unconfined`) - Didn't help
2. ❌ Tried to configure Docker daemon.json with security options - Invalid config option
3. ✅ **Added security_opt to docker-compose.yml files** - WORKED!

**Working Solution**:
```yaml
# Add to each service in docker-compose.yml
services:
  service-name:
    image: ...
    security_opt:
      - apparmor=unconfined
    # ... rest of config
```

**Implementation**:
```bash
# Used Python to properly modify YAML files
python3 <<'PYTHON'
import yaml
import glob

for compose_path in glob.glob("/home/cal/container-data/ul-*/docker-compose.yml"):
    with open(compose_path, 'r') as f:
        compose = yaml.safe_load(f)

    for service_name, service_config in compose.get('services', {}).items():
        service_config['security_opt'] = ['apparmor=unconfined']

    with open(compose_path, 'w') as f:
        yaml.dump(compose, f, default_flow_style=False, sort_keys=False)
PYTHON
```

**Why This Works**: Tells Docker to run containers without AppArmor confinement, bypassing the LXC AppArmor conflict

**Learning**: **ALL future Docker-in-LXC migrations require this modification**

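One caveat with the implementation above: assigning `security_opt` directly replaces any list a service already has. A minimal sketch of a merge-style variant follows. It works on the dictionary returned by `yaml.safe_load`, so it could sit between the load and dump steps. The function name `add_apparmor_unconfined` and the `example/web` service are hypothetical and were not part of the original migration.

```python
def add_apparmor_unconfined(compose):
    """Add apparmor=unconfined to every service, keeping existing options.

    `compose` is the dict produced by yaml.safe_load() on a docker-compose.yml.
    Hypothetical helper for illustration only.
    """
    option = "apparmor=unconfined"
    for service_config in (compose.get("services") or {}).values():
        existing = list(service_config.get("security_opt") or [])
        if option not in existing:
            existing.append(option)
        service_config["security_opt"] = existing
    return compose


# Illustrative input: one service without security_opt, one with an existing entry.
example = {
    "services": {
        "game": {"image": "vinanrra/7dtd-server"},
        "web": {"image": "example/web", "security_opt": ["no-new-privileges:true"]},
    }
}
add_apparmor_unconfined(example)
print(example["services"]["web"]["security_opt"])
```

Running the helper twice leaves the list unchanged, so it is safe to re-apply before each wave.
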
---

## Resource Usage Comparison

### Before Migration (VM)
- **Memory**: 345MB used / 32GB allocated (1% utilization, 99% wasted)
- **Disk**: Unknown actual usage / 256GB allocated
- **CPU**: 0% (idle)
- **Boot time**: ~30-90 seconds

### After Migration (LXC)
- **Memory**: 248MB used / 32GB allocated (similar usage, but faster access)
- **Disk**: 60GB used / 128GB allocated (47% utilization)
- **CPU**: 0% (idle, same as before)
- **Boot time**: ~5 seconds

### Efficiency Gains
- **Memory overhead**: Reduced from ~700MB (VM OS) to ~100MB (LXC overhead) = **600MB saved**
- **Disk usage**: More transparent (thin provisioning visible)
- **Boot time**: **6-18x faster** (5s vs 30-90s)
- **Backup time**: Expected **5-10x faster** (LXC incremental backups)

---

## Final Configuration

### LXC 211 Config (`/etc/pve/lxc/211.conf`)
```
arch: amd64
cores: 4
hostname: docker-7days-lxc
memory: 32768
nameserver: 8.8.8.8
net0: name=eth0,bridge=vmbr0,gw=10.10.0.1,hwaddr=CE:7E:8F:B2:40:C2,ip=10.10.0.250/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-211-disk-0,size=128G
searchdomain: local
swap: 2048
features: nesting=1,keyctl=1
lxc.apparmor.profile: unconfined
```

### Running Container
```bash
CONTAINER ID   IMAGE                  STATUS          PORTS
d87df36c2dcd   vinanrra/7dtd-server   Up 12 seconds   0.0.0.0:26900->26900/tcp,
                                                      0.0.0.0:26900-26902->26900-26902/udp
```

### Docker-Compose Projects
1. **ul-solo-game** - ✅ Running on port 26900
2. **ul-test** - ⏸️ Stopped (port conflict with ul-solo-game)
3. **ul-public** - ⏸️ Stopped (port conflict with ul-solo-game)

**Note**: All three projects work, but only one can run at a time due to shared port 26900 (expected behavior)

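The port conflict in the note can be checked before a project is started. The sketch below assumes each project publishes the host ports shown in the container listing (26900/tcp and 26900-26902/udp). The port sets for `ul-test` and `ul-public` are an assumption based on the note, and `find_port_conflicts` is a hypothetical helper.

```python
from itertools import combinations


def find_port_conflicts(projects):
    """Return {(project_a, project_b): shared_ports} for projects sharing host ports."""
    conflicts = {}
    for name_a, name_b in combinations(sorted(projects), 2):
        shared = set(projects[name_a]) & set(projects[name_b])
        if shared:
            conflicts[(name_a, name_b)] = sorted(shared)
    return conflicts


# (host port, protocol) pairs taken from the running container listing.
GAME_PORTS = {(26900, "tcp"), (26900, "udp"), (26901, "udp"), (26902, "udp")}

projects = {
    "ul-solo-game": GAME_PORTS,
    "ul-test": GAME_PORTS,    # assumed to publish the same ports
    "ul-public": GAME_PORTS,  # assumed to publish the same ports
}

for pair, ports in find_port_conflicts(projects).items():
    print(pair, ports)
```
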
---

## Validation Results

✅ **Container Status**: Running and healthy
✅ **Data Integrity**: All 62GB of game server data accessible
✅ **Network**: Listening on expected ports (26900-26902)
✅ **Docker**: Working correctly with AppArmor fix
✅ **Performance**: Container started successfully, no errors in logs

---

## Key Learnings for Future Waves

### 1. Disk Sizing
- **Rule**: Allocate **2x the data size** for LXC root filesystem
- **Why**: Accounts for overhead, temporary files, and headroom
- **Example**: 62GB data → 128GB allocation (not 64GB)

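The sizing rule can be written as a small calculation. This is a sketch only: the 2x factor comes from the rule above, while rounding up to a multiple of 16GB is an assumption chosen so that 62GB of data lands on the 128GB allocation used in Wave 1. The name `recommended_rootfs_gb` is hypothetical.

```python
import math


def recommended_rootfs_gb(data_gb, factor=2.0, step_gb=16):
    """Return factor x data size, rounded up to a multiple of step_gb."""
    return math.ceil(data_gb * factor / step_gb) * step_gb


# Wave 1: 62GB of data -> 124GB, rounded up to the next 16GB step.
print(recommended_rootfs_gb(62))
```
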
### 2. AppArmor Configuration
- **Critical**: ALL docker-compose files need `security_opt: [apparmor=unconfined]`
- **When**: Add this BEFORE starting containers (not after)
- **How**: Use Python/YAML library for proper syntax (sed breaks YAML)
- **Template**:
  ```python
  import glob
  import yaml

  for compose_path in glob.glob("*/docker-compose.yml"):
      with open(compose_path, 'r') as f:
          compose = yaml.safe_load(f)
      for service_name, service_config in compose.get('services', {}).items():
          service_config['security_opt'] = ['apparmor=unconfined']
      with open(compose_path, 'w') as f:
          yaml.dump(compose, f, default_flow_style=False, sort_keys=False)
  ```

### 3. LXC Configuration Requirements
- **Privileged mode**: Required (`--unprivileged 0`)
- **Features**: `nesting=1,keyctl=1` for Docker
- **AppArmor**: `lxc.apparmor.profile: unconfined` in config

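These three requirements can be verified against a container's config file before Docker is installed. The sketch below parses the `key: value` lines of a Proxmox LXC config and reports anything missing. The function names are hypothetical, the sample text is a subset of the `211.conf` shown earlier, and treating a missing `unprivileged` line as privileged is an assumption based on that file.

```python
REQUIRED_FEATURES = {"nesting": "1", "keyctl": "1"}


def parse_lxc_config(text):
    """Parse 'key: value' lines into a dict (snapshot sections are not handled)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def docker_readiness_problems(config):
    """Return a list of unmet requirements for running Docker in this container."""
    problems = []
    if config.get("unprivileged", "0") != "0":
        problems.append("container is unprivileged")
    features = dict(
        item.split("=", 1)
        for item in config.get("features", "").split(",")
        if "=" in item
    )
    for name, expected in REQUIRED_FEATURES.items():
        if features.get(name) != expected:
            problems.append(f"feature {name}={expected} missing")
    if config.get("lxc.apparmor.profile") != "unconfined":
        problems.append("lxc.apparmor.profile is not unconfined")
    return problems


SAMPLE = """\
arch: amd64
cores: 4
net0: name=eth0,bridge=vmbr0,gw=10.10.0.1,hwaddr=CE:7E:8F:B2:40:C2,ip=10.10.0.250/24,type=veth
features: nesting=1,keyctl=1
lxc.apparmor.profile: unconfined
"""

print(docker_readiness_problems(parse_lxc_config(SAMPLE)))
```

An empty list means all three requirements are met.
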
### 4. Data Migration Strategy
- **Method**: rsync over network worked well (16MB/s average)
- **Time**: ~1 hour for 62GB (acceptable)
- **Alternative**: Direct disk mount + copy would be faster but more complex

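The "~1 hour" figure follows from the data size and the average rate, which is handy for estimating later waves. A minimal sketch, assuming 1GB = 1024MB and a steady transfer rate; `transfer_minutes` is a hypothetical helper.

```python
def transfer_minutes(data_gb, rate_mb_per_s):
    """Estimated copy time in minutes, taking 1 GB = 1024 MB."""
    return data_gb * 1024 / rate_mb_per_s / 60


# Wave 1 figures: 62GB at an average of 16MB/s.
print(f"{transfer_minutes(62, 16):.0f} minutes")
```
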
### 5. Ubuntu Version
- **Used**: Ubuntu 20.04 LTS (Proxmox didn't support 22.04 template)
- **Works**: Perfectly fine, Docker 28.1.1 installed successfully
- **Note**: Not a blocker for migration

---

## Rollback Capability

✅ **VM 111 preserved**: Stopped but intact, can restart if needed
✅ **VM disk mounted**: Available at `/mnt/vm111` on Proxmox host
✅ **Rollback time**: <5 minutes (just start VM 111)
✅ **Data loss risk**: None (original data untouched)

**Rollback command if needed**:
```bash
pct stop 211
qm start 111
```

---

## Recommended Monitoring Period

- **24-48 hours**: Keep VM 111 stopped but available
- **After 48 hours**: If LXC stable, can delete VM 111
- **Backup before delete**: Create LXC backup first

**Monitoring checklist**:
- [ ] Game server connectable and playable
- [ ] No crashes or restarts
- [ ] Memory usage stable
- [ ] No disk space issues
- [ ] Backup/restore tested

---

## Next Steps

### Immediate (Optional)
- [ ] Test game server connectivity from client
- [ ] Switch LXC 211 from temp IP (10.10.0.250) to production IP if needed
- [ ] Update DNS/firewall rules if required

### Short Term (24-48 hours)
- [ ] Monitor LXC stability
- [ ] Validate container doesn't crash
- [ ] Check resource usage patterns

### Before Wave 2
- [ ] Create LXC backup
- [ ] Verify backup restore procedure
- [ ] Delete VM 111 (or archive)
- [ ] Update migration scripts with AppArmor fix
- [ ] Update Wave 2 plan with learnings

---

## Updated Migration Checklist for Waves 2-6

Based on Wave 1 learnings, future migrations should follow this checklist:

### Pre-Migration
- [ ] Document VM configuration (IP, resources, services)
- [ ] Calculate disk space: **data_size × 2** for LXC allocation
- [ ] Create LXC with privileged mode + nesting + keyctl
- [ ] Add `lxc.apparmor.profile: unconfined` to LXC config
- [ ] Install Docker in LXC

### Migration
- [ ] Stop VM
- [ ] Mount VM disk OR rsync data
- [ ] **Apply AppArmor fix to all docker-compose.yml files**
- [ ] Start containers
- [ ] Validate services

### Post-Migration
- [ ] Monitor for 24-48 hours
- [ ] Create LXC backup
- [ ] Delete/archive VM after validation

---

## Migration Efficiency Metrics

| Metric | Value | Notes |
|--------|-------|-------|
| **Planning time** | 30 minutes | Documentation review |
| **Execution time** | 4 hours | Including troubleshooting |
| **Troubleshooting time** | 1.5 hours | AppArmor + disk space |
| **Data migration time** | 1 hour | 62GB rsync |
| **Downtime** | 4 hours | Game server unavailable |
| **Success rate** | 100% | All services working |

### Expected Improvement for Wave 2+
With AppArmor fix pre-applied and proper disk sizing:
- **Execution time**: ~2 hours (50% reduction)
- **Troubleshooting time**: <30 minutes
- **Downtime**: ~2 hours

---

## Files Modified

### Docker-Compose Files (AppArmor Fix Applied)
- `/home/cal/container-data/ul-solo-game/docker-compose.yml`
- `/home/cal/container-data/ul-test/docker-compose.yml`
- `/home/cal/container-data/ul-public/docker-compose.yml`

### Proxmox Configuration
- `/etc/pve/lxc/211.conf` (LXC config with AppArmor unconfined)

### Backups Created
- `docker-compose.yml.backup` (all three directories)

---

## Success Criteria Met

✅ All success criteria from migration plan achieved:

- [x] Services running stable in LXC
- [x] No performance degradation
- [x] Backup/restore procedure understood
- [x] Rollback procedure validated
- [x] Process documented for next waves
- [x] AppArmor solution identified and documented

---

## Recommendations for Remaining Waves

### Wave 2 (docker-pittsburgh + docker-vpn)
- **Pre-apply AppArmor fix** before starting containers
- **Size disks appropriately** from the start
- **Test VPN routing** carefully (docker-vpn specific)
- **Expected time**: 2-3 hours per host

### General Recommendations
1. **Batch similar services**: Migrate Docker hosts together (leverage learnings)
2. **Off-hours migrations**: Minimize user impact
3. **Document per-wave**: Capture unique issues for each service type
4. **Automate AppArmor fix**: Create script to modify docker-compose files automatically
5. **Right-size after monitoring**: Review resource allocation after 1-2 weeks

---

## Contact

**Migration Owner**: Cal Corum (cal.corum@gmail.com)
**Date Completed**: 2025-01-12
**Next Wave**: Wave 2 (docker-pittsburgh, docker-vpn) - TBD

---

**Status**: ✅ **Wave 1 Complete - Ready for Wave 2**
278
vm-management/wave2-migration-results.md
Normal file
@ -0,0 +1,278 @@
# Wave 2 Migration Results - docker-vpn (VM 121 → LXC 221 arr-stack)

**Date**: 2025-12-05
**Status**: **SUCCESSFUL**
**Migration Time**: ~2 hours

---

## Summary

Successfully migrated and restructured docker-vpn VM (121) to arr-stack LXC (221). The migration involved a significant architecture simplification - eliminating the Mullvad VPN entirely since only Usenet is used (SSL to Usenet provider is sufficient, no torrents). Additionally replaced Overseerr with Jellyseerr for native Jellyfin support.

---

## Migration Details

### Source (VM 121 - docker-vpn)
- **OS**: Ubuntu (in VM)
- **Services**: Sonarr, Radarr, Readarr, Overseerr, SABnzbd, Mullvad VPN
- **Architecture**: All traffic routed through Mullvad VPN container
- **Complexity**: High (VPN routing, multiple network namespaces)

### Destination (LXC 221 - arr-stack)
- **OS**: Ubuntu 20.04 LTS (privileged LXC)
- **Resources**: 2 cores, 4GB RAM, 32GB disk
- **IP**: 10.10.0.221
- **Services**: Sonarr, Radarr, Readarr, Jellyseerr, SABnzbd
- **Architecture**: Direct network access (no VPN)
- **Complexity**: Low (standard Docker containers)

---

## Architecture Changes

### Before (docker-vpn)
```
Internet
    ↓
Mullvad VPN Container
    ↓ (all traffic tunneled)
    ├─ Sonarr
    ├─ Radarr
    ├─ Readarr
    ├─ Overseerr
    └─ SABnzbd
```

### After (arr-stack)
```
Internet
    ↓ (direct, SSL encrypted to Usenet)
    ├─ Sonarr
    ├─ Radarr
    ├─ Readarr
    ├─ Jellyseerr (replaced Overseerr)
    └─ SABnzbd
```

### Key Decision: VPN Elimination
**Rationale**:
- Only using Usenet (not torrents)
- Usenet providers support SSL encryption
- SSL to Usenet provider provides sufficient privacy
- VPN added complexity without meaningful benefit
- Simplified troubleshooting and maintenance

---

## Technical Implementation

### LXC Configuration
```
# /etc/pve/lxc/221.conf
arch: amd64
cores: 2
features: nesting=1,keyctl=1
hostname: arr-stack
memory: 4096
net0: name=eth0,bridge=vmbr0,gw=10.10.0.1,ip=10.10.0.221/24,type=veth
ostype: ubuntu
rootfs: local-lvm:vm-221-disk-0,size=32G
swap: 512
lxc.apparmor.profile: unconfined
```

### Docker Compose
```yaml
services:
  sonarr:
    image: linuxserver/sonarr:latest
    container_name: sonarr
    ports: ["8989:8989"]
    volumes:
      - ./config/sonarr:/config
      - /mnt/media:/media
    security_opt: [apparmor=unconfined]
    restart: unless-stopped

  radarr:
    image: linuxserver/radarr:latest
    container_name: radarr
    ports: ["7878:7878"]
    volumes:
      - ./config/radarr:/config
      - /mnt/media:/media
    security_opt: [apparmor=unconfined]
    restart: unless-stopped

  readarr:
    image: ghcr.io/hotio/readarr:latest
    container_name: readarr
    ports: ["8787:8787"]
    volumes:
      - ./config/readarr:/config
      - /mnt/media:/media
    security_opt: [apparmor=unconfined]
    restart: unless-stopped

  jellyseerr:
    image: fallenbagel/jellyseerr:latest
    container_name: jellyseerr
    ports: ["5055:5055"]
    volumes:
      - ./config/jellyseerr:/app/config
    security_opt: [apparmor=unconfined]
    restart: unless-stopped

  sabnzbd:
    image: linuxserver/sabnzbd:latest
    container_name: sabnzbd
    ports: ["8080:8080"]
    volumes:
      - ./config/sabnzbd:/config
      - ./downloads:/downloads
      - /mnt/media:/media
    security_opt: [apparmor=unconfined]
    restart: unless-stopped
```

### CIFS Mount
```fstab
//10.10.0.35/media /mnt/media cifs vers=3.0,uid=0,credentials=/root/.smbcredentials 0 0
```

---

## Issues Encountered & Solutions

### Issue 1: linuxserver.io Registry (lscr.io) Pull Failures
**Problem**: `no matching manifest for linux/amd64` errors from lscr.io registry
**Solution**: Switched to Docker Hub images directly (`linuxserver/sonarr` instead of `lscr.io/linuxserver/sonarr`)

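The registry switch amounts to dropping the `lscr.io/` prefix from the image reference. A small sketch of that rewrite, useful when several compose files need the same change; `docker_hub_fallback` is a hypothetical helper, and it assumes the same repository and tag exist on Docker Hub, as they did for the images in this migration.

```python
LSCR_PREFIX = "lscr.io/"


def docker_hub_fallback(image):
    """Map an lscr.io image reference to the same repository on Docker Hub."""
    if image.startswith(LSCR_PREFIX):
        return image[len(LSCR_PREFIX):]
    return image


print(docker_hub_fallback("lscr.io/linuxserver/sonarr:latest"))
```
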
### Issue 2: Readarr Image Not Available
**Problem**: linuxserver/readarr:develop tag not available for amd64
**Solution**: Switched to hotio image (`ghcr.io/hotio/readarr:latest`)

### Issue 3: Jellyseerr Tag Validation Error
**Problem**: Radarr rejecting requests with "Label: Allowed characters a-z, 0-9 and -"
**Cause**: Jellyseerr sending tags with invalid characters to Radarr
**Solution**: Disabled tags in Jellyseerr Radarr integration settings

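The validation rule quoted in the Radarr error (a-z, 0-9 and `-`) can be illustrated with a short normalizer. This is only a sketch of what a compliant label looks like. The fix actually applied in this migration was disabling tags, and `sanitize_label` is a hypothetical helper, not a Jellyseerr or Radarr setting.

```python
import re


def sanitize_label(label):
    """Reduce a label to the characters Radarr accepts: a-z, 0-9 and '-'."""
    lowered = label.lower()
    # Replace each run of disallowed characters with a single dash.
    dashed = re.sub(r"[^a-z0-9-]+", "-", lowered)
    return dashed.strip("-")


print(sanitize_label("Cal Corum"))
```
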
---

## Data Migration

### Configs Migrated
- **Sonarr**: ~1.4GB (database, MediaCover cache, backups)
- **Radarr**: ~1.6GB (database, MediaCover cache, backups)
- **Readarr**: ~88MB (database, backups)
- **Overseerr**: ~7.7MB (database, settings) - Not used, replaced with Jellyseerr

### Fresh Configuration Required
- **SABnzbd**: Fresh install (user configured Usenet provider)
- **Jellyseerr**: Fresh install (connected to Jellyfin)

---

## Validation Results

| Service | Port | Status | Test |
|---------|------|--------|------|
| Sonarr | 8989 | HTTP 200 | Database loaded, shows configured |
| Radarr | 7878 | HTTP 200 | Database loaded, movie requests working |
| Readarr | 8787 | HTTP 200 | Database loaded, shows configured |
| Jellyseerr | 5055 | HTTP 307 | Connected to Jellyfin, requests working |
| SABnzbd | 8080 | HTTP 303 | Configured with Usenet provider |
| CIFS Mount | - | Working | Media accessible in containers |

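The status column above can be reproduced with a short check that requests `/` on each port without following redirects, which is why Jellyseerr and SABnzbd show 307 and 303 rather than 200. This is a sketch using only the standard library. The host and ports come from the table, while the function names and the choice to treat any 2xx or 3xx response as healthy are assumptions.

```python
import http.client

# Ports taken from the validation table; the host is LXC 221.
HOST = "10.10.0.221"
SERVICES = {"Sonarr": 8989, "Radarr": 7878, "Readarr": 8787, "Jellyseerr": 5055, "SABnzbd": 8080}


def is_up(status):
    """Treat 2xx and 3xx responses as 'service is answering'."""
    return status is not None and 200 <= status < 400


def fetch_status(host, port, timeout=5):
    """Return the HTTP status of GET / without following redirects, or None on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()


def report(host=HOST, services=SERVICES):
    """Print one line per service with its status code and an OK/FAIL verdict."""
    for name, port in services.items():
        status = fetch_status(host, port)
        print(f"{name:<11} {port}  {status}  {'OK' if is_up(status) else 'FAIL'}")
```

Calling `report()` from a machine on the 10.10.0.0/24 network prints the per-service results.
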
---

## Resource Comparison

### Before (VM 121)
- **Memory**: Full VM overhead (~1-2GB for OS)
- **Disk**: Larger allocation for VM image
- **Complexity**: VPN routing, multiple network namespaces
- **Maintenance**: VPN updates, connection monitoring

### After (LXC 221)
- **Memory**: ~100MB LXC overhead
- **Disk**: 32GB (minimal)
- **Complexity**: Standard Docker containers
- **Maintenance**: Standard container updates only

### Efficiency Gains
- **~1.5GB RAM saved** (VM overhead eliminated)
- **Simplified networking** (no VPN routing)
- **Reduced attack surface** (fewer services)
- **Faster boot time** (LXC vs VM)

---

## NPM/Reverse Proxy Updates

Updated Nginx Proxy Manager entries to point to new IP:
- sonarr.manticorum.com → 10.10.0.221:8989
- radarr.manticorum.com → 10.10.0.221:7878
- readarr.manticorum.com → 10.10.0.221:8787
- jellyseerr.manticorum.com → 10.10.0.221:5055 (new, replaces overseerr)
- sabnzbd.manticorum.com → 10.10.0.221:8080

---

## Rollback Capability

- **VM 121 preserved**: Can be restarted if issues arise
- **Rollback time**: <5 minutes
- **Recommendation**: Keep VM 121 stopped for 48 hours, then decommission

---

## Key Learnings

### 1. VPN Complexity Often Unnecessary
For Usenet-only setups, VPN adds complexity without meaningful benefit. SSL to the Usenet provider is sufficient.

### 2. Image Registry Issues
lscr.io can have availability issues. Docker Hub images work as fallback.

### 3. Application Substitution
Jellyseerr is a drop-in replacement for Overseerr with native Jellyfin support - worth the switch if using Jellyfin.

### 4. Tag/Label Validation
When connecting Jellyseerr to arr apps, be careful with tag configurations - invalid characters cause silent failures.

---

## Next Steps

### Immediate
- [x] Configure SABnzbd with Usenet provider
- [x] Connect arr apps to new SABnzbd
- [x] Update NPM reverse proxy entries
- [x] Test movie/show requests through Jellyseerr

### After 48 Hours
- [ ] Decommission VM 121 (docker-vpn)
- [ ] Clean up local migration temp files (`/tmp/arr-config-migration/`)

---

## Files Created/Modified

### On LXC 221
- `/opt/arr-stack/docker-compose.yml`
- `/opt/arr-stack/config/` (all service configs)
- `/root/.smbcredentials`
- `/etc/fstab` (CIFS mount)

### Documentation Updated
- `vm-management/lxc-migration-plan.md` - Wave 2 status
- `networking/server-inventory.md` - Added arr-stack entry
- `vm-management/wave2-migration-results.md` - This file

---

**Status**: **Wave 2 Complete - Ready for Wave 3**
**Contact**: Cal Corum (cal.corum@gmail.com)