diff --git a/vm-management/proxmox-upgrades/proxmox-7-to-9-upgrade-plan.md b/vm-management/proxmox-upgrades/proxmox-7-to-9-upgrade-plan.md
new file mode 100644
index 0000000..cc8d477
--- /dev/null
+++ b/vm-management/proxmox-upgrades/proxmox-7-to-9-upgrade-plan.md
@@ -0,0 +1,376 @@
+# Proxmox VE Upgrade Plan: 7.1-7 → 9.1
+
+## Executive Summary
+
+**Current State**: Proxmox VE 7.1-7 (kernel 5.13.19-2-pve)
+**Target State**: Proxmox VE 9.1 (latest)
+**Upgrade Path**: Two-phase upgrade (7→8→9) - direct upgrade not supported
+**Total Timeline**: 3-4 weeks (including stabilization periods)
+**Total Downtime**: ~4 hours (2 hours per phase)
+
+## Infrastructure Overview
+
+**Production Services** (8 LXC + 17 VMs):
+- **Critical**: Paper Dynasty/Major Domo (VMs 115, 110), Gitea (LXC 225), n8n (LXC 210), Home Assistant (VM 109)
+- **Important**: Media services (Plex 107, Tdarr 113, arr-stack 221), OpenClaw (224), Databases (112)
+- **Lower Priority**: Game servers, development containers
+
+**Key Constraints**:
+- Home Assistant VM 109 requires dual network (vmbr1 for Matter support)
+- All production Discord bots must minimize downtime
+- Gitea mirrored to GitHub provides backup
+- TrueNAS backup mount at 10.10.0.35
+
+---
+
+## Phase 1: Proxmox 7.1 → 8.4 Upgrade
+
+### Pre-Upgrade Preparation (1-2 days)
+
+#### 1. Comprehensive Backups
+
+**Priority 1 - Production Services**:
+```bash
+# Backup critical services to TrueNAS
+vzdump 210 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # n8n
+vzdump 115 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # docker-sba
+vzdump 112 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # databases
+vzdump 110 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # discord-bots
+vzdump 225 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # gitea
+vzdump 109 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # homeassistant
+```
+
+**Priority 2 - All Remaining VMs/LXCs**:
+```bash
+vzdump --all --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd
+```
+
+**Backup Proxmox Configuration**:
+```bash
+tar -czf /mnt/truenas/proxmox/pve-config-$(date +%Y%m%d).tar.gz /etc/pve/
+cp /etc/network/interfaces /mnt/truenas/proxmox/interfaces.backup
+```
+
+**Expected**: 2-4 hours, ~500GB-1TB storage required
+
+#### 2. Pre-Upgrade Validation
+
+```bash
+# Update to latest PVE 7.4 first (the pve7to8 checker ships with the latest 7.4 packages)
+apt update && apt dist-upgrade -y
+
+# Verify minimum version
+pveversion # Must show 7.4-15 or higher
+
+# Run Proxmox 7-to-8 checker
+pve7to8 --full
+
+# Document current state
+pvesh get /cluster/resources --type vm --output-format yaml > /mnt/truenas/proxmox/vm-inventory-pre-upgrade.yaml
+```
+
+#### 3. Maintenance Window Planning
+
+**Recommended Timing**: Overnight or early morning weekend
+**Estimated Downtime**: 1.5-2.5 hours
+**Notifications Required**: Discord bot users, game server players
+
+### Upgrade Execution (2-4 hours including downtime)
+
+#### 1. Update to Latest PVE 7.4
+```bash
+apt update && apt dist-upgrade -y
+pveversion # Verify 7.4-XX
+reboot
+```
+
+#### 2. Configure PVE 8 Repositories
+```bash
+# Backup current config
+cp /etc/apt/sources.list /etc/apt/sources.list.pve7-backup
+cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve7-backup/
+
+# Update repositories (Bullseye → Bookworm)
+sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
+echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
+sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true
+
+apt update
+```
+
+#### 3. Execute Distribution Upgrade
+```bash
+apt dist-upgrade
+# Duration: 15-45 minutes
+# Accept new versions of /etc/issue
+# Keep current versions of customized configs
+
+reboot
+```
+
+#### 4. Verify PVE 8 Installation
+```bash
+pveversion # Should show pve-manager/8.4-X
+uname -r # Should show 6.8.X-X-pve
+
+# Verify services
+systemctl status pve-cluster pvedaemon pveproxy pvestatd
+pvesm status
+```
+
+### Post-Upgrade Validation
+
+**Start Services in Dependency Order**:
+```bash
+# Databases first
+pvesh create /nodes/proxmox/qemu/112/status/start
+
+# Infrastructure
+pvesh create /nodes/proxmox/lxc/225/status/start # gitea
+pvesh create /nodes/proxmox/lxc/210/status/start # n8n
+
+# Applications
+pvesh create /nodes/proxmox/qemu/115/status/start # docker-sba (Paper Dynasty)
+pvesh create /nodes/proxmox/qemu/110/status/start # discord-bots
+pvesh create /nodes/proxmox/lxc/224/status/start # openclaw
+
+# Media & Others
+pvesh create /nodes/proxmox/qemu/109/status/start # homeassistant
+pvesh create /nodes/proxmox/qemu/107/status/start # plex
+pvesh create /nodes/proxmox/lxc/221/status/start # arr-stack
+```
+
+**Service Validation Checklist**:
+- [ ] Discord bots responding in Discord
+- [ ] Database connections working
+- [ ] n8n workflows executing
+- [ ] Gitea accessible at git.manticorum.com
+- [ ] Home Assistant automations running
+- [ ] Media servers streaming (Plex/Jellyfin)
+- [ ] Web UI accessible and functional
+
+### Stabilization Period
+
+**Wait 1-2 weeks before PVE 9 upgrade**
+
+Monitor for:
+- VM/LXC stability
+- Performance issues
+- Service uptime
+- Error logs
+
+---
+
+## Phase 2: Proxmox 8.4 → 9.1 Upgrade
+
+### Pre-Upgrade Preparation (1 day)
+
+#### 1. LXC Compatibility Check (CRITICAL)
+
+```bash
+# Verify systemd version in each LXC (must be > 230); systemctl is on PATH, the systemd binary usually is not
+for ct in 108 210 211 221 222 223 224 225; do
+ echo "=== LXC $ct ==="
+ pct exec "$ct" -- systemctl --version | head -1
+done
+```
+
+**Action Required**: If any LXC shows systemd < 230:
+```bash
+pct enter "$CTID" # set CTID to the affected container's ID; the next commands run inside it
+apt update && apt dist-upgrade -y
+do-release-upgrade # Upgrade Ubuntu to compatible version
+```
+
+**Expected**: All Ubuntu 20.04+ LXCs should be compatible (systemd 245+)
+
+#### 2. Fresh Backup Set
+```bash
+vzdump --all --mode snapshot --dumpdir /mnt/truenas/proxmox/pve9-upgrade --compress zstd
+tar -czf /mnt/truenas/proxmox/pve8-config-$(date +%Y%m%d).tar.gz /etc/pve/
+```
+
+#### 3. Run PVE 8-to-9 Checker
+```bash
+pve8to9 --full
+```
+
+### Upgrade Execution (2-4 hours including downtime)
+
+#### 1. Configure PVE 9 Repositories
+```bash
+# Backup PVE 8 config
+cp /etc/apt/sources.list /etc/apt/sources.list.pve8-backup
+cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve8-backup/
+
+# Update repositories (Bookworm → Trixie)
+sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
+echo "deb http://download.proxmox.com/debian/pve trixie pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
+sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true
+
+apt update
+```
+
+#### 2. Execute Distribution Upgrade
+```bash
+apt dist-upgrade
+# Duration: 20-60 minutes
+
+reboot
+```
+
+#### 3. Verify PVE 9 Installation
+```bash
+pveversion # Should show pve-manager/9.1-X
+uname -r # Should show 6.17.X-X-pve on 9.1 (9.0 shipped 6.14.X)
+
+# Verify cgroupv2 (PVE 9 requirement)
+mount | grep cgroup2
+
+# Verify services
+systemctl status pve-cluster pvedaemon pveproxy pvestatd
+pvesm status
+```
+
+### Post-Upgrade Validation
+
+**Start and validate services** using the same procedure as the PVE 8 upgrade.
+
+**Additional PVE 9 Checks**:
+- Web UI with cleared browser cache (Ctrl+Shift+R)
+- Memory reporting (PVE 9 includes overhead in VM memory)
+- Storage performance validation
+
+---
+
+## Rollback Procedures
+
+### If PVE 8 Upgrade Fails
+
+**During dist-upgrade**:
+```bash
+apt --fix-broken install
+dpkg --configure -a
+
+# If unrecoverable: restore the PVE 7 repositories (drop the repo file added during the upgrade first)
+cp /etc/apt/sources.list.pve7-backup /etc/apt/sources.list
+rm -f /etc/apt/sources.list.d/pve-install-repo.list && cp -a /etc/apt/sources.list.d.pve7-backup/* /etc/apt/sources.list.d/
+apt update && apt install pve-manager/bullseye # best effort: apt does not officially support downgrades
+```
+
+**After reboot to unstable system**:
+- Boot to previous kernel via GRUB → Advanced options
+- Rollback repositories as above
+
+### If PVE 9 Upgrade Fails
+
+```bash
+cp /etc/apt/sources.list.pve8-backup /etc/apt/sources.list
+cp -a /etc/apt/sources.list.d.pve8-backup/* /etc/apt/sources.list.d/
+apt update && apt dist-upgrade
+reboot
+```
+
+### If VM/LXC Won't Start
+
+**Restore from backup**:
+```bash
+# LXC (set CTID to the container ID first)
+pct restore "$CTID" /mnt/truenas/proxmox/vzdump-lxc-${CTID}-*.tar.zst --storage local-lvm
+
+# VM (set VMID to the VM ID first)
+qmrestore /mnt/truenas/proxmox/vzdump-qemu-${VMID}-*.vma.zst "$VMID"
+```
+
+### Complete Reinstallation (Last Resort)
+
+1. Reinstall Proxmox VE 9 from ISO
+2. Restore configs from `/mnt/truenas/proxmox/pve*-config-*.tar.gz`
+3. Restore VMs/LXCs from backups
+4. Reconfigure networking if needed
+
+---
+
+## Risk Assessment
+
+| Component | Risk | Impact | Mitigation |
+|-----------|------|--------|-----------|
+| Production Bots (115, 110) | HIGH | Service downtime | Backup instance ready, notify users |
+| Databases (112) | HIGH | Data loss | Multiple backups, test restore |
+| LXC systemd compatibility | MEDIUM | Container won't start | Pre-verify versions, upgrade OS if needed |
+| Network config | MEDIUM | Connectivity loss | Document config, console access |
+| n8n workflows (210) | MEDIUM | Automation failures | Export workflow configs |
+
+**Low Risk**: Game servers, templates, unused services
+
+---
+
+## Post-Upgrade Tasks
+
+### 1. Update Documentation
+- Record upgrade completion in `/mnt/NV2/Development/claude-home/vm-management/`
+- Update Proxmox version references
+- Document issues encountered
+
+### 2. Performance Validation
+```bash
+pvesh get /cluster/resources
+```
+
+### 3. Long-Term Monitoring
+- Daily health checks
+- Resource utilization trends
+- Plan next upgrade (PVE 9.x updates)
+
+---
+
+## Timeline Summary
+
+| Phase | Duration | Downtime | Activity |
+|-------|----------|----------|----------|
+| Pre-PVE8 Prep | 1-2 days | None | Backups, validation |
+| PVE 7→8 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
+| PVE 8 Stabilization | 1-2 weeks | None | Monitor, validate |
+| Pre-PVE9 Prep | 1 day | None | LXC validation, backups |
+| PVE 8→9 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
+| Post-Upgrade | 1-2 days | None | Documentation, optimization |
+| **TOTAL** | **3-4 weeks** | **~4 hours** | Full upgrade with stabilization |
+
+---
+
+## Critical Files
+
+- `/etc/pve/qemu-server/*.conf` - VM configurations (backup critical)
+- `/etc/pve/lxc/*.conf` - LXC configurations (backup critical)
+- `/etc/network/interfaces` - Network config (document before changes)
+- `/etc/apt/sources.list` - Repository config (will be modified)
+- `/etc/apt/sources.list.d/pve-*.list` - Proxmox repos (will be modified)
+
+---
+
+## Verification Checklist
+
+After each upgrade phase:
+
+- [ ] Proxmox version correct (`pveversion`)
+- [ ] Kernel version updated (`uname -r`)
+- [ ] All services running (`systemctl status 'pve*'`)
+- [ ] Storage accessible (`pvesm status`)
+- [ ] Network functional (`ip addr`, `ip route`)
+- [ ] All VMs/LXCs visible in UI
+- [ ] Critical VMs/LXCs started successfully
+- [ ] Discord bots responding
+- [ ] Databases accessible
+- [ ] n8n workflows running
+- [ ] Gitea accessible
+- [ ] Home Assistant functional
+- [ ] Media streaming working
+- [ ] Web UI functional (clear cache first)
+
+---
+
+## Sources
+
+- [Proxmox VE: Upgrade from 7 to 8](https://pve.proxmox.com/wiki/Upgrade_from_7_to_8)
+- [Proxmox VE: Upgrade from 8 to 9](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9)
+- [Proxmox VE: Backup and Restore](https://pve.proxmox.com/wiki/Backup_and_Restore)