Cal Corum 7eadacc6db Add comprehensive Proxmox VE 7.1 → 9.1 upgrade plan
Create detailed two-phase upgrade strategy for Proxmox hypervisor:
- Phase 1: 7.1 → 8.4 (Debian Bullseye → Bookworm)
- Phase 2: 8.4 → 9.1 (Debian Bookworm → Trixie)

Plan includes:
- Pre-upgrade preparation and backup procedures
- Step-by-step upgrade execution for both phases
- Service validation and dependency order
- Rollback procedures for failure scenarios
- Risk assessment with mitigation strategies
- Timeline: 3-4 weeks total, ~4 hours downtime

Critical considerations:
- 8 LXC containers + 17 VMs to maintain
- Production services (Discord bots, databases, Gitea, n8n)
- Home Assistant dual network requirements
- LXC systemd compatibility checks for PVE 9

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-03 17:11:51 -06:00

Proxmox VE Upgrade Plan: 7.1-7 → 9.1

Executive Summary

Current State: Proxmox VE 7.1-7 (kernel 5.13.19-2-pve)
Target State: Proxmox VE 9.1 (latest)
Upgrade Path: Two-phase upgrade (7→8→9); a direct 7→9 upgrade is not supported
Total Timeline: 3-4 weeks (including stabilization periods)
Total Downtime: ~4 hours (about 2 hours per phase)

Infrastructure Overview

Production Services (8 LXC + 17 VMs):

  • Critical: Paper Dynasty/Major Domo (VMs 115, 110), Gitea (LXC 225), n8n (LXC 210), Home Assistant (VM 109)
  • Important: Media services (Plex 107, Tdarr 113, arr-stack 221), OpenClaw (224), Databases (112)
  • Lower Priority: Game servers, development containers

Key Constraints:

  • Home Assistant VM 109 requires dual network (vmbr1 for Matter support)
  • All production Discord bots must minimize downtime
  • Gitea mirrored to GitHub provides backup
  • TrueNAS backup mount at 10.10.0.35

Phase 1: Proxmox 7.1 → 8.4 Upgrade

Pre-Upgrade Preparation (1-2 days)

1. Comprehensive Backups

Priority 1 - Production Services:

# Backup critical services to TrueNAS
vzdump 210 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # n8n
vzdump 115 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # docker-sba
vzdump 112 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # databases
vzdump 110 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # discord-bots
vzdump 225 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # gitea
vzdump 109 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd  # homeassistant

Priority 2 - All Remaining VMs/LXCs:

vzdump --all --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd

Backup Proxmox Configuration:

tar -czf /mnt/truenas/proxmox/pve-config-$(date +%Y%m%d).tar.gz /etc/pve/
cp /etc/network/interfaces /mnt/truenas/proxmox/interfaces.backup

Expected: 2-4 hours, ~500GB-1TB storage required
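
Before moving on, it may be worth confirming that every Priority 1 guest actually has an archive on the TrueNAS mount. The following is a minimal sketch, assuming vzdump's default archive names (`vzdump-qemu-<ID>-<timestamp>.vma.zst` and `vzdump-lxc-<ID>-<timestamp>.tar.zst`); the helper name `missing_backups` is hypothetical and not part of Proxmox.

```shell
#!/bin/sh
# Hypothetical helper: print every guest ID that has no vzdump archive in a directory.
# Assumes vzdump's default file names and zstd compression, as used in this plan.
missing_backups() {
    dir=$1; shift
    for id in "$@"; do
        found=0
        for f in "$dir"/vzdump-*-"$id"-*.zst; do
            # An unmatched glob stays literal, so check that the file really exists.
            [ -e "$f" ] && found=1 && break
        done
        [ "$found" -eq 1 ] || echo "$id"
    done
}

# Example (on the Proxmox host):
#   missing_backups /mnt/truenas/proxmox 210 115 112 110 225 109
```

Empty output means every listed ID has at least one archive. It does not prove the archives are restorable, so a test restore of one guest is still advisable.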

2. Pre-Upgrade Validation

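# Update to the latest PVE 7.4 first (the pve7to8 checker ships with 7.4, not 7.1)
apt update && apt dist-upgrade -y

# Verify minimum version
pveversion  # Must show 7.4-15 or higher

# Run Proxmox 7-to-8 checker and resolve every warning or failure it reports
pve7to8 --full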

# Document current state
pvesh get /cluster/resources --type vm --output-format yaml > /mnt/truenas/proxmox/vm-inventory-pre-upgrade.yaml
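
The minimum-version gate can also be checked by script rather than by eye. This is a sketch under stated assumptions: it relies on GNU `sort -V` for version ordering and on `pveversion` printing a line of the form `pve-manager/<version>/<hash> (running kernel: ...)`. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for gating the upgrade on a minimum pve-manager version.

# "pve-manager/7.4-17/513c62be (running kernel: ...)" -> "7.4-17"
extract_pve_version() {
    printf '%s\n' "$1" | cut -d/ -f2
}

# Succeeds when version $1 >= version $2 (ordering provided by GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (on the Proxmox host):
#   current=$(extract_pve_version "$(pveversion)")
#   version_at_least "$current" 7.4-15 || echo "Update to the latest 7.4 first" >&2
```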

3. Maintenance Window Planning

Recommended Timing: Overnight or early morning on a weekend
Estimated Downtime: 1.5-2.5 hours
Notifications Required: Discord bot users, game server players

Upgrade Execution (2-4 hours including downtime)

1. Update to Latest PVE 7.4

apt update && apt dist-upgrade -y
pveversion  # Verify 7.4-XX
reboot

2. Configure PVE 8 Repositories

# Backup current config
cp /etc/apt/sources.list /etc/apt/sources.list.pve7-backup
cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve7-backup/

# Update repositories (Bullseye → Bookworm)
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true

apt update
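
The `sed` commands above only touch `/etc/apt/sources.list`, so any other file under `/etc/apt/sources.list.d/` that still names the old suite would be missed. A minimal sketch for spotting such leftovers before running the upgrade follows; `stale_suite_lines` is a hypothetical helper, and it only understands the classic one-line `deb ...` format, not deb822 `.sources` files.

```shell
#!/bin/sh
# Hypothetical helper: list active APT source lines that still name an old suite.
stale_suite_lines() {
    old_suite=$1; shift
    # -r descends into directories; -H and -n label each hit with file and line number.
    # Commented-out entries are skipped because the pattern requires a leading "deb".
    grep -rHnE "^[[:space:]]*deb(-src)?[[:space:]].*${old_suite}" "$@"
}

# Example (Phase 1; use "bookworm" as the old suite in Phase 2):
#   stale_suite_lines bullseye /etc/apt/sources.list /etc/apt/sources.list.d
```

No output (and a non-zero exit status from `grep`) means no active entry still points at the old suite.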

3. Execute Distribution Upgrade

apt dist-upgrade
# Duration: 15-45 minutes
# Accept new versions of /etc/issue
# Keep current versions of customized configs

reboot

4. Verify PVE 8 Installation

pveversion  # Should show pve-manager/8.4-X
uname -r    # Should show 6.8.X-X-pve

# Verify services
systemctl status pve-cluster pvedaemon pveproxy pvestatd
pvesm status

Post-Upgrade Validation

Start Services in Dependency Order:

# Databases first
pvesh create /nodes/proxmox/qemu/112/status/start

# Infrastructure
pvesh create /nodes/proxmox/lxc/225/status/start  # gitea
pvesh create /nodes/proxmox/lxc/210/status/start  # n8n

# Applications
pvesh create /nodes/proxmox/qemu/115/status/start  # docker-sba (Paper Dynasty)
pvesh create /nodes/proxmox/qemu/110/status/start  # discord-bots
pvesh create /nodes/proxmox/lxc/224/status/start  # openclaw

# Media & Others
pvesh create /nodes/proxmox/qemu/109/status/start  # homeassistant
pvesh create /nodes/proxmox/qemu/107/status/start  # plex
pvesh create /nodes/proxmox/lxc/221/status/start  # arr-stack
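
The same dependency order can be expressed as a single ordered list, which makes it easier to review and rehearse. This sketch uses `qm start` and `pct start` in place of the `pvesh` calls above and adds a dry-run switch; the function names and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: start guests in a fixed order, with an optional dry run.

start_guest() {
    kind=$1; id=$2
    case $kind in
        qemu) tool=qm ;;
        lxc)  tool=pct ;;
        *)    echo "unknown guest type: $kind" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$tool start $id"      # print the command instead of running it
    else
        "$tool" start "$id"
    fi
}

# Each argument is "type:id"; stop at the first guest that fails to start.
start_in_order() {
    for entry in "$@"; do
        start_guest "${entry%%:*}" "${entry#*:}" || return 1
    done
}

# Example, mirroring the order in this plan:
#   DRY_RUN=1 start_in_order qemu:112 lxc:225 lxc:210 qemu:115 qemu:110 \
#       lxc:224 qemu:109 qemu:107 lxc:221
```

Stopping at the first failure is a deliberate choice here: a database that does not come up should be investigated before the bots that depend on it are started.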

Service Validation Checklist:

  • Discord bots responding in Discord
  • Database connections working
  • n8n workflows executing
  • Gitea accessible at git.manticorum.com
  • Home Assistant automations running
  • Media servers streaming (Plex/Jellyfin)
  • Web UI accessible and functional

Stabilization Period

Wait 1-2 weeks before PVE 9 upgrade

Monitor for:

  • VM/LXC stability
  • Performance issues
  • Service uptime
  • Error logs

Phase 2: Proxmox 8.4 → 9.1 Upgrade

Pre-Upgrade Preparation (1 day)

1. LXC Compatibility Check (CRITICAL)

# Verify systemd version in each LXC (must be > 230)
for ct in 108 210 211 221 222 223 224 225; do
    echo "=== LXC $ct ==="
    # systemctl is always on PATH; the systemd binary itself often is not
    pct exec "$ct" -- systemctl --version | head -1
done

Action Required: If any LXC shows systemd 230 or older:

pct enter <CTID>
apt update && apt dist-upgrade -y
do-release-upgrade  # Upgrade Ubuntu to compatible version

Expected: All Ubuntu 20.04+ LXCs should be compatible (systemd 245+)
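
To turn the version listing into a pass/fail result, the first line of `systemctl --version` can be parsed and compared against the threshold. A minimal sketch, assuming output of the form `systemd 245 (245.4-4ubuntu3.24)`; both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: decide whether a container's systemd is newer than 230.

# "systemd 245 (245.4-4ubuntu3.24)" -> "245"; prints nothing for unexpected input.
systemd_major() {
    printf '%s\n' "$1" | awk 'NR == 1 && $1 == "systemd" { print int($2) }'
}

# Succeeds only when a version was found and it is greater than 230.
is_pve9_compatible() {
    major=$(systemd_major "$1")
    [ -n "$major" ] && [ "$major" -gt 230 ]
}

# Example (on the Proxmox host, container must be running):
#   for ct in 108 210 211 221 222 223 224 225; do
#       is_pve9_compatible "$(pct exec "$ct" -- systemctl --version)" \
#           || echo "LXC $ct needs attention"
#   done
```

A container without systemd (for example an Alpine guest) would also be flagged by this check, so flagged IDs need a manual look rather than an automatic OS upgrade.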

2. Fresh Backup Set

vzdump --all --mode snapshot --dumpdir /mnt/truenas/proxmox/pve9-upgrade --compress zstd
tar -czf /mnt/truenas/proxmox/pve8-config-$(date +%Y%m%d).tar.gz /etc/pve/

3. Run PVE 8-to-9 Checker

apt update && apt dist-upgrade -y  # pve8to9 ships with the latest PVE 8.4 packages
pve8to9 --full

Upgrade Execution (2-4 hours including downtime)

1. Configure PVE 9 Repositories

# Backup PVE 8 config
cp /etc/apt/sources.list /etc/apt/sources.list.pve8-backup
cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve8-backup/

# Update repositories (Bookworm → Trixie)
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
echo "deb http://download.proxmox.com/debian/pve trixie pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true

apt update

2. Execute Distribution Upgrade

apt dist-upgrade
# Duration: 20-60 minutes

reboot

3. Verify PVE 9 Installation

pveversion  # Should show pve-manager/9.1-X
uname -r    # Should show 6.17.X-X-pve (PVE 9.1 default; PVE 9.0 shipped 6.14)

# Verify cgroupv2 (PVE 9 requirement)
mount | grep cgroup2

# Verify services
systemctl status pve-cluster pvedaemon pveproxy pvestatd
pvesm status

Post-Upgrade Validation

Start and validate services using the same procedure as in the PVE 8 upgrade.

Additional PVE 9 Checks:

  • Web UI with cleared browser cache (Ctrl+Shift+R)
  • Memory reporting (PVE 9 includes VM overhead in reported memory usage, so figures may be higher than under PVE 8)
  • Storage performance validation

Rollback Procedures

If PVE 8 Upgrade Fails

During dist-upgrade:

apt --fix-broken install
dpkg --configure -a

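# If unrecoverable, restore the PVE 7 repository files so no further Bookworm packages are pulled:
cp /etc/apt/sources.list.pve7-backup /etc/apt/sources.list
cp -a /etc/apt/sources.list.d.pve7-backup/* /etc/apt/sources.list.d/
apt update
# apt cannot cleanly downgrade packages that were already upgraded to Bookworm, and
# "apt install pve-manager/7.4" is unlikely to resolve (the slash form selects a release;
# a version pin needs pkg=version). If the host stays broken, use Complete Reinstallation.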

After reboot to unstable system:

  • Boot to previous kernel via GRUB → Advanced options
  • Rollback repositories as above

If PVE 9 Upgrade Fails

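# Restore the PVE 8 repository files so no further Trixie packages are pulled
cp /etc/apt/sources.list.pve8-backup /etc/apt/sources.list
cp -a /etc/apt/sources.list.d.pve8-backup/* /etc/apt/sources.list.d/
apt update && apt dist-upgrade
# Note: dist-upgrade does not downgrade packages already at Trixie versions;
# if the host remains unstable, use Complete Reinstallation (Last Resort).
reboot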

If VM/LXC Won't Start

Restore from backup:

# LXC
pct restore <CTID> /mnt/truenas/proxmox/vzdump-lxc-<CTID>-*.tar.zst --storage local-lvm

# VM
qmrestore /mnt/truenas/proxmox/vzdump-qemu-<VMID>-*.vma.zst <VMID>
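
The `*` in the archive paths above can match several backups once more than one backup run exists. A minimal sketch for selecting the newest archive for one guest follows, assuming vzdump's default `YYYY_MM_DD-HH_MM_SS` timestamp in the file name (which sorts chronologically as plain text); `latest_backup` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the newest vzdump archive for one guest.
# Arguments: directory, guest kind ("lxc" or "qemu"), guest ID.
latest_backup() {
    dir=$1; kind=$2; id=$3
    for f in "$dir"/vzdump-"$kind"-"$id"-*.zst; do
        # Skip the literal pattern that is left behind when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort | tail -n 1
}

# Example (on the Proxmox host):
#   pct restore 225 "$(latest_backup /mnt/truenas/proxmox lxc 225)" --storage local-lvm
#   qmrestore "$(latest_backup /mnt/truenas/proxmox qemu 109)" 109
```

The helper prints nothing when no archive exists, so it is worth checking for an empty result before passing it to a restore command.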

Complete Reinstallation (Last Resort)

  1. Reinstall Proxmox VE 9 from ISO
  2. Restore configs from /mnt/truenas/proxmox/pve-config-*.tar.gz
  3. Restore VMs/LXCs from backups
  4. Reconfigure networking if needed

Risk Assessment

| Component | Risk | Impact | Mitigation |
|---|---|---|---|
| Production Bots (115, 110) | HIGH | Service downtime | Backup instance ready, notify users |
| Databases (112) | HIGH | Data loss | Multiple backups, test restore |
| LXC systemd compatibility | MEDIUM | Container won't start | Pre-verify versions, upgrade OS if needed |
| Network config | MEDIUM | Connectivity loss | Document config, console access |
| n8n workflows (210) | MEDIUM | Automation failures | Export workflow configs |

Low Risk: Game servers, templates, unused services


Post-Upgrade Tasks

1. Update Documentation

  • Record upgrade completion in /mnt/NV2/Development/claude-home/vm-management/
  • Update Proxmox version references
  • Document issues encountered

2. Performance Validation

pvesh get /cluster/resources

3. Long-Term Monitoring

  • Daily health checks
  • Resource utilization trends
  • Plan next upgrade (PVE 9.x updates)

Timeline Summary

| Phase | Duration | Downtime | Activity |
|---|---|---|---|
| Pre-PVE8 Prep | 1-2 days | None | Backups, validation |
| PVE 7→8 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
| PVE 8 Stabilization | 1-2 weeks | None | Monitor, validate |
| Pre-PVE9 Prep | 1 day | None | LXC validation, backups |
| PVE 8→9 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
| Post-Upgrade | 1-2 days | None | Documentation, optimization |
| TOTAL | 3-4 weeks | ~4 hours | Full upgrade with stabilization |

Critical Files

  • /etc/pve/qemu-server/*.conf - VM configurations (backup critical)
  • /etc/pve/lxc/*.conf - LXC configurations (backup critical)
  • /etc/network/interfaces - Network config (document before changes)
  • /etc/apt/sources.list - Repository config (will be modified)
  • /etc/apt/sources.list.d/pve-*.list - Proxmox repos (will be modified)

Verification Checklist

After each upgrade phase:

  • Proxmox version correct (pveversion)
  • Kernel version updated (uname -r)
  • All services running (systemctl status pve-cluster pvedaemon pveproxy pvestatd)
  • Storage accessible (pvesm status)
  • Network functional (ip addr, ip route)
  • All VMs/LXCs visible in UI
  • Critical VMs/LXCs started successfully
  • Discord bots responding
  • Databases accessible
  • n8n workflows running
  • Gitea accessible
  • Home Assistant functional
  • Media streaming working
  • Web UI functional (clear cache first)
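
The host-level items in this checklist lend themselves to a small pass/fail wrapper, so the same checks can be rerun after each phase. This is only a sketch: `run_check` is a hypothetical helper that treats a zero exit status as a pass, and application-level items (bots, workflows, streaming) still need manual verification.

```shell
#!/bin/sh
# Hypothetical helper: run one command and report PASS or FAIL with a label.
run_check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

# Example (on the Proxmox host):
#   run_check "core services active" systemctl is-active --quiet pve-cluster pvedaemon pveproxy pvestatd
#   run_check "storage accessible"   pvesm status
#   run_check "default route set"    ip route show default
```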
