Add comprehensive Proxmox VE 7.1 → 9.1 upgrade plan

Create detailed two-phase upgrade strategy for Proxmox hypervisor:
- Phase 1: 7.1 → 8.4 (Debian Bullseye → Bookworm)
- Phase 2: 8.4 → 9.1 (Debian Bookworm → Trixie)

Plan includes:
- Pre-upgrade preparation and backup procedures
- Step-by-step upgrade execution for both phases
- Service validation and dependency order
- Rollback procedures for failure scenarios
- Risk assessment with mitigation strategies
- Timeline: 3-4 weeks total, ~4 hours downtime

Critical considerations:
- 8 LXC containers + 17 VMs to maintain
- Production services (Discord bots, databases, Gitea, n8n)
- Home Assistant dual network requirements
- LXC systemd compatibility checks for PVE 9

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Cal Corum 2026-02-03 17:11:51 -06:00
parent 76dc82ce7c
commit 7eadacc6db

@@ -0,0 +1,376 @@
# Proxmox VE Upgrade Plan: 7.1-7 → 9.1
## Executive Summary
**Current State**: Proxmox VE 7.1-7 (kernel 5.13.19-2-pve)
**Target State**: Proxmox VE 9.1 (latest)
**Upgrade Path**: Two-phase upgrade (7→8→9) - direct upgrade not supported
**Total Timeline**: 3-4 weeks (including stabilization periods)
**Total Downtime**: ~4 hours (1.5-2.5 hours per phase)
## Infrastructure Overview
**Production Services** (8 LXC + 17 VMs):
- **Critical**: Paper Dynasty/Major Domo (VMs 115, 110), Gitea (LXC 225), n8n (LXC 210), Home Assistant (VM 109)
- **Important**: Media services (Plex 107, Tdarr 113, arr-stack 221), OpenClaw (224), Databases (112)
- **Lower Priority**: Game servers, development containers
**Key Constraints**:
- Home Assistant VM 109 requires dual network (vmbr1 for Matter support)
- All production Discord bots must minimize downtime
- Gitea repositories are mirrored to GitHub, providing an off-host backup
- TrueNAS backup mount at 10.10.0.35
---
## Phase 1: Proxmox 7.1 → 8.4 Upgrade
### Pre-Upgrade Preparation (1-2 days)
#### 1. Comprehensive Backups
**Priority 1 - Production Services**:
```bash
# Backup critical services to TrueNAS
vzdump 210 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # n8n
vzdump 115 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # docker-sba
vzdump 112 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # databases
vzdump 110 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # discord-bots
vzdump 225 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # gitea
vzdump 109 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd # homeassistant
```
**Priority 2 - All Remaining VMs/LXCs**:
```bash
# Skip the Priority 1 guests that were already backed up above
vzdump --all --exclude 109,110,112,115,210,225 --mode snapshot --dumpdir /mnt/truenas/proxmox --compress zstd
```
**Backup Proxmox Configuration**:
```bash
tar -czf /mnt/truenas/proxmox/pve-config-$(date +%Y%m%d).tar.gz /etc/pve/
cp /etc/network/interfaces /mnt/truenas/proxmox/interfaces.backup
```
**Expected**: 2-4 hours, ~500GB-1TB storage required
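Because the backup set is expected to need roughly 500GB-1TB, it may be worth confirming free space on the TrueNAS mount before starting. A minimal sketch, assuming the dump directory is `/mnt/truenas/proxmox` as used above; the function name `has_free_gib` and any threshold passed to it are illustrative and not part of the original plan:

```bash
# Hypothetical helper: succeed if PATH has at least MIN_GIB GiB available.
# Usage: has_free_gib <path> <min_gib>
has_free_gib() {
    local path="$1" min_gib="$2" avail_kib
    # df -P prints one POSIX-format line per filesystem;
    # column 4 is the available space in 1K blocks.
    avail_kib=$(df -Pk "$path" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$avail_kib" ] || return 2   # path missing or df output unparseable
    [ "$avail_kib" -ge $((min_gib * 1024 * 1024)) ]
}
```

For example, `has_free_gib /mnt/truenas/proxmox 1000 || echo "Not enough space for a full backup set"` would warn before any `vzdump` run is started.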
#### 2. Pre-Upgrade Validation
```bash
# Update to the latest PVE 7.4 first (pve7to8 ships with 7.4, not 7.1)
apt update && apt dist-upgrade -y
# Verify minimum version
pveversion # Must show 7.4-15 or higher
# Run the Proxmox 7-to-8 checker; resolve every FAIL and review each WARN
pve7to8 --full
# Document current state
pvesh get /cluster/resources --type vm --output-format yaml > /mnt/truenas/proxmox/vm-inventory-pre-upgrade.yaml
```
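The minimum-version check can be scripted so the upgrade refuses to proceed on an older release. A sketch, assuming `pveversion` prints a line of the form `pve-manager/7.4-17/<hash> (running kernel: ...)`; the helper name `pve_at_least` is hypothetical, and the comparison relies on GNU `sort -V`:

```bash
# Hypothetical helper: succeed if the pve-manager version found in a
# pveversion line is at least MIN.
# Usage: pve_at_least "<pveversion line>" <min>
pve_at_least() {
    local line="$1" min="$2" ver
    ver=${line#pve-manager/}   # drop the "pve-manager/" prefix
    ver=${ver%%/*}             # drop "/<hash> (running kernel: ...)"
    # sort -V orders version strings; MIN must sort first (or be equal).
    [ "$(printf '%s\n%s\n' "$min" "$ver" | sort -V | head -n 1)" = "$min" ]
}
```

A possible guard before changing repositories: `pve_at_least "$(pveversion)" 7.4-15 || echo "Update to the latest 7.4 first"`.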
#### 3. Maintenance Window Planning
**Recommended Timing**: Overnight or early morning on a weekend
**Estimated Downtime**: 1.5-2.5 hours
**Notifications Required**: Discord bot users, game server players
### Upgrade Execution (2-4 hours including downtime)
#### 1. Update to Latest PVE 7.4
```bash
apt update && apt dist-upgrade -y
pveversion # Verify 7.4-XX
reboot
```
#### 2. Configure PVE 8 Repositories
```bash
# Backup current config
cp /etc/apt/sources.list /etc/apt/sources.list.pve7-backup
cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve7-backup/
# Update repositories (Bullseye → Bookworm)
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true
apt update
```
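After the repository files are rewritten, a quick scan for leftover references to the old suite can catch a missed file before `apt dist-upgrade` runs. A sketch under the assumption that all active APT sources live in `/etc/apt/sources.list` and `/etc/apt/sources.list.d/`; the function name `stale_suite_lines` is illustrative:

```bash
# Hypothetical helper: print active (uncommented) APT source lines that still
# mention SUITE. Returns 0 only when no such line is found.
# Usage: stale_suite_lines <suite> <file-or-dir>...
stale_suite_lines() {
    local suite="$1"
    shift
    # -r descends into directories, -H always prefixes the file name.
    # The second grep drops lines whose content starts with '#'.
    if grep -rH -- "$suite" "$@" 2>/dev/null | grep -v '^[^:]*:[[:space:]]*#'; then
        return 1   # stale entries were printed
    fi
    return 0
}
```

For Phase 1 this could be run as `stale_suite_lines bullseye /etc/apt/sources.list /etc/apt/sources.list.d`; any printed line points at a file that still needs editing.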
#### 3. Execute Distribution Upgrade
```bash
apt dist-upgrade
# Duration: 15-45 minutes
# Accept new versions of /etc/issue
# Keep current versions of customized configs
reboot
```
#### 4. Verify PVE 8 Installation
```bash
pveversion # Should show pve-manager/8.4.X (PVE 8 uses dotted version numbers)
uname -r # Should show 6.8.X-X-pve
# Verify services
systemctl status pve-cluster pvedaemon pveproxy pvestatd
pvesm status
```
### Post-Upgrade Validation
**Start Services in Dependency Order**:
```bash
# Databases first
pvesh create /nodes/proxmox/qemu/112/status/start
# Infrastructure
pvesh create /nodes/proxmox/lxc/225/status/start # gitea
pvesh create /nodes/proxmox/lxc/210/status/start # n8n
# Applications
pvesh create /nodes/proxmox/qemu/115/status/start # docker-sba (Paper Dynasty)
pvesh create /nodes/proxmox/qemu/110/status/start # discord-bots
pvesh create /nodes/proxmox/lxc/224/status/start # openclaw
# Media & Others
pvesh create /nodes/proxmox/qemu/109/status/start # homeassistant
pvesh create /nodes/proxmox/qemu/107/status/start # plex
pvesh create /nodes/proxmox/lxc/221/status/start # arr-stack
```
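The start order can also be kept in a single list so the same sequence is reused after Phase 2. A sketch that assumes the node is named `proxmox` (as in the `pvesh` paths above) and defaults to a dry run that only prints the commands; `start_guests`, `DRY_RUN`, and `PVE_NODE` are illustrative names:

```bash
# Hypothetical helper: start guests in the order given.
# Each argument is "<type>:<id>", where type is qemu or lxc.
# DRY_RUN=1 (default) only prints the pvesh commands; DRY_RUN=0 runs them.
start_guests() {
    local node="${PVE_NODE:-proxmox}" spec type id
    for spec in "$@"; do
        type=${spec%%:*}
        id=${spec#*:}
        case "$type" in
            qemu|lxc) ;;
            *) echo "unknown guest type in '$spec'" >&2; return 1 ;;
        esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "pvesh create /nodes/$node/$type/$id/status/start"
        else
            # Stop at the first failure so later tiers are not started blindly.
            pvesh create "/nodes/$node/$type/$id/status/start" || return 1
        fi
    done
}
```

For example, `DRY_RUN=0 start_guests qemu:112 lxc:225 lxc:210 qemu:115 qemu:110` would start the database, infrastructure, and bot tiers in that order. Starting a guest does not guarantee the services inside it are ready, so a pause or health check between tiers may still be needed.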
**Service Validation Checklist**:
- [ ] Discord bots responding in Discord
- [ ] Database connections working
- [ ] n8n workflows executing
- [ ] Gitea accessible at git.manticorum.com
- [ ] Home Assistant automations running
- [ ] Media servers streaming (Plex/Jellyfin)
- [ ] Web UI accessible and functional
### Stabilization Period
**Wait 1-2 weeks before PVE 9 upgrade**
Monitor for:
- VM/LXC stability
- Performance issues
- Service uptime
- Error logs
---
## Phase 2: Proxmox 8.4 → 9.1 Upgrade
### Pre-Upgrade Preparation (1 day)
#### 1. LXC Compatibility Check (CRITICAL)
```bash
# Verify systemd version in each running LXC (must be > 230)
for ct in 108 210 211 221 222 223 224 225; do
  echo "=== LXC $ct ==="
  # systemctl is on PATH in systemd containers; the systemd binary usually is not
  pct exec "$ct" -- systemctl --version | head -1
done
```
**Action Required**: If any LXC shows systemd 230 or older:
```bash
pct enter <CTID>
# Inside the container:
apt update && apt dist-upgrade -y
do-release-upgrade # Ubuntu only: upgrade to a release with a newer systemd
```
**Expected**: All Ubuntu 20.04+ LXCs should be compatible (systemd 245+)
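The version threshold can be checked automatically instead of by eye. A sketch, assuming the first line of the version output looks like `systemd 245 (245.4-4ubuntu3.24)`; the helper names `systemd_major` and `systemd_ok_for_pve9` are hypothetical:

```bash
# Hypothetical helper: extract the major systemd version from the first line
# of the version output, e.g. "systemd 245 (245.4-4ubuntu3.24)" -> 245.
systemd_major() {
    printf '%s\n' "$1" | awk 'NR==1 && $1=="systemd" {print $2}'
}

# Succeed if the version is newer than 230 (the threshold stated above).
# Returns 2 when no numeric version could be parsed.
systemd_ok_for_pve9() {
    local major
    major=$(systemd_major "$1")
    case "$major" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -gt 230 ]
}
```

One way to use it per container: `systemd_ok_for_pve9 "$(pct exec 210 -- systemctl --version)" || echo "LXC 210 needs an OS upgrade"`.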
#### 2. Fresh Backup Set
```bash
vzdump --all --mode snapshot --dumpdir /mnt/truenas/proxmox/pve9-upgrade --compress zstd
tar -czf /mnt/truenas/proxmox/pve8-config-$(date +%Y%m%d).tar.gz /etc/pve/
```
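Before relying on a backup set, it can help to confirm that every guest actually has a non-empty archive in the dump directory. A sketch, assuming vzdump's usual `vzdump-<qemu|lxc>-<id>-<timestamp>.<ext>` file naming; the function name `missing_backups` is illustrative:

```bash
# Hypothetical helper: report guest IDs that have no vzdump archive in DIR.
# Usage: missing_backups <dir> <id>...   (prints missing IDs, returns 1 if any)
missing_backups() {
    local dir="$1" id f found missing=0
    shift
    for id in "$@"; do
        found=0
        for f in "$dir"/vzdump-qemu-"$id"-* "$dir"/vzdump-lxc-"$id"-*; do
            # Ignore the .log/.notes files written next to the archives.
            case "$f" in *.log|*.notes) continue ;; esac
            # An unmatched glob stays literal and fails the -s test.
            [ -s "$f" ] && found=1
        done
        if [ "$found" -eq 0 ]; then
            echo "$id"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, `missing_backups /mnt/truenas/proxmox/pve9-upgrade 109 110 112 115 210 225` would print the ID of any critical guest without an archive. This only checks that a file exists and is non-empty; it does not replace a test restore.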
#### 3. Run PVE 8-to-9 Checker
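```bash
# Update to the latest PVE 8.4 first so the checker itself is current
apt update && apt dist-upgrade -y
pve8to9 --full
```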
### Upgrade Execution (2-4 hours including downtime)
#### 1. Configure PVE 9 Repositories
```bash
# Backup PVE 8 config
cp /etc/apt/sources.list /etc/apt/sources.list.pve8-backup
cp -a /etc/apt/sources.list.d/ /etc/apt/sources.list.d.pve8-backup/
# Update repositories (Bookworm → Trixie)
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
echo "deb http://download.proxmox.com/debian/pve trixie pve-no-subscription" > /etc/apt/sources.list.d/pve-install-repo.list
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list 2>/dev/null || true
apt update
```
#### 2. Execute Distribution Upgrade
```bash
apt dist-upgrade
# Duration: 20-60 minutes
reboot
```
#### 3. Verify PVE 9 Installation
```bash
pveversion # Should show pve-manager/9.1.X
uname -r # Expect 6.17.X-X-pve on 9.1 (9.0 shipped 6.14)
# Verify cgroupv2 (PVE 9 requirement)
mount | grep cgroup2
# Verify services
systemctl status pve-cluster pvedaemon pveproxy pvestatd
pvesm status
```
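As an alternative to reading the `mount` output, the cgroup layout can be classified from the filesystem type mounted at `/sys/fs/cgroup`. A sketch, assuming GNU `stat`, which reports `cgroup2fs` on a pure cgroup v2 host and `tmpfs` when the legacy or hybrid v1 layout is in use; the function name `cgroup_layout` is hypothetical:

```bash
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a label.
# Usage: cgroup_layout "$(stat -f -c %T /sys/fs/cgroup)"
cgroup_layout() {
    case "$1" in
        cgroup2fs) echo "v2" ;;            # unified hierarchy
        tmpfs)     echo "v1-or-hybrid" ;;  # legacy or hybrid layout
        *)         echo "unknown" ;;
    esac
}
```

On the upgraded host, `cgroup_layout "$(stat -f -c %T /sys/fs/cgroup)"` would be expected to print `v2`.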
### Post-Upgrade Validation
**Start and validate services** using the same dependency order and checklist as in Phase 1.
**Additional PVE 9 Checks**:
- Web UI loads correctly after a hard refresh (Ctrl+Shift+R) to bypass cached assets
- Memory reporting (PVE 9 includes overhead in VM memory)
- Storage performance validation
---
## Rollback Procedures
### If PVE 8 Upgrade Fails
**During dist-upgrade**:
```bash
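apt --fix-broken install
dpkg --configure -a
# If unrecoverable, restore the PVE 7 repositories:
cp /etc/apt/sources.list.pve7-backup /etc/apt/sources.list
cp -a /etc/apt/sources.list.d.pve7-backup/* /etc/apt/sources.list.d/
apt update
# apt does not downgrade packages already upgraded to Bookworm; if the host is
# left half-upgraded, fall back to "Complete Reinstallation" and restore guests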
```
**After reboot to unstable system**:
- Boot to previous kernel via GRUB → Advanced options
- Rollback repositories as above
### If PVE 9 Upgrade Fails
```bash
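cp /etc/apt/sources.list.pve8-backup /etc/apt/sources.list
cp -a /etc/apt/sources.list.d.pve8-backup/* /etc/apt/sources.list.d/
apt update
# Restoring the sources does not downgrade packages already upgraded to Trixie;
# if the host is unusable, reinstall and restore guests from backup
reboot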
```
### If VM/LXC Won't Start
**Restore from backup**:
```bash
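# LXC (the wildcard must match exactly one archive; add --force to overwrite an existing CT)
pct restore <CTID> /mnt/truenas/proxmox/vzdump-lxc-<CTID>-*.tar.zst --storage local-lvm
# VM (add --force to overwrite an existing VM)
qmrestore /mnt/truenas/proxmox/vzdump-qemu-<VMID>-*.vma.zst <VMID>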
```
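Because several backup runs can leave more than one archive per guest, the wildcard in a restore command may match multiple files. A sketch of picking the newest archive, assuming vzdump's `YYYY_MM_DD-HH_MM_SS` timestamp in the file name so that sorted glob order is chronological; the function name `latest_backup` is illustrative:

```bash
# Hypothetical helper: print the newest vzdump archive for a guest.
# Usage: latest_backup <dir> <qemu|lxc> <id>
latest_backup() {
    local dir="$1" type="$2" id="$3" f newest=""
    for f in "$dir"/vzdump-"$type"-"$id"-*; do
        # Ignore the .log/.notes files written next to the archives.
        case "$f" in *.log|*.notes) continue ;; esac
        [ -f "$f" ] || continue   # an unmatched glob stays literal
        # Globs expand in sorted order, so the last match is the newest.
        newest="$f"
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example: `pct restore 225 "$(latest_backup /mnt/truenas/proxmox lxc 225)" --storage local-lvm`. The function returns non-zero and prints nothing when no archive is found, so the result should be checked before it is passed to a restore command.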
### Complete Reinstallation (Last Resort)
1. Reinstall Proxmox VE 9 from ISO
2. Restore configs from the `/mnt/truenas/proxmox/pve-config-*.tar.gz` (or `pve8-config-*.tar.gz`) archive
3. Restore VMs/LXCs from backups
4. Reconfigure networking if needed
---
## Risk Assessment
| Component | Risk | Impact | Mitigation |
|-----------|------|--------|-----------|
| Production Bots (115, 110) | HIGH | Service downtime | Backup instance ready, notify users |
| Databases (112) | HIGH | Data loss | Multiple backups, test restore |
| LXC systemd compatibility | MEDIUM | Container won't start | Pre-verify versions, upgrade OS if needed |
| Network config | MEDIUM | Connectivity loss | Document config, console access |
| n8n workflows (210) | MEDIUM | Automation failures | Export workflow configs |
**Low Risk**: Game servers, templates, unused services
---
## Post-Upgrade Tasks
### 1. Update Documentation
- Record upgrade completion in `/mnt/NV2/Development/claude-home/vm-management/`
- Update Proxmox version references
- Document issues encountered
### 2. Performance Validation
```bash
pvesh get /cluster/resources
```
### 3. Long-Term Monitoring
- Daily health checks
- Resource utilization trends
- Plan next upgrade (PVE 9.x updates)
---
## Timeline Summary
| Phase | Duration | Downtime | Activity |
|-------|----------|----------|----------|
| Pre-PVE8 Prep | 1-2 days | None | Backups, validation |
| PVE 7→8 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
| PVE 8 Stabilization | 1-2 weeks | None | Monitor, validate |
| Pre-PVE9 Prep | 1 day | None | LXC validation, backups |
| PVE 8→9 Upgrade | 2-4 hours | 1.5-2.5 hours | Repository update, upgrade |
| Post-Upgrade | 1-2 days | None | Documentation, optimization |
| **TOTAL** | **3-4 weeks** | **~4 hours** | Full upgrade with stabilization |
---
## Critical Files
- `/etc/pve/qemu-server/*.conf` - VM configurations (backup critical)
- `/etc/pve/lxc/*.conf` - LXC configurations (backup critical)
- `/etc/network/interfaces` - Network config (document before changes)
- `/etc/apt/sources.list` - Repository config (will be modified)
- `/etc/apt/sources.list.d/pve-*.list` - Proxmox repos (will be modified)
---
## Verification Checklist
After each upgrade phase:
- [ ] Proxmox version correct (`pveversion`)
- [ ] Kernel version updated (`uname -r`)
- [ ] All services running (`systemctl status pve-cluster pvedaemon pveproxy pvestatd`)
- [ ] Storage accessible (`pvesm status`)
- [ ] Network functional (`ip addr`, `ip route`)
- [ ] All VMs/LXCs visible in UI
- [ ] Critical VMs/LXCs started successfully
- [ ] Discord bots responding
- [ ] Databases accessible
- [ ] n8n workflows running
- [ ] Gitea accessible
- [ ] Home Assistant functional
- [ ] Media streaming working
- [ ] Web UI functional (clear cache first)
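The host-level items in this checklist can be run as a small script that prints one PASS or FAIL line per check. A sketch only; `run_check` and `CHECKS_FAILED` are hypothetical names, and the application-level items (bots, workflows, streaming) still need manual confirmation:

```bash
# Hypothetical helper: run a command, print PASS/FAIL with a label, and
# count failures. Usage: run_check "<label>" <command> [args...]
CHECKS_FAILED=0
run_check() {
    local label="$1"
    shift
    # Command output is discarded; only the exit status decides the result.
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        CHECKS_FAILED=$((CHECKS_FAILED + 1))
    fi
}
```

Example calls might be `run_check "storage accessible" pvesm status` and `run_check "pveproxy active" systemctl is-active --quiet pveproxy`, followed by a final test of `CHECKS_FAILED` to decide whether to continue.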
---
## Sources
- [Proxmox VE: Upgrade from 7 to 8](https://pve.proxmox.com/wiki/Upgrade_from_7_to_8)
- [Proxmox VE: Upgrade from 8 to 9](https://pve.proxmox.com/wiki/Upgrade_from_8_to_9)
- [Proxmox VE: Backup and Restore](https://pve.proxmox.com/wiki/Backup_and_Restore)