Right-size VM 115 (docker-sba): 16 → 4-8 vCPUs #18

Closed
opened 2026-04-03 01:07:56 +00:00 by cal · 1 comment
Owner

Context

Infrastructure audit (2026-04-02) found VM 115 (docker-sba) has 16 vCPUs allocated but actual load is 0.06/core — massively overprovisioned. This contributes to a 2.03:1 vCPU overcommit ratio on the Proxmox host.

Current State

  • VM: 115 (docker-sba) at 10.10.0.88
  • Allocated: 16 vCPUs (2 sockets × 8 cores), 8 GB RAM
  • Actual load: 0.06/core (5m avg)
  • Services: Paper Dynasty discord app, Paper Dynasty DB, SBA website, SBA Ghost blog

Tasks

  • Check Uptime Kuma response time trends for services on this VM over the past week — look for burst periods where 16 vCPUs would matter
  • Check if any batch jobs (builds, DB imports) run on a schedule that would spike CPU
  • Shut down VM (schedule a maintenance window — services will be briefly unavailable)
  • Reduce vCPUs from 16 to 8 (conservative first step per SRE review; can go to 4 later if 8 is underutilized)
  • Reduce sockets from 2 to 1 (avoid NUMA topology overhead on a small workload)
  • Start VM and verify all Docker containers come up: docker ps
  • Verify services are responsive (Paper Dynasty bot, SBA website, Ghost)
  • Run targeted audit to confirm: homelab-audit.sh --hosts vm-115 (once --hosts flag is implemented)

SRE Notes

  • vCPU reduction requires a VM shutdown (hot-unplug not supported on most guest kernels)
  • 4 vCPUs might be fine but 8 is a safer first step if there's any doubt about burst workloads
  • Dropping from 16 to 8 alone cuts overcommit ratio from 2.03:1 to ~1.78:1
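
The overcommit figures in the last note can be reproduced with a short calculation. This is only a sketch: the issue does not state the host's thread count or the total vCPU allocation, so `host_threads=32` and `total_vcpus=65` are assumptions back-derived from the quoted 2.03:1 and ~1.78:1 ratios, and `overcommit` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# ASSUMPTION: values inferred from the ratios quoted in the issue, not measured.
host_threads=32   # logical CPUs on the Proxmox host (assumed)
total_vcpus=65    # vCPUs allocated across all VMs before the change (assumed)

# overcommit VCPUS THREADS: print the vCPU:thread ratio to two decimals.
overcommit() {
  awk -v v="$1" -v t="$2" 'BEGIN { printf "%.2f", v / t }'
}

echo "current (VM 115 at 16 vCPUs): $(overcommit "$total_vcpus" "$host_threads"):1"
echo "after 16 -> 8:                $(overcommit "$((total_vcpus - 8))" "$host_threads"):1"
echo "after 16 -> 4:                $(overcommit "$((total_vcpus - 12))" "$host_threads"):1"
```

Under these assumed inputs, the later step down to 4 vCPUs would bring the ratio to roughly 1.66:1.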

Labels

infra-audit, proxmox

cal added the infra-audit, proxmox labels 2026-04-03 01:10:14 +00:00
Author
Owner

Operational Next Steps

Code changes are ready on branch enhancement/18-rightsize-vm115-vcpus (config update + --hosts flag for targeted audits). The following operational steps require a maintenance window:

Pre-change

ssh sba-bots "docker ps"   # confirm current container state
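
To make the later "all containers are running" check less dependent on memory, the pre-change state could be saved and compared after the restart. This is a minimal sketch, not part of the branch: `missing_containers` and the `/tmp/vm115-*.txt` paths are hypothetical names, and it assumes bash (for process substitution).

```bash
#!/usr/bin/env bash
# missing_containers BEFORE AFTER
# Print container names listed in BEFORE that are absent from AFTER.
missing_containers() {
  comm -23 <(sort "$1") <(sort "$2")
}

# Usage (hypothetical file paths):
#   Before the maintenance window:
#     ssh sba-bots "docker ps --format '{{.Names}}'" > /tmp/vm115-before.txt
#   After the VM is back up:
#     ssh sba-bots "docker ps --format '{{.Names}}'" > /tmp/vm115-after.txt
#     missing_containers /tmp/vm115-before.txt /tmp/vm115-after.txt
```

Empty output means every container that was running before the change is running again.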

Apply change

ssh proxmox "qm snapshot 115 pre-rightsize --description 'Before vCPU reduction 16→8'"
ssh proxmox "qm shutdown 115"       # clean ACPI shutdown (guest agent enabled)
ssh proxmox "qm set 115 --sockets 1"  # 2 sockets × 8 cores → 1 socket × 8 cores = 8 vCPUs
ssh proxmox "qm start 115"

Post-change verification

ssh sba-bots "docker ps"   # all 5 containers should be running
# Verify services respond:
#   - Paper Dynasty bot (Discord)
#   - SBA website (port 803)
#   - Ghost blog (port 2368)

# Targeted audit (once --hosts flag is merged):
homelab-audit.sh --hosts vm-115:10.10.0.88
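
Until the targeted audit is available, the new CPU topology could also be confirmed directly. The sketch below assumes the `key: value` line format printed by `qm config` and treats a missing `sockets` or `cores` line as 1; `vcpus_from_config` is a hypothetical helper, not part of `homelab-audit.sh`.

```bash
#!/usr/bin/env bash
# vcpus_from_config: read `qm config` output on stdin and print sockets * cores.
# A missing sockets or cores line is treated as 1 (assumed default).
vcpus_from_config() {
  awk -F': ' '
    $1 == "sockets" { s = $2 }
    $1 == "cores"   { c = $2 }
    END { print (s ? s : 1) * (c ? c : 1) }
  '
}

# Usage (hostnames as used in this comment):
#   ssh proxmox "qm config 115" | vcpus_from_config   # host-side view, should print 8
#   ssh sba-bots "nproc"                              # guest-side view, should also print 8
```

Checking both sides catches the case where the config was changed but the guest has not yet been restarted with it.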

Rollback (if needed)

ssh proxmox "qm shutdown 115"
ssh proxmox "qm rollback 115 pre-rightsize"
ssh proxmox "qm start 115"
cal closed this issue 2026-04-04 00:35:34 +00:00
Reference: cal/claude-home#18