VM 116: Resolve watchstate duplicate and clean up remaining containers #31

Closed
opened 2026-04-03 01:13:50 +00:00 by cal · 1 comment
Owner

Context

During the infra audit immediate fixes, we discovered that VM 116 (docker-home-servers) had a watchstate container in a restart loop: Caddy (embedded in watchstate) was crashing with the error "decoding intermediate certificate PEM: no PEM block found". This was the primary driver of VM 116's elevated load after avahi was masked.

Watchstate was stopped (not removed) and 3 dead containers were removed. VM 116 now only runs Jellyfin.
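
The "no PEM block found" error suggests that the certificate file Caddy tried to load contained no PEM header at all. A minimal sketch of how that could be checked from the host is shown here; the function names are hypothetical and the certificate path is a placeholder, since the issue does not name the exact file:

```shell
#!/bin/sh
# has_pem_block FILE: succeed if FILE contains at least one PEM certificate header.
has_pem_block() {
    grep -q -- '-----BEGIN CERTIFICATE-----' "$1" 2>/dev/null
}

# check_cert FILE: print a one-line verdict for FILE.
check_cert() {
    if has_pem_block "$1"; then
        echo "ok: $1"
    else
        echo "no PEM block: $1"
    fi
}

# Hypothetical usage; CERT_PATH is a placeholder, not a path taken from the issue:
# check_cert "$CERT_PATH"
```

An empty or truncated file would be reported as "no PEM block", which would be consistent with the crash described above. The check only looks for the header line and does not validate the certificate itself.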

Current State (post-cleanup)

  • jellyfin — running, healthy
  • watchstate — stopped (restart loop due to corrupt Caddy PKI cert)
  • 3 dead containers removed: freetube (19 months), pihole (never started), xenodochial_agnesi (4 years)

Key Question: Is this watchstate instance needed?

Manticore also runs a watchstate container that is healthy and active. This VM 116 instance may be a stale duplicate from before services were migrated to manticore.

Tasks

  • Confirm manticore's watchstate is the canonical instance: ssh manticore "docker logs --tail 20 watchstate" — verify it's syncing Jellyfin state
  • If manticore's instance is authoritative, remove the VM 116 watchstate container and its volumes: docker rm watchstate && docker volume prune
  • If VM 116's instance was needed, fix the Caddy cert: likely delete /data/caddy/pki/ inside the container volume and restart
  • Clean up unused Docker images on VM 116: docker image prune -a -f
  • Reassess VM 116's purpose — if only Jellyfin remains, consider whether Jellyfin should move to manticore (which already runs it) and VM 116 could be decommissioned entirely

Related

  • VM 110 (discord-bots) is also now empty after container cleanup, making it another decommission candidate
  • Both VMs together represent 8 vCPUs + 16 GB RAM that could be reclaimed
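
The watchstate removal and image cleanup tasks listed above can be sketched as a dry-run script. This is an illustration under assumptions, not the procedure that was actually run: the helper names (run, remove_container, prune_images) and any volume name passed in are hypothetical. It removes volumes by name instead of calling docker volume prune, because prune acts on every unused volume on the host and not only on watchstate's.

```shell
#!/bin/sh
# Dry-run sketch: commands are printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# remove_container NAME [VOLUME...]: remove a stopped container, then only
# the volumes named explicitly (volume names here are hypothetical).
remove_container() {
    name="$1"; shift
    run docker rm "$name"
    for vol in "$@"; do
        run docker volume rm "$vol"
    done
}

# prune_images: remove all images not used by any container.
prune_images() {
    run docker image prune -a -f
}
```

For example, calling remove_container watchstate watchstate_data followed by prune_images prints the three docker commands for review; setting DRY_RUN=0 would execute them instead. The volume name watchstate_data is a placeholder and would need to be replaced with whatever docker volume ls reports on VM 116.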

Labels

infra-audit, proxmox

cal added the infra-audit, proxmox labels 2026-04-03 01:14:06 +00:00
Claude added the ai-working label 2026-04-03 16:00:54 +00:00
Claude removed the ai-working label 2026-04-03 16:04:56 +00:00
Collaborator

Completed infrastructure cleanup and opened PR #41.

What was done:

  • Confirmed manticore's watchstate (v1.0.2) is the canonical instance — actively syncing Jellyfin state
  • Removed the stopped/broken watchstate container from VM 116 (stale duplicate with corrupt Caddy PKI cert)
  • Pruned 5 orphan images (watchstate, freetube, pihole, hello-world) → 3.36 GB reclaimed
  • VM 116 now runs only Jellyfin (container up 8 days, healthy)
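
The end state described above (only Jellyfin left on VM 116) could be re-checked with a small helper like the following. This is a hedged sketch: unexpected_containers is a hypothetical name, and the commented docker ps line assumes it is run on the VM itself.

```shell
#!/bin/sh
# unexpected_containers EXPECTED NAMES...: print every name that is not EXPECTED.
unexpected_containers() {
    expected="$1"; shift
    for name in "$@"; do
        [ "$name" = "$expected" ] || echo "$name"
    done
}

# On the VM itself (requires Docker), the names could come from docker ps:
# unexpected_containers jellyfin $(docker ps -a --format '{{.Names}}')
```

Empty output would mean that no container other than jellyfin exists, including stopped ones, since docker ps -a lists those as well.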

Docs updated: server-configs/hosts.yml (added VM 116 as decommission candidate) and vm-management/proxmox-upgrades/proxmox-7-to-9-upgrade-plan.md (status updated from Stopped/Investigate → Decommission Candidate).

Next: VM 116 full decommission (and VM 110) can be separate issues following the decommission runbook.

Claude added the ai-pr-opened label 2026-04-03 16:05:04 +00:00
cal closed this issue 2026-04-03 20:01:28 +00:00
Reference: cal/claude-home#31