claude-home/vm-management/ansible-controller-setup.md
Cal Corum 93d6093d45
docs: add Ansible controller LXC setup guide and update VM context
New KB doc covering LXC 304 (ansible-controller) at 10.10.0.232 with
full inventory, update playbooks, snapshot rollback, and systemd timer.
Updated CONTEXT.md to reference the new controller.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:26:55 -05:00


| title | description | type | domain | tags |
|---|---|---|---|---|
| Ansible Controller LXC Setup | Complete setup guide for LXC 304 (ansible-controller) at 10.10.0.232: automated OS/Docker updates with Proxmox snapshot rollback across all VMs, LXCs, and physical servers. | guide | vm-management | ansible, proxmox, lxc, automation, updates, snapshots, rollback, systemd |

Ansible Controller LXC Setup

Centralized Ansible controller for automated infrastructure updates with Proxmox snapshot-based rollback.

LXC Details

  • VMID: 304
  • Hostname: ansible-controller
  • IP: 10.10.0.232
  • SSH alias: ansible-controller or ansible
  • OS: Ubuntu 24.04
  • Resources: 2 cores, 2GB RAM, 16GB disk
  • Ansible version: 2.20.4 (from PPA)
  • Collections: community.general, community.docker (bundled)
  • User: cal (runs the playbooks); SSH key at /home/cal/.ssh/homelab_rsa
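
The two SSH aliases probably come from an entry in the client's SSH config. Below is a minimal sketch. It assumes the workstation logs in as cal with a key also named homelab_rsa. The ssh_config_stanza helper and both of those values are assumptions, not copied from the real config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an ~/.ssh/config stanza that maps both aliases
# to the controller. User and IdentityFile are assumed values.
ssh_config_stanza() {
  cat <<'EOF'
Host ansible-controller ansible
    HostName 10.10.0.232
    User cal
    IdentityFile ~/.ssh/homelab_rsa
EOF
}

# Usage (once, on the workstation):
#   ssh_config_stanza >> ~/.ssh/config
```

A single Host line with two patterns keeps both aliases pointing at one definition, so the IP only has to change in one place.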

Directory Layout

```text
/opt/ansible/
├── ansible.cfg              # Main config (pipelining, forks=5)
├── inventory/
│   └── hosts.yml            # Full infrastructure inventory
├── playbooks/
│   ├── update-all.yml       # Full cycle: snapshot → OS → Docker → health → cleanup
│   ├── os-update-only.yml   # OS packages only (lighter)
│   ├── rollback.yml         # Roll back any host to a snapshot
│   └── check-status.yml     # Read-only health/status check
├── run-update.sh            # Runner script with logging
├── roles/                   # (empty, for future use)
└── logs/                    # Update run logs (12-week retention)
```
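
The layout lists a runner script with logging and a 12-week log retention period. Below is a minimal sketch of such a runner, assuming one timestamped log file per run. The log naming, the ANSIBLE_DIR/LOG_DIR overrides, and the function names are hypothetical. This is not the contents of the real run-update.sh.

```shell
#!/usr/bin/env bash
# Hypothetical runner sketch (not the deployed run-update.sh): one timestamped
# log per run, with logs older than 12 weeks pruned afterwards.
set -o pipefail

ANSIBLE_DIR="${ANSIBLE_DIR:-/opt/ansible}"
LOG_DIR="${LOG_DIR:-$ANSIBLE_DIR/logs}"
RETENTION_DAYS=84   # 12 weeks

log_file_for() {
  # $1: timestamp string, e.g. 2026-03-29_0300 (naming scheme is assumed)
  printf '%s/update-%s.log\n' "$LOG_DIR" "$1"
}

prune_old_logs() {
  # Remove run logs last modified more than RETENTION_DAYS days ago.
  find "$LOG_DIR" -maxdepth 1 -type f -name '*.log' -mtime +"$RETENTION_DAYS" -delete
}

run_update() {
  # Extra arguments pass through to ansible-playbook (e.g. --limit lxcs).
  export ANSIBLE_CONFIG="$ANSIBLE_DIR/ansible.cfg"
  mkdir -p "$LOG_DIR"
  local log rc=0
  log="$(log_file_for "$(date +%Y-%m-%d_%H%M)")"
  # tee copies output to the journal and the log file; pipefail keeps the
  # playbook's exit status instead of tee's.
  ansible-playbook "$ANSIBLE_DIR/playbooks/update-all.yml" "$@" 2>&1 | tee "$log" || rc=$?
  prune_old_logs
  return "$rc"
}

# A real runner would end with: run_update "$@"
```

The exit status is returned after pruning, so a failed playbook still marks the systemd service run as failed.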

Managed Hosts (15 total)

Proxmox Host

| Host | IP | User |
|---|---|---|
| pve-node | 10.10.0.11 | root |

VMs

| Host | IP | VMID | User | Python |
|---|---|---|---|---|
| docker-home | 10.10.0.16 | 106 | cal | 3.9 |
| discord-bots | 10.10.0.33 | 110 | cal | 3.9 |
| databases-bots | 10.10.0.42 | 112 | cal | 3.9 |
| docker-sba | 10.10.0.88 | 115 | cal | 3.9 |
| docker-home-servers | 10.10.0.124 | 116 | cal | 3.9 |

LXCs

| Host | IP | VMID | User | Python |
|---|---|---|---|---|
| docker-n8n-lxc | 10.10.0.210 | 210 | root | 3.9 |
| arr-stack | 10.10.0.221 | 221 | root | 3.9 |
| memos | 10.10.0.222 | 222 | root | 3.9 |
| foundry-lxc | 10.10.0.223 | 223 | root | 3.9 |
| gitea | 10.10.0.225 | 225 | root | 3.9 |
| uptime-kuma | 10.10.0.227 | 227 | root | 3.10 |
| claude-discord-coordinator | 10.10.0.230 | 301 | root | 3.12 |
| claude-runner | 10.10.0.148 | 302 | root | 3.12 |

Physical

| Host | IP | User | Python |
|---|---|---|---|
| ubuntu-manticore | 10.10.0.226 | cal | 3.12 |

Excluded

  • Home Assistant (VM 109): self-managed via HA Supervisor
  • Palworld (LXC 230): deleted 2026-03-25 to resolve its IP collision with LXC 301

Usage

SSH to the controller and run as cal:

```shell
ssh ansible
export ANSIBLE_CONFIG=/opt/ansible/ansible.cfg

# Check status of everything
ansible-playbook /opt/ansible/playbooks/check-status.yml

# Full update cycle (snapshot → update → health check → cleanup)
ansible-playbook /opt/ansible/playbooks/update-all.yml

# Update specific group
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit lxcs
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit docker-home

# Dry run
ansible-playbook /opt/ansible/playbooks/update-all.yml --check

# OS updates only (no Docker)
ansible-playbook /opt/ansible/playbooks/os-update-only.yml

# Skip snapshots
ansible-playbook /opt/ansible/playbooks/update-all.yml -e skip_snapshot=true

# Roll back a host to latest snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea

# Roll back to specific snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea -e snapshot=pre-update-2026-03-25
```
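
Rolling back to the "latest" snapshot means picking the newest pre-update-* name. Below is a sketch of that selection plus the Proxmox API call it maps to. It assumes the guest's node name and type (qemu or lxc) are known. The function names are hypothetical, and rollback.yml may resolve snapshots differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: choose the newest pre-update snapshot and build
# (without running) the Proxmox API rollback call for it.
latest_snapshot() {
  # Read snapshot names on stdin and print the newest pre-update-* one.
  # pre-update-YYYY-MM-DD names sort chronologically as plain text.
  grep '^pre-update-' | sort | tail -n 1
}

rollback_cmd() {
  # $1: node name, $2: qemu|lxc, $3: VMID, $4: snapshot name
  printf 'pvesh create /nodes/%s/%s/%s/snapshot/%s/rollback\n' "$1" "$2" "$3" "$4"
}

# Usage: rollback_cmd <node> lxc 225 pre-update-2026-03-25
# prints: pvesh create /nodes/<node>/lxc/225/snapshot/pre-update-2026-03-25/rollback
```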

Update Pipeline (update-all.yml)

  1. Snapshot: creates a pre-update-YYYY-MM-DD snapshot of each Proxmox guest via pvesh
  2. OS update: apt update, safe upgrade, then autoremove, 3 hosts at a time (serial: 3)
  3. Docker update: finds compose files, pulls images, and restarts stacks whose images changed, one host at a time (serial: 1)
  4. Health check: SSH ping, a warning when disk usage exceeds 89%, and a report of exited containers
  5. Snapshot cleanup: keeps the 3 most recent pre-update-* snapshots per guest
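
The snapshot and cleanup steps can be sketched in shell. The sketch assumes the pre-update-YYYY-MM-DD naming and a known Proxmox node name. The helpers are hypothetical; they only build commands or select names and do not call the API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the snapshot (step 1) and cleanup (step 5) logic.

snapshot_name() {
  # $1 (optional): date as YYYY-MM-DD; defaults to today.
  printf 'pre-update-%s\n' "${1:-$(date +%F)}"
}

snapshot_cmd() {
  # Build (but do not run) the pvesh call that snapshots one guest.
  # $1: node name, $2: qemu|lxc, $3: VMID, $4: snapshot name
  printf 'pvesh create /nodes/%s/%s/%s/snapshot --snapname %s\n' "$1" "$2" "$3" "$4"
}

snapshots_to_prune() {
  # Read snapshot names on stdin and print the pre-update-* ones outside the
  # newest $1 (default 3). Non-matching snapshots are never selected.
  local keep="${1:-3}"
  grep '^pre-update-' | sort -r | tail -n +"$((keep + 1))"
}
```

Each name printed by snapshots_to_prune would then be removed with pvesh delete /nodes/<node>/<type>/<vmid>/snapshot/<name>. Because the names embed ISO dates, plain text sorting is also chronological, so no date parsing is needed. Manually created snapshots are left alone.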

Scheduled Runs

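A systemd timer (ansible-update.timer) starts ansible-update.service every Sunday at 3:00 AM UTC, with a random delay of up to 10 minutes. Persistent=true means a run missed while the controller was down executes at the next boot.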

```shell
# Check timer status
ssh ansible "systemctl status ansible-update.timer"

# View last run
ssh ansible "systemctl status ansible-update.service"

# View logs
ssh ansible "ls -lt /opt/ansible/logs/ | head -5"
ssh ansible "journalctl -u ansible-update.service --no-pager -n 50"
```
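
For reference, unit files matching that schedule could look like the following. This is a reconstruction under stated assumptions: the service runs run-update.sh as cal and waits for the network. It is not a copy of the deployed units.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the timer/service pair (assumed contents).
timer_unit() {
  cat <<'EOF'
[Unit]
Description=Weekly Ansible update run

[Timer]
OnCalendar=Sun *-*-* 03:00:00 UTC
RandomizedDelaySec=10min
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

service_unit() {
  cat <<'EOF'
[Unit]
Description=Ansible update run (snapshot, update, health check, cleanup)
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=cal
ExecStart=/opt/ansible/run-update.sh
EOF
}

# Usage (as root on the controller):
#   timer_unit   > /etc/systemd/system/ansible-update.timer
#   service_unit > /etc/systemd/system/ansible-update.service
#   systemctl daemon-reload && systemctl enable --now ansible-update.timer
#   systemd-analyze calendar 'Sun *-*-* 03:00:00 UTC'   # preview the next trigger
```

Type=oneshot fits a run-to-completion job: the unit is active only while the playbook runs, and its exit status appears in systemctl status.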

Inventory Groups

  • proxmox_host — just pve-node
  • vms — all QEMU VMs
  • lxcs — all LXC containers
  • physical — bare-metal servers (manticore)
  • docker_hosts — any host running Docker compose stacks
  • proxmox_guests — union of vms + lxcs (snapshotable)
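
Below is a skeleton of inventory/hosts.yml consistent with these groups, with one example host per group taken from the tables above. The proxmox_type values (qemu, lxc) and the choice of docker-home as the docker_hosts example are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton of inventory/hosts.yml (one host per group).
inventory_skeleton() {
  cat <<'EOF'
all:
  children:
    proxmox_host:
      hosts:
        pve-node:
          ansible_host: 10.10.0.11
          ansible_user: root
    vms:
      hosts:
        docker-home:
          ansible_host: 10.10.0.16
          ansible_user: cal
          proxmox_vmid: 106
          proxmox_type: qemu
    lxcs:
      hosts:
        gitea:
          ansible_host: 10.10.0.225
          ansible_user: root
          proxmox_vmid: 225
          proxmox_type: lxc
    physical:
      hosts:
        ubuntu-manticore:
          ansible_host: 10.10.0.226
          ansible_user: cal
    docker_hosts:
      hosts:
        docker-home:
    proxmox_guests:
      children:
        vms:
        lxcs:
EOF
}
```

Defining proxmox_guests through children rather than listing hosts again keeps it in sync automatically. Running ansible-inventory -i /opt/ansible/inventory/hosts.yml --graph shows how the real file resolves into these groups.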

Adding a New Host

  1. Add entry to /opt/ansible/inventory/hosts.yml under the appropriate group
  2. Include ansible_host and ansible_user; for Proxmox guests, also proxmox_vmid and proxmox_type
  3. Set ansible_python_interpreter if the host's default python3 is older than 3.9 (deadsnakes installs /usr/bin/python3.9)
  4. Ensure the SSH key (/home/cal/.ssh/homelab_rsa) is authorized on the target
  5. For hosts managed as cal (VMs and manticore): ensure NOPASSWD sudo for cal
  6. Test: ansible <hostname> -m ping
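
Steps 4 to 6 for a host managed as cal can be scripted from the controller. The sketch below assumes the standard cal ALL=(ALL) NOPASSWD:ALL sudoers line. The onboard_host function and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical onboarding helper for steps 4-6 (hosts managed as cal).
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "+ $*"; else "$@"; fi
}

onboard_host() {
  # $1: inventory hostname (already added to hosts.yml), $2: IP address
  local name="$1" ip="$2"
  local key=/home/cal/.ssh/homelab_rsa
  # Assumed sudoers line. It is written to a dotted temp name (sudo skips
  # sudoers.d files containing '.'), checked with visudo, then moved into place.
  local sudoers="echo 'cal ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/cal.tmp >/dev/null && sudo chmod 0440 /etc/sudoers.d/cal.tmp && sudo visudo -cf /etc/sudoers.d/cal.tmp && sudo mv /etc/sudoers.d/cal.tmp /etc/sudoers.d/cal"

  # Step 4: authorize the controller's key (asks for the password once).
  run ssh-copy-id -i "$key.pub" "cal@$ip"
  # Step 5: passwordless sudo; -t gives sudo a terminal for its prompt.
  run ssh -t -i "$key" "cal@$ip" "$sudoers"
  # Step 6: confirm Ansible reaches the host (ANSIBLE_CONFIG must be exported).
  run ansible "$name" -m ping
}
```

Running DRY_RUN=1 onboard_host <name> <ip> first prints the three commands for review. The visudo check ensures a typo cannot lock cal out of sudo.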

Setup Prerequisites Fixed During Initial Deployment

  • Python 3.9 installed via deadsnakes PPA on all Ubuntu 20.04 hosts (Ansible 2.20 requires ≥3.9)
  • NOPASSWD sudo set via /etc/sudoers.d/cal on all VMs and manticore
  • qemu-guest-agent enabled on VM 112 (databases-bots)
  • VM 116 disk expanded from 31GB→315GB (was 100% full), DNS fixed (missing resolv.conf)
  • IP collision between LXC 230 (palworld) and LXC 301 (claude-discord-coordinator) resolved by deleting palworld
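
Below is a sketch of the Python check behind the first item. It assumes the version string comes from the target's python3. The python_ok and install_py39_cmds helpers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check the 3.9 floor and list the deadsnakes install.
python_ok() {
  # $1: version string such as "3.8.10"; succeeds if major.minor >= 3.9.
  local major minor rest
  IFS=. read -r major minor rest <<<"$1"
  (( major > 3 || (major == 3 && minor >= 9) ))
}

install_py39_cmds() {
  # Commands for an Ubuntu 20.04 target (printed, not executed).
  cat <<'EOF'
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install -y python3.9
EOF
}

# Usage, from the controller:
#   v=$(ssh cal@<host-ip> 'python3 -c "import platform; print(platform.python_version())"')
#   python_ok "$v" || install_py39_cmds
# then set ansible_python_interpreter: /usr/bin/python3.9 for that host.
```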