New KB doc covering LXC 304 (ansible-controller) at 10.10.0.232 with full inventory, update playbooks, snapshot rollback, and systemd timer. Updated CONTEXT.md to reference the new controller. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| title | description | type | domain | tags |
|---|---|---|---|---|
| Ansible Controller LXC Setup | Complete setup guide for LXC 304 (ansible-controller) at 10.10.0.232: automated OS/Docker updates with Proxmox snapshot rollback across all VMs, LXCs, and physical servers. | guide | vm-management | |
Ansible Controller LXC Setup
Centralized Ansible controller for automated infrastructure updates with Proxmox snapshot-based rollback.
LXC Details
- VMID: 304
- Hostname: ansible-controller
- IP: 10.10.0.232
- SSH alias: `ansible-controller` or `ansible`
- OS: Ubuntu 24.04
- Resources: 2 cores, 2GB RAM, 16GB disk
- Ansible version: 2.20.4 (from PPA)
- Collections: community.general, community.docker (bundled)
- User: `cal` runs playbooks, SSH key at `/home/cal/.ssh/homelab_rsa`
Directory Layout
/opt/ansible/
├── ansible.cfg # Main config (pipelining, forks=5)
├── inventory/
│ └── hosts.yml # Full infrastructure inventory
├── playbooks/
│ ├── update-all.yml # Full cycle: snapshot → OS → Docker → health → cleanup
│ ├── os-update-only.yml # OS packages only (lighter)
│ ├── rollback.yml # Roll back any host to a snapshot
│ └── check-status.yml # Read-only health/status check
├── run-update.sh # Runner script with logging
├── roles/ # (empty, for future use)
└── logs/ # Update run logs (12-week retention)
Managed Hosts (15 total)
Proxmox Host
| Host | IP | User |
|---|---|---|
| pve-node | 10.10.0.11 | root |
VMs
| Host | IP | VMID | User | Python |
|---|---|---|---|---|
| docker-home | 10.10.0.16 | 106 | cal | 3.9 |
| discord-bots | 10.10.0.33 | 110 | cal | 3.9 |
| databases-bots | 10.10.0.42 | 112 | cal | 3.9 |
| docker-sba | 10.10.0.88 | 115 | cal | 3.9 |
| docker-home-servers | 10.10.0.124 | 116 | cal | 3.9 |
LXCs
| Host | IP | VMID | User | Python |
|---|---|---|---|---|
| docker-n8n-lxc | 10.10.0.210 | 210 | root | 3.9 |
| arr-stack | 10.10.0.221 | 221 | root | 3.9 |
| memos | 10.10.0.222 | 222 | root | 3.9 |
| foundry-lxc | 10.10.0.223 | 223 | root | 3.9 |
| gitea | 10.10.0.225 | 225 | root | 3.9 |
| uptime-kuma | 10.10.0.227 | 227 | root | 3.10 |
| claude-discord-coordinator | 10.10.0.230 | 301 | root | 3.12 |
| claude-runner | 10.10.0.148 | 302 | root | 3.12 |
Physical
| Host | IP | User | Python |
|---|---|---|---|
| ubuntu-manticore | 10.10.0.226 | cal | 3.12 |
Excluded
- Home Assistant (VM 109): self-managed via HA Supervisor
- Palworld (LXC 230): deleted 2026-03-25 to resolve its IP collision with LXC 301
Usage
SSH to the controller and run as cal:
ssh ansible
export ANSIBLE_CONFIG=/opt/ansible/ansible.cfg
# Check status of everything
ansible-playbook /opt/ansible/playbooks/check-status.yml
# Full update cycle (snapshot → update → health check → cleanup)
ansible-playbook /opt/ansible/playbooks/update-all.yml
# Update a specific group or host
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit lxcs
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit docker-home
# Dry run
ansible-playbook /opt/ansible/playbooks/update-all.yml --check
# OS updates only (no Docker)
ansible-playbook /opt/ansible/playbooks/os-update-only.yml
# Skip snapshots
ansible-playbook /opt/ansible/playbooks/update-all.yml -e skip_snapshot=true
# Roll back a host to latest snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea
# Roll back to specific snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea -e snapshot=pre-update-2026-03-25
Update Pipeline (update-all.yml)
- Snapshot: Creates `pre-update-YYYY-MM-DD` snapshot on each Proxmox guest via `pvesh`
- OS Update: `apt update && apt upgrade safe && autoremove` (serial: 3)
- Docker Update: Finds compose files, pulls images, restarts changed stacks (serial: 1)
- Health Check: SSH ping, disk space warning (>89%), exited container report
- Snapshot Cleanup: Keeps last 3 `pre-update-*` snapshots per host
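The snapshot naming and the "keep last 3" retention rule can be sketched in plain shell. This is a minimal illustration, not the playbook's actual tasks: the helper names `snapshot_name` and `snapshots_to_prune` are hypothetical, and the sketch only decides which names would be pruned without calling Proxmox.

```shell
# Hypothetical helpers illustrating the naming and retention rules.
# They do not talk to Proxmox; they only work on snapshot names.

# Build today's snapshot name in the pre-update-YYYY-MM-DD format (UTC).
snapshot_name() {
    date -u +pre-update-%Y-%m-%d
}

# Read snapshot names on stdin (one per line) and print the pre-update-*
# snapshots that fall outside the newest N (first argument, default 3).
# ISO dates sort lexicographically, so a reverse sort puts the newest first.
snapshots_to_prune() {
    keep="${1:-3}"
    grep '^pre-update-' | sort -r | tail -n +"$((keep + 1))"
}
```

Snapshots that do not match the `pre-update-` prefix (for example, manual ones) are ignored by the filter, so they would never be selected for deletion in this sketch.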
Scheduled Runs
Systemd timer runs every Sunday at 3:00 AM UTC with up to 10 min jitter.
`Persistent=true` ensures that a missed run executes on the next boot.
# Check timer status
ssh ansible "systemctl status ansible-update.timer"
# View last run
ssh ansible "systemctl status ansible-update.service"
# View logs
ssh ansible "ls -lt /opt/ansible/logs/ | head -5"
ssh ansible "journalctl -u ansible-update.service --no-pager -n 50"
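For reference, a timer with the schedule described in this section might look like the unit written by the helper below. This is a sketch under assumptions: the unit contents are reconstructed from the description (Sunday 03:00 UTC, up to 10 minutes of jitter, `Persistent=true`), the `Description` text is invented, and the helper name `write_timer_unit` is hypothetical. The deployed unit may differ.

```shell
# Write a sketch of ansible-update.timer into the directory given as $1.
# The directives are standard systemd.timer options; the values mirror
# the schedule described in this section, not the deployed file.
write_timer_unit() {
    dir="${1:?target directory required}"
    cat > "$dir/ansible-update.timer" <<'EOF'
[Unit]
Description=Weekly Ansible infrastructure update (assumed description)

[Timer]
# Every Sunday at 03:00 UTC
OnCalendar=Sun *-*-* 03:00:00 UTC
# Spread the start time over up to 10 minutes
RandomizedDelaySec=10min
# Run at next boot if the scheduled time was missed
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

Running `systemd-analyze calendar "Sun *-*-* 03:00:00 UTC"` on the controller is a quick way to confirm how systemd interprets the expression and when it would next elapse.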
Inventory Groups
- `proxmox_host`: just pve-node
- `vms`: all QEMU VMs
- `lxcs`: all LXC containers
- `physical`: bare-metal servers (manticore)
- `docker_hosts`: any host running Docker compose stacks
- `proxmox_guests`: union of vms + lxcs (snapshotable)
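Group membership can be checked from the controller with `ansible-inventory --graph`. The wrapper below is a small sketch: `show_group` and the `ANSIBLE_INVENTORY_CMD` override are hypothetical names introduced here, and the inventory path is the one from the directory layout.

```shell
# Inventory path from the directory layout; adjust if yours differs.
INVENTORY="${INVENTORY:-/opt/ansible/inventory/hosts.yml}"
# Hypothetical override hook so the command can be swapped out for testing.
ANSIBLE_INVENTORY_CMD="${ANSIBLE_INVENTORY_CMD:-ansible-inventory}"

# Print the hosts and child groups under one inventory group as a tree.
show_group() {
    "$ANSIBLE_INVENTORY_CMD" -i "$INVENTORY" --graph "$1"
}
```

For example, `show_group proxmox_guests` should list the members of `vms` and `lxcs`, assuming the inventory defines `proxmox_guests` with those two groups as children.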
Adding a New Host
- Add entry to `/opt/ansible/inventory/hosts.yml` under the appropriate group
- Include: `ansible_host`, `ansible_user`, `proxmox_vmid`, `proxmox_type` (for guests)
- Set `ansible_python_interpreter` if the host's default Python is older than 3.9
- Ensure SSH key (`/home/cal/.ssh/homelab_rsa`) is authorized on the target
- For VMs: ensure NOPASSWD sudo for `cal` user
- Test: `ansible <hostname> -m ping`
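As an illustration of the inventory step, the helper below prints a host entry with the four variables listed above. It is a sketch only: `inventory_entry` is a hypothetical helper, the example host name, IP, and VMID are made up, the `lxc` value for `proxmox_type` is an assumption, and the indentation needed in the real `hosts.yml` depends on how its groups are nested.

```shell
# Print a hosts.yml entry for a Proxmox guest.
# Usage: inventory_entry NAME IP USER VMID TYPE
inventory_entry() {
    printf '%s:\n' "$1"
    printf '  ansible_host: %s\n' "$2"
    printf '  ansible_user: %s\n' "$3"
    printf '  proxmox_vmid: %s\n' "$4"
    printf '  proxmox_type: %s\n' "$5"
}

# Hypothetical example values, not a real host in this inventory.
inventory_entry example-lxc 10.10.0.250 root 250 lxc
```

The printed block can be pasted under the matching group in `hosts.yml` and then verified with the ping test from the last step.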
Setup Prerequisites Fixed During Initial Deployment
- Python 3.9 installed via deadsnakes PPA on all Ubuntu 20.04 hosts (Ansible 2.20 requires ≥3.9)
- NOPASSWD sudo set via `/etc/sudoers.d/cal` on all VMs and manticore
- qemu-guest-agent enabled on VM 112 (databases-bots)
- VM 116 disk expanded from 31GB to 315GB (was 100% full), DNS fixed (missing resolv.conf)
- IP collision between LXC 230 (palworld) and LXC 301 (claude-discord-coordinator) resolved by deleting palworld