---
title: "Ansible Controller LXC Setup"
description: "Complete setup guide for LXC 304 (ansible-controller) at 10.10.0.232: automated OS/Docker updates across all VMs, LXCs, and physical servers, with Proxmox snapshot rollback for guests."
type: guide
domain: vm-management
tags: [ansible, proxmox, lxc, automation, updates, snapshots, rollback, systemd]
---

# Ansible Controller LXC Setup

Centralized Ansible controller for automated infrastructure updates with Proxmox snapshot-based rollback.

## LXC Details

- **VMID**: 304
- **Hostname**: ansible-controller
- **IP**: 10.10.0.232
- **SSH alias**: `ansible-controller` or `ansible`
- **OS**: Ubuntu 24.04
- **Resources**: 2 cores, 2GB RAM, 16GB disk
- **Ansible version**: 2.20.4 (from PPA)
- **Collections**: community.general, community.docker (bundled)
- **User**: `cal` runs playbooks, SSH key at `/home/cal/.ssh/homelab_rsa`

## Directory Layout

```
/opt/ansible/
├── ansible.cfg               # Main config (pipelining, forks=5)
├── inventory/
│   └── hosts.yml             # Full infrastructure inventory
├── playbooks/
│   ├── update-all.yml        # Full cycle: snapshot → OS → Docker → health → cleanup
│   ├── os-update-only.yml    # OS packages only (lighter)
│   ├── rollback.yml          # Roll back any host to a snapshot
│   └── check-status.yml      # Read-only health/status check
├── run-update.sh             # Runner script with logging
├── roles/                    # (empty, for future use)
└── logs/                     # Update run logs (12-week retention)
```
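
The contents of `run-update.sh` are not reproduced on this page. As an illustration only, a runner with per-run logs and 12-week (84-day) retention could look like the bash sketch below. The function name `run_logged`, the `update-<timestamp>.log` naming scheme, the `LOG_DIR` override, and pruning by file modification time are all assumptions, not details of the real script.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a logging runner; NOT the real /opt/ansible/run-update.sh.
# Assumed: log file naming, the LOG_DIR override, and pruning by file age.

LOG_DIR="${LOG_DIR:-/opt/ansible/logs}"
RETENTION_DAYS=84 # 12 weeks

# Run a command, tee its combined stdout/stderr into a timestamped log,
# prune logs older than RETENTION_DAYS, and return the command's exit code.
run_logged() {
  mkdir -p "$LOG_DIR"
  local log
  log="$LOG_DIR/update-$(date +%Y-%m-%d_%H%M%S).log"
  "$@" 2>&1 | tee "$log"
  # PIPESTATUS[0] is the wrapped command's status, not tee's.
  local rc=${PIPESTATUS[0]}
  # -mtime +84 matches files last modified more than 84 full days ago.
  find "$LOG_DIR" -maxdepth 1 -type f -name 'update-*.log' \
    -mtime +"$RETENTION_DAYS" -delete
  return "$rc"
}
```

With `ANSIBLE_CONFIG` exported, a call such as `run_logged ansible-playbook /opt/ansible/playbooks/update-all.yml --limit lxcs` would print the playbook output and keep a copy in the log directory. Pruning inside the runner keeps retention self-contained, so no separate cleanup job is needed.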

## Managed Hosts (15 total)

### Proxmox Host

| Host | IP | User |
|------|----|------|
| pve-node | 10.10.0.11 | root |

### VMs

| Host | IP | VMID | User | Python |
|------|-----|------|------|--------|
| docker-home | 10.10.0.16 | 106 | cal | 3.9 |
| discord-bots | 10.10.0.33 | 110 | cal | 3.9 |
| databases-bots | 10.10.0.42 | 112 | cal | 3.9 |
| docker-sba | 10.10.0.88 | 115 | cal | 3.9 |
| docker-home-servers | 10.10.0.124 | 116 | cal | 3.9 |

### LXCs

| Host | IP | VMID | User | Python |
|------|-----|------|------|--------|
| docker-n8n-lxc | 10.10.0.210 | 210 | root | 3.9 |
| arr-stack | 10.10.0.221 | 221 | root | 3.9 |
| memos | 10.10.0.222 | 222 | root | 3.9 |
| foundry-lxc | 10.10.0.223 | 223 | root | 3.9 |
| gitea | 10.10.0.225 | 225 | root | 3.9 |
| uptime-kuma | 10.10.0.227 | 227 | root | 3.10 |
| claude-discord-coordinator | 10.10.0.230 | 301 | root | 3.12 |
| claude-runner | 10.10.0.148 | 302 | root | 3.12 |

### Physical

| Host | IP | User | Python |
|------|----|------|--------|
| ubuntu-manticore | 10.10.0.226 | cal | 3.12 |

### Excluded

- **Home Assistant** (VM 109): self-managed via HA Supervisor
- **Palworld** (LXC 230): deleted 2026-03-25, which resolved its IP collision with LXC 301

## Usage

SSH to the controller and run as `cal`:

```bash
ssh ansible
export ANSIBLE_CONFIG=/opt/ansible/ansible.cfg

# Check status of everything
ansible-playbook /opt/ansible/playbooks/check-status.yml

# Full update cycle (snapshot → update → health check → cleanup)
ansible-playbook /opt/ansible/playbooks/update-all.yml

# Update a specific group or host
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit lxcs
ansible-playbook /opt/ansible/playbooks/update-all.yml --limit docker-home

# Dry run (check mode: reports what would change without applying it)
ansible-playbook /opt/ansible/playbooks/update-all.yml --check

# OS updates only (no Docker)
ansible-playbook /opt/ansible/playbooks/os-update-only.yml

# Skip snapshots
ansible-playbook /opt/ansible/playbooks/update-all.yml -e skip_snapshot=true

# Roll back a host to its latest snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea

# Roll back to a specific snapshot
ansible-playbook /opt/ansible/playbooks/rollback.yml --limit gitea -e snapshot=pre-update-2026-03-25
```

## Update Pipeline (update-all.yml)

1. **Snapshot**: Creates a `pre-update-YYYY-MM-DD` snapshot of each Proxmox guest via `pvesh`
2. **OS Update**: Refreshes the apt cache, runs a safe upgrade, then autoremoves unused packages (serial: 3, i.e. three hosts per batch)
3. **Docker Update**: Finds compose files, pulls images, restarts changed stacks (serial: 1)
4. **Health Check**: SSH ping, disk space warning (>89%), report of exited containers
5. **Snapshot Cleanup**: Keeps the 3 most recent `pre-update-*` snapshots per host
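
Steps 1 and 5 come down to naming snapshots by date and keeping only the newest three. The bash sketch below illustrates that bookkeeping; it is not taken from `update-all.yml`. It assumes the Proxmox node name and guest type (`qemu` or `lxc`) are known per host, and relies on the `YYYY-MM-DD` suffix so that a plain reverse sort orders snapshots newest first. The helper names are hypothetical; the `pvesh` paths follow the Proxmox VE API layout `/nodes/{node}/{qemu|lxc}/{vmid}/snapshot`.

```bash
# Hypothetical helpers illustrating pipeline steps 1 and 5 (not the real playbook).

KEEP_SNAPSHOTS=3

# Step 1: today's snapshot name, e.g. pre-update-2026-03-25.
snapshot_name() {
  printf 'pre-update-%s\n' "$(date +%F)"
}

# Step 1: print the pvesh command that would create the snapshot.
# $1 = Proxmox node name, $2 = guest type (qemu|lxc), $3 = VMID, $4 = snapshot name
pvesh_snapshot_cmd() {
  printf 'pvesh create /nodes/%s/%s/%s/snapshot --snapname %s\n' "$1" "$2" "$3" "$4"
}

# Step 5: read snapshot names on stdin and print the pre-update-* ones that
# fall outside the newest KEEP_SNAPSHOTS. Other names (manual snapshots, the
# API's "current" pseudo-entry) are filtered out and never selected.
snapshots_to_prune() {
  grep -E '^pre-update-[0-9]{4}-[0-9]{2}-[0-9]{2}$' \
    | sort -r \
    | tail -n +"$((KEEP_SNAPSHOTS + 1))"
}

# Step 5: print the pvesh command that would delete one snapshot.
# Same arguments as pvesh_snapshot_cmd.
pvesh_delete_cmd() {
  printf 'pvesh delete /nodes/%s/%s/%s/snapshot/%s\n' "$1" "$2" "$3" "$4"
}
```

For example, piping `pre-update-2026-03-01` through `pre-update-2026-03-22` (four weekly names) into `snapshots_to_prune` selects only `pre-update-2026-03-01`. In practice the names would first have to be extracted from the guest's snapshot list on the Proxmox host. Printing commands instead of running them keeps the sketch safe to dry-run.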

## Scheduled Runs

A systemd timer runs the update every **Sunday at 3:00 AM UTC**, with up to 10 minutes of random delay (jitter).
`Persistent=true` means a run missed while the controller was down executes at the next boot.

```bash
# Check timer status
ssh ansible "systemctl status ansible-update.timer"

# View last run
ssh ansible "systemctl status ansible-update.service"

# View logs
ssh ansible "ls -lt /opt/ansible/logs/ | head -5"
ssh ansible "journalctl -u ansible-update.service --no-pager -n 50"
```
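
The unit files themselves are not shown on this page. A plausible pair matching the schedule described above is sketched below as a bash function that writes both units into a target directory. Only `OnCalendar`, the 10-minute `RandomizedDelaySec`, and `Persistent=true` come from this page; the `Type=oneshot` service, `User=cal`, the network dependency, and `ExecStart=/opt/ansible/run-update.sh` are assumptions.

```bash
# Hypothetical reconstruction of ansible-update.service / ansible-update.timer.
# Assumed: everything in the .service unit. From this page: schedule, jitter, Persistent.

# Write both unit files into directory $1.
write_update_units() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/ansible-update.service" <<'EOF'
[Unit]
Description=Weekly Ansible infrastructure update
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=cal
ExecStart=/opt/ansible/run-update.sh
EOF
  cat > "$dir/ansible-update.timer" <<'EOF'
[Unit]
Description=Run ansible-update.service every Sunday at 03:00 UTC

[Timer]
OnCalendar=Sun *-*-* 03:00:00 UTC
RandomizedDelaySec=10min
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

As root, `write_update_units /etc/systemd/system` followed by `systemctl daemon-reload` and `systemctl enable --now ansible-update.timer` would install and start the timer. `systemd-analyze calendar 'Sun *-*-* 03:00:00 UTC'` prints the next time that schedule fires, which is a quick way to sanity-check the expression.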

## Inventory Groups

- `proxmox_host`: just pve-node
- `vms`: all QEMU VMs
- `lxcs`: all LXC containers
- `physical`: bare-metal servers (manticore)
- `docker_hosts`: any host running Docker compose stacks
- `proxmox_guests`: union of vms + lxcs (snapshotable)
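
In a YAML inventory, a union group such as `proxmox_guests` is normally declared with the `children` key. The bash function below writes a trimmed-down, hypothetical `hosts.yml` with one host per group (values taken from the tables above) to show that shape. The real file's layout may differ, `docker_hosts` is omitted because its membership isn't listed here, and the `qemu`/`lxc` values for `proxmox_type` are assumptions.

```bash
# Hypothetical inventory skeleton; NOT the real /opt/ansible/inventory/hosts.yml.
# Assumed: proxmox_type values (qemu/lxc) and one-host-per-group trimming.

# Write the skeleton to the file path given in $1.
write_inventory_skeleton() {
  cat > "$1" <<'EOF'
all:
  children:
    proxmox_host:
      hosts:
        pve-node:
          ansible_host: 10.10.0.11
          ansible_user: root
    vms:
      hosts:
        docker-home:
          ansible_host: 10.10.0.16
          ansible_user: cal
          proxmox_vmid: 106
          proxmox_type: qemu
    lxcs:
      hosts:
        gitea:
          ansible_host: 10.10.0.225
          ansible_user: root
          proxmox_vmid: 225
          proxmox_type: lxc
    physical:
      hosts:
        ubuntu-manticore:
          ansible_host: 10.10.0.226
          ansible_user: cal
    # Union group: membership comes from the child groups, no hosts listed.
    proxmox_guests:
      children:
        vms:
        lxcs:
EOF
}
```

Running `ansible-inventory -i <file> --graph proxmox_guests` against such a file should show `docker-home` and `gitea` nested under that group. Defining the union through `children` means a new VM or LXC only has to be added once, under `vms` or `lxcs`.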

## Adding a New Host

1. Add an entry to `/opt/ansible/inventory/hosts.yml` under the appropriate group
2. Include `ansible_host` and `ansible_user`; for Proxmox guests, also `proxmox_vmid` and `proxmox_type`
3. Set `ansible_python_interpreter` if the host's default Python is older than 3.9
4. Ensure the SSH key (`/home/cal/.ssh/homelab_rsa`) is authorized on the target
5. For hosts that connect as `cal` (VMs and physical servers): ensure NOPASSWD sudo for `cal`
6. Test: `ansible <hostname> -m ping`
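
Steps 3 to 5 can be checked from the controller before the first `ansible` ping. The bash sketch below is one hypothetical way to do it: it reads the target's default `python3` version over SSH with the homelab key and tests passwordless sudo with `sudo -n true`. The function names are illustrative, and the sudo check is only meaningful for hosts that connect as `cal`.

```bash
# Hypothetical pre-flight helpers for adding a host (names are illustrative).
KEY=/home/cal/.ssh/homelab_rsa

# Succeed if a version string like "3.8.10" or "3.12" is at least 3.9.
python_at_least_39() {
  local major minor rest
  IFS=. read -r major minor rest <<< "$1"
  (( major > 3 || (major == 3 && minor >= 9) ))
}

# Report the default python3 version and passwordless sudo on user@host ($1).
preflight_host() {
  local target="$1" ver
  ver="$(ssh -i "$KEY" -o BatchMode=yes "$target" \
    "python3 -c 'import platform; print(platform.python_version())'")" || return 1
  if python_at_least_39 "$ver"; then
    echo "python3 $ver: OK"
  else
    echo "python3 $ver: older than 3.9, set ansible_python_interpreter"
  fi
  # sudo -n fails instead of prompting when a password would be required.
  if ssh -i "$KEY" -o BatchMode=yes "$target" 'sudo -n true' 2>/dev/null; then
    echo "passwordless sudo: OK"
  else
    echo "passwordless sudo: missing"
  fi
}
```

For example, `preflight_host cal@10.10.0.16` would report on docker-home. `BatchMode=yes` makes SSH fail fast rather than prompt if the key is not yet authorized (step 4).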

## Setup Prerequisites Fixed During Initial Deployment

- **Python 3.9** installed via deadsnakes PPA on all Ubuntu 20.04 hosts (Ansible 2.20 requires Python ≥3.9 on managed hosts)
- **NOPASSWD sudo** set via `/etc/sudoers.d/cal` on all VMs and manticore
- **qemu-guest-agent** enabled on VM 112 (databases-bots)
- **VM 116 disk** expanded from 31GB→315GB (was 100% full); DNS also fixed (resolv.conf was missing)
- **IP collision** between LXC 230 (palworld) and LXC 301 (claude-discord-coordinator) resolved by deleting palworld
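
The NOPASSWD fix amounts to a one-line file under `/etc/sudoers.d/`. The sketch below assumes a blanket `NOPASSWD: ALL` rule for `cal` (the actual rule may be narrower). It validates the file with `visudo -cf` before installing it with mode `0440`, because a malformed file in `sudoers.d` can break sudo on the host. The function name is hypothetical.

```bash
# Hypothetical helper mirroring the /etc/sudoers.d/cal fix.
# Assumed: a blanket NOPASSWD rule; the real rule may be narrower.

# Write the rule to $1 (default /etc/sudoers.d/cal) after validating it.
write_cal_sudoers() {
  local path="${1:-/etc/sudoers.d/cal}"
  local tmp
  tmp="$(mktemp)"
  echo 'cal ALL=(ALL) NOPASSWD: ALL' > "$tmp"
  # Check syntax on a temp copy first, when visudo is available.
  if command -v visudo >/dev/null 2>&1; then
    visudo -cf "$tmp" >/dev/null || { rm -f "$tmp"; return 1; }
  fi
  # sudoers.d files are expected to be mode 0440.
  install -m 0440 "$tmp" "$path"
  rm -f "$tmp"
}
```

Run as root on the target with no argument, it writes `/etc/sudoers.d/cal`. Validating a temp copy before installing means a typo never reaches the live sudoers configuration.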