docs: sync KB — 2026-04-08-home-network-review.md,2026-04-08-home-network-review-design.md

Cal Corum 2026-04-08 18:00:29 -05:00
parent a307e4dcb7
commit 8d165efbe6
2 changed files with 1618 additions and 0 deletions

# Home Network Review — Design Spec
**Date:** 2026-04-08
**Approach:** Hybrid Layer-by-Layer (discover-then-fix per layer, bottom-up)
**Execution model:** Sub-agent driven — parallel agents within each layer's discovery/analysis phases, sequential remediation
## Context
### Current Infrastructure
- **Router/Gateway:** UniFi UDM Pro
- **Switch:** US-24-PoE (250W)
- **Access Points:** 3x UAP-AC-Lite (Office, First Floor, Upper Floor)
- **Hypervisor:** Proxmox at `10.10.0.10`
- **Physical server:** ubuntu-manticore (`10.10.0.226`) — Pi-hole, Jellyfin, Tdarr, KB RAG stack
- **VM 115:** docker-sba (`10.10.0.88`) — Paper Dynasty, SBA services
- **NAS:** TrueNAS at `10.10.0.35`
- **Reverse proxy:** Nginx Proxy Manager — external access via `*.manticorum.com`
- **DNS:** Dual Pi-hole HA — primary `10.10.0.16` (npm-pihole LXC), secondary `10.10.0.226` (manticore), synced via Orbital Sync + NPM DNS sync cron
### Current Network Topology
| Network | Subnet | Purpose |
|---------|--------|---------|
| Home | `10.0.0.0/23` | Personal devices |
| Lab | `10.10.0.0/24` | Homelab infrastructure |
### Known Issues & Goals (Priority Order)
1. **Performance (C):** Roku on Upper Floor AP has 6 Mbps Rx rate despite -44 dBm signal. Single-stream (1x1) radio; UniFi reports AP/Client Signal Balance: Poor. Likely AP TX power asymmetry with a weak client radio.
2. **Cleanup (D):** A handful of custom firewall rules need a sanity check. The internal `.homelab.local` domain may not be functional, because `.local` is reserved for mDNS (RFC 6762).
3. **Security (A):** Many services exposed via `*.manticorum.com` through NPM. Need WAN exposure audit.
4. **Reliability (B):** Validate Pi-hole HA failover, identify single points of failure.
5. **Expansion (E):** Add guest WiFi, expand Tailscale to full mesh, build smart home foundation.
### Additional Requirements
- **Guest WiFi:** New VLAN, isolated, internet-only
- **Tailscale:** Currently on phones with exit nodes on both networks. Goal: universal reachability — all devices can reach each other whether on home/lab network, cellular, or cloud
- **Smart Home:** Home Assistant antenna is installed, but devices have not been migrated to HA yet. Previous Matter/HomeKit attempts failed. Want a solid network foundation (IoT VLAN, mDNS) before going deeper
- **IoT VLAN:** Default-deny internet access. Per-device exceptions if needed.
## Design
### Agent Assignments
| Layer | Lead Agent(s) | Support |
|-------|---------------|---------|
| 1. WiFi & Physical | `network-engineer` | |
| 2. Network Architecture | `network-engineer` | `it-ops-orchestrator` |
| 3. DNS | `network-engineer` | |
| 4. Firewall & Security | `security-engineer`, `security-auditor` | |
| 5. Overlay & Remote Access | `network-engineer` | |
| 6. Smart Home Foundation | `iot-engineer` | `network-engineer` |
| Final Pass | `security-auditor` | `pentester` |
### Per-Layer Workflow
Each layer follows the same three-phase cycle:
1. **Discover** — export configs, scan current state, document baseline (parallel sub-agents)
2. **Analyze** — review findings, identify issues, produce recommendations (parallel sub-agents)
3. **Remediate** — implement changes, validate, document new state (sequential)
---
### Layer 1: WiFi & Physical
**Goal:** Optimize wireless performance, diagnose Roku issue, establish baseline RF environment.
**Discovery (parallel):**
- Export AP configs from UniFi (channels, power levels, band steering, DTIM, minimum RSSI)
- Pull client device list with signal/rate/retry stats
- Document AP placement (floor, room, mounting)
- Check for channel conflicts: three APs on 80 MHz 5GHz channels could overlap, since (in the US) only two non-DFS 80 MHz blocks exist (channels 36-48 and 149-161)
**Analysis (parallel):**
- Evaluate channel plan — non-overlapping channels? DFS channels available?
- Review AP power levels: high TX power on the AC Lites can cause asymmetry with weak client radios (the client hears the AP well, but the AP hears the client poorly)
- Assess band steering config — is 2.4GHz available as fallback?
- Roku-specific: determine whether lowering the Upper Floor AP's TX power or moving the Roku to 2.4GHz improves its Rx rate
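The channel-plan question above can be checked with a small script. It maps each AP's primary 20 MHz channel to its 80 MHz block and reports any block shared by more than one AP.

This is a sketch under stated assumptions:
- It uses the US 5GHz channel set; other regulatory domains differ.
- It assumes all three APs run 80 MHz widths.
- The plan values in the usage example are hypothetical, not the current UniFi config.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed US 5GHz 80 MHz blocks: center channel -> (member 20 MHz channels, requires DFS)
BLOCKS_80MHZ = {
    42: ((36, 40, 44, 48), False),
    58: ((52, 56, 60, 64), True),
    106: ((100, 104, 108, 112), True),
    122: ((116, 120, 124, 128), True),
    138: ((132, 136, 140, 144), True),
    155: ((149, 153, 157, 161), False),
}

def block_for(primary: int) -> int:
    """Return the 80 MHz block (by center channel) containing a primary channel."""
    for center, (members, _dfs) in BLOCKS_80MHZ.items():
        if primary in members:
            return center
    raise ValueError(f"channel {primary} is not in a US 80 MHz block")

def shared_blocks(plan: Dict[str, int]) -> Dict[int, List[str]]:
    """Map each 80 MHz block used by more than one AP to those APs' names."""
    by_block: Dict[int, List[str]] = defaultdict(list)
    for ap, primary in plan.items():
        by_block[block_for(primary)].append(ap)
    return {center: sorted(aps) for center, aps in by_block.items() if len(aps) > 1}

def non_dfs_blocks() -> List[int]:
    """80 MHz blocks usable without DFS radar detection."""
    return [center for center, (_members, dfs) in BLOCKS_80MHZ.items() if not dfs]
```

For example, `shared_blocks({"Office": 36, "First Floor": 149, "Upper Floor": 44})` returns `{42: ["Office", "Upper Floor"]}`. With only two non-DFS blocks available, the third AP has three options:
- use a DFS block,
- drop to 40 MHz, or
- share a block with an adjacent AP at reduced TX power.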
**Remediation (sequential):**
- Apply optimized channel plan
- Adjust TX power levels per AP
- Configure minimum RSSI thresholds if not set
- Validate Roku improvement
- Document new baseline
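**Key insight:** The Roku's 1x1 radio showing a 6 Mbps Rx rate at a -44 dBm signal suggests AP TX power is too high relative to what the Roku can transmit back. Lowering AP power or moving to 2.4GHz are the likely fixes. However, 6 Mbps is also the lowest 5GHz OFDM rate, and an idle client that is only sending management or keep-alive frames may report it. Re-check the rate while the Roku is actively streaming before acting on it.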
---
### Layer 2: Network Architecture
**Goal:** Expand from 2 VLANs to 4, supporting guest WiFi and IoT isolation.
**Target VLAN layout:**
| Status | Name | Subnet | Purpose |
|------|------|--------|---------|
| Existing | Home | `10.0.0.0/23` | Trusted personal devices |
| Existing | Lab | `10.10.0.0/24` | Homelab servers, Proxmox, infrastructure |
| New | Guest | TBD (e.g., `10.20.0.0/24`) | Guest WiFi — internet only, no local access |
| New | IoT | TBD (e.g., `10.30.0.0/24`) | Smart devices — no internet by default |
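Before the TBD subnets are finalized, a quick standard-library check can confirm two things: that no proposed network overlaps an existing one, and how many host addresses each scope offers. The Guest and IoT values below are the example subnets from the table, not final choices.

```python
import ipaddress
from itertools import combinations

PLAN = {
    "Home": "10.0.0.0/23",
    "Lab": "10.10.0.0/24",
    "Guest": "10.20.0.0/24",  # example from the table; final value TBD
    "IoT": "10.30.0.0/24",    # example from the table; final value TBD
}

def check_plan(plan):
    """Return (overlapping name pairs, host addresses per network)."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    overlaps = [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]
    # Excludes the network and broadcast addresses; the gateway still uses one of these.
    hosts = {name: net.num_addresses - 2 for name, net in nets.items()}
    return overlaps, hosts
```

`check_plan(PLAN)` returns `([], {"Home": 510, "Lab": 254, "Guest": 254, "IoT": 254})`. That leaves plenty of room for DHCP pools in each new VLAN.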
**Discovery (parallel):**
- Export current VLAN config (VLAN IDs, DHCP scopes, assignments)
- Inventory all devices and current network placement
- Document inter-VLAN routing rules
- Check switch port VLAN assignments (tagged/untagged)
**Analysis (parallel):**
- Determine which devices move to IoT VLAN (Roku, smart bulbs, smart switches, HA hub); the Roku will need a per-device internet exception to keep streaming
- Design DHCP scopes for new VLANs
- Plan inter-VLAN access: IoT reaches HA only, HA reaches into IoT, no IoT internet
- WiFi SSIDs: one per VLAN or shared SSID with VLAN assignment?
**Remediation (sequential):**
- Create Guest and IoT VLANs in UniFi
- Configure DHCP for new VLANs
- Create WiFi networks (Guest SSID, IoT SSID)
- Migrate devices to appropriate VLANs
- Validate connectivity per VLAN
- Document new topology
---
### Layer 3: DNS
**Goal:** Validate Pi-hole HA, plan mDNS for smart home, ensure DNS works across all four VLANs.
**Discovery (parallel):**
- Validate Orbital Sync (matching blocklists, custom entries on both Pi-holes)
- Check NPM DNS sync cron — is `custom.list` consistent?
- Document current DNS records in `homelab.local` zone
- Check DHCP DNS server advertisements on both existing VLANs
**Analysis (parallel):**
- Verify failover: what happens when primary (`10.10.0.16`) goes down?
- DNS per VLAN: Guest gets Pi-hole (ad blocking) but NOT internal name resolution. IoT resolves HA only.
- mDNS for smart home: Matter/HomeKit use mDNS for discovery, and mDNS is link-local multicast that does not cross VLANs. Options:
  - UniFi mDNS reflector (built-in, simple, reflects everything)
  - Avahi reflector on a host (more granular)
  - Explicit HA configuration for IoT VLAN discovery
- Check if iOS DNS bypass issue (from KB) is still relevant
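For the failover question above, one simple check is to send the same query to each Pi-hole and see which ones answer. Run it once with both up and again with the primary stopped.

The sketch builds a minimal DNS query using only the Python standard library. Two limits apply:
- The test hostname is your choice.
- It only checks whether each server answers. It does not model how individual clients fall back between the DNS servers advertised by DHCP.

```python
import socket
import struct
from typing import Dict, Iterable, Optional, Tuple

QUERY_ID = 0x1234  # arbitrary transaction ID used to match the reply

def build_query(name: str, qid: int = QUERY_ID) -> bytes:
    """Minimal DNS query: one question, type A, class IN, recursion desired."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN

def query(server: str, name: str, timeout: float = 2.0) -> Optional[Tuple[int, int]]:
    """Return (rcode, answer_count) from `server`, or None if it did not answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable networks
            return None
    if len(data) < 12:
        return None
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if qid != QUERY_ID:
        return None
    return flags & 0x000F, ancount

def failover_report(servers: Iterable[str], name: str) -> Dict[str, Optional[Tuple[int, int]]]:
    """Query every resolver and collect its result."""
    return {server: query(server, name) for server in servers}
```

Call `failover_report(["10.10.0.16", "10.10.0.226"], "example.com")` while the primary is stopped. The primary should map to `None` and the secondary to `(0, n)`, where rcode 0 means NOERROR.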
**Remediation (sequential):**
- Configure DNS for Guest and IoT VLANs
- Set up mDNS reflection (method TBD)
- Fix any Orbital Sync or failover gaps
- Validate DNS resolution from each VLAN
- Document DNS architecture
---
### Layer 4: Firewall & Security
**Goal:** Clean up rules, audit WAN exposure, validate internal domain, harden perimeter.
**Discovery (parallel):**
- Export all UniFi firewall rules (WAN/LAN/Guest, in/out/local)
- Inventory all NPM proxy hosts — which services exposed on `*.manticorum.com`
- Test internal domain resolution: does `.homelab.local` work from each network?
- Check NPM SSL cert status and auto-renewal
- Document port forwards on UDM Pro
- Check UDM Pro WAN-facing services (remote management, STUN, UPnP)
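The certificate-status item above can be scripted against each NPM proxy host. The sketch below connects with normal certificate verification and reports the days remaining before `notAfter`. The host names come from your NPM inventory, and any alert threshold is your choice.

```python
import socket
import ssl
import time

def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400

def cert_days_left(host: str, port: int = 443, timeout: float = 5.0) -> float:
    """Connect with standard verification and return days until the cert expires."""
    ctx = ssl.create_default_context()  # raises on an invalid or expired chain
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_left(cert["notAfter"], time.time())
```

A verification error from `cert_days_left` is itself a finding, because it means the chain is broken or already expired. A steadily shrinking value across runs suggests auto-renewal is not working.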
**Analysis (parallel):**
- **Firewall rule audit:** Redundant, conflicting, or overly broad rules? Missing rules (e.g., IoT→Lab block)?
- **NPM exposure review:** Per proxy host — does it need to be internet-facing? Auth configured? Security headers (HSTS, X-Frame-Options, CSP)?
- **Internal domain strategy:** `.local` conflicts with mDNS. Options:
  - Keep `.homelab.local` with Pi-hole handling (risk of mDNS collision)
  - Switch to `lab.manticorum.com` with split DNS (recommended: you own the domain, so names cannot collide with mDNS or anyone else's namespace)
  - Use `.home.arpa` (RFC 8375, purpose-built for home networks)
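- **Inter-VLAN rules:** Guest = internet-only, plus DNS to Pi-hole. IoT = no internet, HA access only, plus DNS and any per-device exceptions. Lab = reachable from Home, not from Guest/IoT (apart from the Pi-hole DNS exception).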
- **WAN hardening:** UPnP status, unnecessary exposure
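The inter-VLAN rules above can be written down as a default-deny policy table before they become UniFi firewall rules. The same table gives the Final Pass something concrete to test against. It also encodes the Layer 3 decision that Guest gets Pi-hole DNS.

The following are assumptions for illustration:
- the zone names,
- the separate `DNS` and `HA` zones,
- the Home/Lab internet access,
- the hypothetical `roku` exception.

```python
from typing import Optional

# Default-deny: any (source zone, destination zone) pair not listed is blocked.
ALLOWED = {
    ("Home", "Internet"),
    ("Lab", "Internet"),
    ("Guest", "Internet"),
    ("Home", "Lab"),
    ("Home", "HA"),     # HA web UI from trusted devices
    ("IoT", "HA"),
    ("HA", "IoT"),
    ("Home", "DNS"),
    ("Lab", "DNS"),
    ("Guest", "DNS"),   # Pi-hole ad blocking, no internal names
    ("IoT", "DNS"),
}

# Hypothetical per-device internet exceptions for the IoT VLAN.
IOT_INTERNET_EXCEPTIONS = {"roku"}

def allowed(src: str, dst: str, device: Optional[str] = None) -> bool:
    """Return True if traffic from zone `src` to zone `dst` should pass."""
    if (src, dst) in ALLOWED:
        return True
    # IoT stays off the internet unless this specific device is excepted.
    return src == "IoT" and dst == "Internet" and device in IOT_INTERNET_EXCEPTIONS
```

Each `True` entry should map to exactly one allow rule. Everything else falls through to a final block rule, which keeps the audit for redundant or overly broad rules mechanical.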
**Remediation (sequential):**
- Remove/consolidate stale firewall rules
- Harden NPM proxy hosts (auth, headers, prune unnecessary exposure)
- Implement chosen internal domain strategy (recommendation: `lab.manticorum.com` split DNS)
- Create inter-VLAN firewall rules for Guest and IoT
- Disable UPnP if enabled, close unnecessary WAN exposure
- External port scan validation
- Document final ruleset and NPM inventory
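To make the NPM header review repeatable, the sketch below requests a proxy host and lists which of the three headers named in the analysis are missing. It uses only `urllib` from the standard library, and the URLs come from the NPM inventory. Some backends may reject `HEAD`; in that case, switch the method to `GET`.

```python
import urllib.error
import urllib.request
from typing import List

REQUIRED_HEADERS = ("Strict-Transport-Security", "X-Frame-Options", "Content-Security-Policy")

def missing_headers(headers) -> List[str]:
    """`headers` is any object with .keys(), e.g. a dict or an HTTPMessage."""
    present = {name.lower() for name in headers.keys()}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]

def check_host(url: str, timeout: float = 5.0) -> List[str]:
    """Fetch `url` and return the required security headers it lacks."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return missing_headers(resp.headers)
    except urllib.error.HTTPError as err:
        # A 401/403 from an auth-protected host still carries response headers.
        return missing_headers(err.headers)
```

The `HTTPError` branch matters here. Hosts behind NPM access lists or auth answer with an error status, and their headers should still be checked.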
---
### Layer 5: Overlay & Remote Access
**Goal:** Tailscale full mesh — universal reachability across home, cellular, and cloud.
**Discovery (parallel):**
- Document current Tailscale setup (devices, exit nodes, ACL policy)
- Check for subnet router usage vs exit-node-only
- Identify all devices for the mesh (workstation, phones, laptops, servers, cloud VMs)
- Check if OpenVPN is active or legacy
**Analysis (parallel):**
- **Architecture options:**
  - Subnet routers: Tailscale on 1-2 hosts advertising home + lab subnets. Simpler, fewer installs.
  - Full mesh: Tailscale on every server. Direct reachability, no SPOF, more to manage.
  - Hybrid (recommended): Tailscale on key servers + subnet router for the rest.
- **DNS integration:** Tailscale MagicDNS vs Pi-hole coexistence
- **ACL policy:** Which devices reach which? Phones get everything? Cloud VMs lab-only?
- **Exit node strategy:** Keep current phone exit nodes? Add workstation?
- **OpenVPN decommission:** If Tailscale covers all use cases, remove it
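If the hybrid design is chosen, the ACL questions above could be answered with a tailnet policy like the one this sketch generates. It is built on these assumptions:
- Phones and other personal devices reach everything.
- Cloud VMs are lab-only.
- The group `group:admins` and the tags `tag:lab` and `tag:cloud` are hypothetical names.
- A subnet router advertises the lab subnet.

The script only builds and prints JSON, which the HuJSON policy file format accepts. It does not talk to the Tailscale API.

```python
import json

LAB_SUBNET = "10.10.0.0/24"

def build_policy(admin_emails):
    return {
        # Hypothetical group for the owners of personal devices.
        "groups": {"group:admins": list(admin_emails)},
        # Tags for servers and cloud VMs; only admins may apply them.
        "tagOwners": {"tag:lab": ["group:admins"], "tag:cloud": ["group:admins"]},
        "acls": [
            # Phones, laptops, workstation: everything, including advertised subnets.
            {"action": "accept", "src": ["group:admins"], "dst": ["*:*"]},
            # Cloud VMs: tagged lab servers and the advertised lab subnet only.
            {"action": "accept", "src": ["tag:cloud"],
             "dst": ["tag:lab:*", f"{LAB_SUBNET}:*"]},
        ],
    }

if __name__ == "__main__":
    print(json.dumps(build_policy(["you@example.com"]), indent=2))
```

The subnet router side would be something like `tailscale up --advertise-routes=10.0.0.0/23,10.10.0.0/24`, and the advertised routes still need approval in the admin console. Tagged devices do not inherit their owner's group, so the cloud VMs are limited to the second rule.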
**Remediation (sequential):**
- Install/configure Tailscale on chosen devices
- Set up subnet routes or direct mesh
- Configure Tailscale ACLs
- Integrate DNS (MagicDNS + Pi-hole)
- Test: home→cloud, cellular→lab, cloud→home
- Decommission OpenVPN if replaced
- Document mesh topology and ACLs
---
### Layer 6: Smart Home Foundation
**Goal:** IoT VLAN ready (from Layer 2), Home Assistant deployed, Matter/Thread infrastructure in place.
**Discovery (parallel):**
- Inventory smart devices — protocols (WiFi, Zigbee, Z-Wave, Matter, Thread)
- Document HA hardware (antenna type — Zigbee coordinator? Thread border router? SkyConnect?)
- Document previous HomeKit/Matter attempts — what failed and why
- Identify devices for HA migration
**Analysis (parallel):**
- **Protocol strategy:**
  - Which devices support Matter (firmware update path)?
  - WiFi-only devices → IoT VLAN, managed through HA
  - Zigbee/Thread devices → HA radio, no VLAN needed
- **HA network placement:** Must reach IoT VLAN, be reachable from Home VLAN (UI), handle mDNS. Options: dedicated VM, container on manticore, dedicated hardware.
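- **Matter/Thread specifics:**
  - Thread border routers: same network segment as the HA coordinator, since border routers advertise Thread routes and services on the local link via IPv6 router advertisements and mDNS
  - Matter commissioning uses BLE + WiFi: which VLAN do the commissioning phone and the new device need to be on?
  - Apple Home: HA HomeKit bridge vs replace HomeKit entirely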
- **Migration path:** Phased, validate each batch
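The protocol strategy above reduces to a simple placement rule, which could seed the device inventory and protocol map deliverable. This is a hypothetical helper with two design choices:
- The protocol labels are free-form strings chosen for illustration.
- Devices that fit neither bucket are flagged for manual review rather than guessed.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Protocols handled by the HA radio (no VLAN needed), per the strategy above.
RADIO_PROTOCOLS = {"zigbee", "z-wave", "thread"}

@dataclass(frozen=True)
class Device:
    name: str
    protocols: FrozenSet[str]

def placement(device: Device) -> str:
    """Return "HA radio", "IoT VLAN", or "review" for one device."""
    protos = {p.lower() for p in device.protocols}
    if protos & RADIO_PROTOCOLS:   # radio protocols take precedence
        return "HA radio"
    if "wifi" in protos:           # includes Matter-over-WiFi devices
        return "IoT VLAN"
    return "review"
```

A Matter device lands wherever its transport puts it:
- Matter-over-Thread goes to the HA radio.
- Matter-over-WiFi goes to the IoT VLAN.

This mirrors the strategy bullets rather than treating Matter as a separate placement.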
**Remediation (sequential):**
- Deploy Home Assistant (if not already running)
- Configure HA network access (IoT VLAN reach, Home VLAN UI)
- Set up Zigbee/Thread coordinator
- Migrate devices in phases
- Test Matter commissioning end-to-end
- Document device inventory, protocols, HA architecture
---
### Final Pass: Cross-Cutting Security Audit
**Goal:** Holistic review after all layers complete — catch anything missed or introduced.
**Agent:** `security-auditor` lead, `pentester` assist.
**Tasks:**
- Port scan from WAN — verify only intended services reachable
- Inter-VLAN isolation verification: Guest can't reach Lab/Home/IoT and IoT can't reach the internet or Lab, apart from the intended exceptions (Pi-hole DNS, HA, per-device internet allowances)
- NPM proxy hosts: SSL + headers validated
- No default credentials on network gear or exposed services
- Tailscale ACLs match actual reachability
- Produce final network topology document
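For the isolation check, a TCP connect probe run from a client on the VLAN under test is a quick first pass before a full scan. The sketch reports which supposedly blocked targets accept a connection.

The example targets use hosts from the Context section with assumed ports:
- Proxmox's default web UI port 8006,
- HTTPS on TrueNAS,
- SSH on docker-sba.

Pi-hole port 53 is deliberately left out, because Guest and IoT are meant to reach it.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]

def is_open(host: str, port: int, timeout: float = 1.5) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False

def isolation_violations(
    targets: Iterable[Target],
    probe: Callable[[str, int], bool] = is_open,
) -> List[Target]:
    """Return the targets that should be blocked but accepted a connection."""
    return [(host, port) for host, port in targets if probe(host, port)]

# Run from a Guest or IoT client; the ports are assumptions.
SHOULD_BE_BLOCKED = [("10.10.0.10", 8006), ("10.10.0.35", 443), ("10.10.0.88", 22)]
```

`isolation_violations(SHOULD_BE_BLOCKED)` should return `[]`. A refused connection and a firewall drop both count as closed here, so pick ports you know are open from the Home VLAN. Otherwise an empty result proves nothing.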
---
## Dependencies
```
Layer 1 (WiFi) ────────────────────────┐
│                                      │
Layer 2 (VLANs) ───────────────────────┤
│                                      │
Layer 3 (DNS) ─────────────────────────┤
│                                      │
Layer 4 (Firewall) ────────────────────┤
│                                      │
Layer 5 (Tailscale) ───────────────────┤
│                                      │
Layer 6 (Smart Home) ──────────────────┤
                                       ▼
                                  Final Pass
```
Layers are sequential: each builds on the layers before it, and all of them feed into the Final Pass. Within each layer, discovery and analysis phases run parallel sub-agents, while remediation is sequential.
## Deliverables
Per layer:
- Baseline snapshot (current state before changes)
- Changes made (with rationale)
- Validation results
- Updated documentation
Final:
- Complete network topology document
- Firewall rule inventory
- NPM proxy host inventory with security status
- Tailscale mesh diagram and ACL policy
- Smart home device inventory and protocol map
- Security audit report