docs: sync KB — 2026-04-08-home-network-review.md,2026-04-08-home-network-review-design.md
parent a307e4dcb7
commit 8d165efbe6

docs/superpowers/plans/2026-04-08-home-network-review.md (Normal file, 1321 lines changed): file diff suppressed because it is too large
docs/superpowers/specs/2026-04-08-home-network-review-design.md (Normal file, 297 lines changed)

@@ -0,0 +1,297 @@
# Home Network Review: Design Spec

**Date:** 2026-04-08
**Approach:** Hybrid Layer-by-Layer (discover-then-fix per layer, bottom-up)
**Execution model:** Sub-agent driven: parallel agents within each layer's discovery and analysis phases, sequential remediation

## Context

### Current Infrastructure
- **Router/Gateway:** UniFi UDM Pro
- **Switch:** US-24-PoE (250W)
- **Access Points:** 3x UAP-AC-Lite (Office, First Floor, Upper Floor)
- **Hypervisor:** Proxmox at `10.10.0.10`
- **Physical server:** ubuntu-manticore (`10.10.0.226`): Pi-hole, Jellyfin, Tdarr, KB RAG stack
- **VM 115:** docker-sba (`10.10.0.88`): Paper Dynasty, SBA services
- **NAS:** TrueNAS at `10.10.0.35`
- **Reverse proxy:** Nginx Proxy Manager (NPM), external access via `*.manticorum.com`
- **DNS:** Dual Pi-hole HA: primary `10.10.0.16` (npm-pihole LXC), secondary `10.10.0.226` (manticore), synced via Orbital Sync + NPM DNS sync cron

### Current Network Topology
| Network | Subnet | Purpose |
|---------|--------|---------|
| Home | `10.0.0.0/23` | Personal devices |
| Lab | `10.10.0.0/24` | Homelab infrastructure |

### Known Issues & Goals (Priority Order)
1. **Performance (C):** The Roku on the Upper Floor AP reports a 6 Mbps Rx rate despite a -44 dBm signal. It has a 1x1 MIMO radio, and its AP/Client Signal Balance is rated Poor. Likely cause: AP TX power asymmetry with a weak client radio.
2. **Cleanup (D):** A handful of custom firewall rules need a sanity check. The internal `.homelab.local` domain may not be functional because `.local` is reserved for mDNS (RFC 6762).
3. **Security (A):** Many services are exposed via `*.manticorum.com` through NPM. A WAN exposure audit is needed.
4. **Reliability (B):** Validate Pi-hole HA failover, identify single points of failure.
5. **Expansion (E):** Add guest WiFi, expand Tailscale to full mesh, build smart home foundation.

### Additional Requirements
- **Guest WiFi:** New VLAN, isolated, internet-only
- **Tailscale:** Currently on phones, with exit nodes on both networks. Goal: universal reachability, so that all devices can reach each other whether on the home/lab network, cellular, or cloud
- **Smart Home:** Home Assistant antenna installed but not migrated. Previous Matter/HomeKit attempts failed. Want a solid network foundation (IoT VLAN, mDNS) before going deeper
- **IoT VLAN:** Default-deny internet access. Per-device exceptions if needed.

## Design

### Agent Assignments

| Layer | Lead Agent(s) | Support |
|-------|---------------|---------|
| 1. WiFi & Physical | `network-engineer` | |
| 2. Network Architecture | `network-engineer` | `it-ops-orchestrator` |
| 3. DNS | `network-engineer` | |
| 4. Firewall & Security | `security-engineer`, `security-auditor` | |
| 5. Overlay & Remote Access | `network-engineer` | |
| 6. Smart Home Foundation | `iot-engineer` | `network-engineer` |
| Final Pass | `security-auditor` | `pentester` |

### Per-Layer Workflow
Each layer follows the same three-phase cycle:
1. **Discover:** export configs, scan current state, document baseline (parallel sub-agents)
2. **Analyze:** review findings, identify issues, produce recommendations (parallel sub-agents)
3. **Remediate:** implement changes, validate, document new state (sequential)

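
The three-phase cycle can be sketched in Python as follows. This is only an illustration of the execution model, not the actual sub-agent tooling: `Task`, `run_parallel`, `run_sequential`, and `run_layer` are hypothetical names, and each task is a placeholder callable that returns a label.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Task = Callable[[], str]  # hypothetical stand-in for one sub-agent task


def run_parallel(tasks: Sequence[Task]) -> List[str]:
    """Run independent tasks concurrently; results keep the input order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(lambda task: task(), tasks))


def run_sequential(tasks: Sequence[Task]) -> List[str]:
    """Run tasks one at a time, in order; an exception stops later steps."""
    return [task() for task in tasks]


def run_layer(
    discover: Sequence[Task],
    analyze: Sequence[Task],
    remediate: Sequence[Task],
) -> Dict[str, List[str]]:
    """One layer: parallel discovery, parallel analysis, sequential remediation."""
    return {
        "discover": run_parallel(discover),
        "analyze": run_parallel(analyze),
        "remediate": run_sequential(remediate),
    }


# Example with placeholder tasks that only return a label.
result = run_layer(
    discover=[lambda: "export AP configs", lambda: "pull client stats"],
    analyze=[lambda: "evaluate channel plan"],
    remediate=[lambda: "apply channel plan", lambda: "adjust TX power"],
)
print(result["remediate"])
```
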
---

### Layer 1: WiFi & Physical

**Goal:** Optimize wireless performance, diagnose the Roku issue, and establish a baseline RF environment.

**Discovery (parallel):**
- Export AP configs from UniFi (channels, power levels, band steering, DTIM, minimum RSSI)
- Pull the client device list with signal/rate/retry stats
- Document AP placement (floor, room, mounting)
- Check for channel conflicts: 3 APs on 5GHz 80MHz channels could overlap

**Analysis (parallel):**
- Evaluate the channel plan: are the channels non-overlapping? Are DFS channels available?
- Review AP power levels: high TX power on the AC Lites can cause asymmetry with weak client radios
- Assess band steering config: is 2.4GHz available as a fallback?
- Roku-specific: determine whether lowering the Upper Floor AP's TX power or moving the Roku to 2.4GHz improves its Rx rate

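
The channel-conflict check can be sketched as below. This is a minimal sketch, assuming each AP's primary 5GHz channel has been read from the UniFi export by hand. The channel numbers in the example are hypothetical, and the DFS flags follow the common US (FCC) allocation, which is an assumption about the regulatory domain. With only two non-DFS 80MHz blocks under that assumption, three APs at 80MHz need either a DFS block or a narrower channel width.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# 20 MHz primary channels grouped into 5 GHz 80 MHz blocks, with a DFS flag.
# DFS flags assume the US (FCC) allocation; other regulatory domains differ.
BLOCKS_80MHZ = [
    ((36, 40, 44, 48), False),
    ((52, 56, 60, 64), True),
    ((100, 104, 108, 112), True),
    ((116, 120, 124, 128), True),
    ((132, 136, 140, 144), True),
    ((149, 153, 157, 161), False),
]


def block_index(primary_channel: int) -> int:
    """Return the index of the 80 MHz block containing this primary channel."""
    for index, (channels, _is_dfs) in enumerate(BLOCKS_80MHZ):
        if primary_channel in channels:
            return index
    raise ValueError(f"channel {primary_channel} is not in a known 80 MHz block")


def find_conflicts(ap_channels: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return pairs of APs whose 80 MHz channels occupy the same block."""
    return [
        (a, b)
        for a, b in combinations(sorted(ap_channels), 2)
        if block_index(ap_channels[a]) == block_index(ap_channels[b])
    ]


def non_dfs_block_count() -> int:
    """Number of 80 MHz blocks usable without DFS under the assumed allocation."""
    return sum(1 for _channels, is_dfs in BLOCKS_80MHZ if not is_dfs)


# Hypothetical channel assignments for the three APs.
example = {"Office": 36, "First Floor": 44, "Upper Floor": 149}
print(find_conflicts(example))
print(non_dfs_block_count())
```
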
**Remediation (sequential):**
- Apply optimized channel plan
- Adjust TX power levels per AP
- Configure minimum RSSI thresholds if not set
- Validate Roku improvement
- Document new baseline

**Key insight:** A 6 Mbps Rx rate at -44 dBm on the Roku's 1x1 radio points to AP TX power that is too high relative to what the Roku can transmit back: the Roku hears the AP well, but the AP hears the Roku poorly. Lowering AP power or moving the Roku to 2.4GHz are the likely fixes.

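
To find other clients with the same symptom in the exported client list, a filter like the one below could be used. This is a sketch: the `ClientStat` fields and both thresholds are assumptions chosen for illustration, not UniFi field names or recommended values, and every entry other than the Roku's reported figures is made up.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ClientStat:
    name: str
    signal_dbm: int      # signal strength, e.g. -44
    rx_rate_mbps: float  # reported Rx rate


def flag_asymmetric(
    clients: Iterable[ClientStat],
    strong_signal_dbm: int = -60,   # assumed cutoff for a "strong" signal
    low_rate_mbps: float = 24.0,    # assumed cutoff for a "low" rate
) -> List[str]:
    """Names of clients with a strong signal but a low rate.

    A weak signal already explains a low rate, so those clients are not flagged.
    """
    return [
        c.name
        for c in clients
        if c.signal_dbm >= strong_signal_dbm and c.rx_rate_mbps <= low_rate_mbps
    ]


clients = [
    ClientStat("Roku", -44, 6.0),             # figures from the known issue
    ClientStat("laptop", -55, 433.0),         # hypothetical
    ClientStat("garage-sensor", -78, 6.0),    # hypothetical
]
print(flag_asymmetric(clients))
```
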
---

### Layer 2: Network Architecture

**Goal:** Expand from 2 VLANs to 4, supporting guest WiFi and IoT isolation.

**Target VLAN layout:**

| Status | Name | Subnet | Purpose |
|--------|------|--------|---------|
| Existing | Home | `10.0.0.0/23` | Trusted personal devices |
| Existing | Lab | `10.10.0.0/24` | Homelab servers, Proxmox, infrastructure |
| New | Guest | TBD (e.g., `10.20.0.0/24`) | Guest WiFi: internet only, no local access |
| New | IoT | TBD (e.g., `10.30.0.0/24`) | Smart devices: no internet by default |

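
Before the Guest and IoT subnets are finalized, the candidates can be checked for overlap with the existing networks. The sketch below uses Python's standard `ipaddress` module. The Guest and IoT values are the placeholder examples from the table, not decided subnets.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

# Existing subnets plus the TBD example subnets from the table.
PLANNED = {
    "Home": "10.0.0.0/23",
    "Lab": "10.10.0.0/24",
    "Guest": "10.20.0.0/24",  # example value, still TBD
    "IoT": "10.30.0.0/24",    # example value, still TBD
}


def overlapping_pairs(subnets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named subnets whose address ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


print(overlapping_pairs(PLANNED))

# The Home /23 spans 10.0.0.0 through 10.0.1.255, so a 10.0.1.0/24 would collide.
print(overlapping_pairs({"Home": "10.0.0.0/23", "Candidate": "10.0.1.0/24"}))
```
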
**Discovery (parallel):**
- Export current VLAN config (VLAN IDs, DHCP scopes, assignments)
- Inventory all devices and current network placement
- Document inter-VLAN routing rules
- Check switch port VLAN assignments (tagged/untagged)

**Analysis (parallel):**
- Determine which devices move to the IoT VLAN (Roku, smart bulbs, switches, HA hub)
- Design DHCP scopes for new VLANs
- Plan inter-VLAN access: IoT reaches HA only, HA reaches into IoT, no IoT internet apart from per-device exceptions
- WiFi SSIDs: one SSID per VLAN, or a shared SSID with VLAN assignment?

**Remediation (sequential):**
- Create Guest and IoT VLANs in UniFi
- Configure DHCP for new VLANs
- Create WiFi networks (Guest SSID, IoT SSID)
- Migrate devices to appropriate VLANs
- Validate connectivity per VLAN
- Document new topology

---

### Layer 3: DNS

**Goal:** Validate Pi-hole HA, plan mDNS for smart home, ensure DNS works across all four VLANs.

**Discovery (parallel):**
- Validate Orbital Sync (matching blocklists, custom entries on both Pi-holes)
- Check NPM DNS sync cron: is `custom.list` consistent?
- Document current DNS records in the `homelab.local` zone
- Check DHCP DNS server advertisements on both existing VLANs

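
The `custom.list` consistency check can be sketched as a diff of the two files' contents. This assumes the hosts-style format of one IP address followed by one or more names per line, and it assumes the file contents have already been copied from both Pi-holes. The hostnames below are hypothetical examples.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str]  # (ip, hostname)


def parse_custom_list(text: str) -> Set[Record]:
    """Parse hosts-style lines ("IP name [name ...]") into (ip, name) records."""
    records: Set[Record] = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            records.add((ip, name.lower()))
    return records


def drift(primary_text: str, secondary_text: str) -> Dict[str, List[Record]]:
    """Records present on one Pi-hole but missing from the other."""
    primary = parse_custom_list(primary_text)
    secondary = parse_custom_list(secondary_text)
    return {
        "only_on_primary": sorted(primary - secondary),
        "only_on_secondary": sorted(secondary - primary),
    }


# Hypothetical file contents from the primary and secondary Pi-hole.
primary_text = "10.10.0.226 jellyfin.homelab.local\n10.10.0.35 nas.homelab.local\n"
secondary_text = "10.10.0.226 jellyfin.homelab.local\n"
print(drift(primary_text, secondary_text))
```
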
**Analysis (parallel):**
- Verify failover: what happens when the primary (`10.10.0.16`) goes down?
- DNS per VLAN: Guest gets Pi-hole (ad blocking) but NOT internal name resolution. IoT resolves HA only.
- mDNS for smart home: Matter and HomeKit use mDNS for discovery, and mDNS does not cross VLANs. Options:
  - UniFi mDNS reflector (built-in, simple, reflects everything)
  - Avahi reflector on a host (more granular)
  - Explicit HA configuration for IoT VLAN discovery
- Check whether the iOS DNS bypass issue (from KB) is still relevant

**Remediation (sequential):**
- Configure DNS for Guest and IoT VLANs
- Set up mDNS reflection (method TBD)
- Fix any Orbital Sync or failover gaps
- Validate DNS resolution from each VLAN
- Document DNS architecture

---

### Layer 4: Firewall & Security

**Goal:** Clean up rules, audit WAN exposure, validate the internal domain, harden the perimeter.

**Discovery (parallel):**
- Export all UniFi firewall rules (WAN/LAN/Guest, in/out/local)
- Inventory all NPM proxy hosts: which services are exposed on `*.manticorum.com`?
- Test internal domain resolution: does `.homelab.local` work from each network?
- Check NPM SSL cert status and auto-renewal
- Document port forwards on the UDM Pro
- Check UDM Pro WAN-facing services (remote management, STUN, UPnP)

**Analysis (parallel):**
- **Firewall rule audit:** Are any rules redundant, conflicting, or overly broad? Are any rules missing (e.g., an IoT→Lab block)?
- **NPM exposure review:** Per proxy host: does it need to be internet-facing? Is auth configured? Are security headers set (HSTS, X-Frame-Options, CSP)?
- **Internal domain strategy:** `.local` conflicts with mDNS. Options:
  - Keep `.homelab.local` with Pi-hole handling (risk of mDNS collision)
  - Switch to `lab.manticorum.com` with split DNS (recommended: you own the domain, no mDNS conflict, clean)
  - Use `.home.arpa` (RFC 8375, purpose-built for home networks)
- **Inter-VLAN rules:** Guest = internet-only. IoT = no internet, HA access only. Lab = reachable from Home, not from Guest/IoT.
- **WAN hardening:** UPnP status, unnecessary exposure

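
The inter-VLAN rules can be written down as a default-deny policy matrix and compared against observed reachability. This is a sketch, not a UniFi rule export. "HA" is modeled as its own zone because its placement is still open, and internet access for Home and Lab is an assumption that the rules above do not state. The observed results are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Flow = Tuple[str, str]  # (source zone, destination zone)

# Allowed destinations per source zone; anything not listed is denied.
# Only new connections are modeled; return traffic is left to the stateful firewall.
ALLOWED: Dict[str, Set[str]] = {
    "Home": {"Internet", "Lab", "HA"},  # Internet for Home is an assumption
    "Lab": {"Internet"},                # Internet for Lab is an assumption
    "Guest": {"Internet"},
    "IoT": {"HA"},
    "HA": {"IoT"},
}


def is_allowed(src: str, dst: str) -> bool:
    """True if new connections from src to dst should be permitted."""
    return dst in ALLOWED.get(src, set())


def violations(observed: Dict[Flow, bool]) -> List[Flow]:
    """Flows whose observed reachability disagrees with the policy."""
    return sorted(
        flow for flow, reachable in observed.items() if reachable != is_allowed(*flow)
    )


# Hypothetical test results: IoT reaching the internet would be a violation.
observed = {
    ("Guest", "Internet"): True,
    ("Guest", "Lab"): False,
    ("IoT", "Internet"): True,
    ("IoT", "HA"): True,
}
print(violations(observed))
```
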
**Remediation (sequential):**
- Remove/consolidate stale firewall rules
- Harden NPM proxy hosts (auth, headers, prune unnecessary exposure)
- Implement chosen internal domain strategy (recommendation: `lab.manticorum.com` split DNS)
- Create inter-VLAN firewall rules for Guest and IoT
- Disable UPnP if enabled, close unnecessary WAN exposure
- External port scan validation
- Document final ruleset and NPM inventory

---

### Layer 5: Overlay & Remote Access

**Goal:** Tailscale full mesh: universal reachability across home, cellular, and cloud.

**Discovery (parallel):**
- Document current Tailscale setup (devices, exit nodes, ACL policy)
- Check for subnet router usage vs exit-node-only
- Identify all devices for the mesh (workstation, phones, laptops, servers, cloud VMs)
- Check if OpenVPN is active or legacy

**Analysis (parallel):**
- **Architecture options:**
  - Subnet routers: Tailscale on 1-2 hosts advertising home + lab subnets. Simpler, fewer installs.
  - Full mesh: Tailscale on every server. Direct reachability, no SPOF, more to manage.
  - Hybrid (recommended): Tailscale on key servers + subnet router for the rest.
- **DNS integration:** Tailscale MagicDNS vs Pi-hole coexistence
- **ACL policy:** Which devices reach which? Phones get everything? Cloud VMs lab-only?
- **Exit node strategy:** Keep current phone exit nodes? Add workstation?
- **OpenVPN decommission:** If Tailscale covers all use cases, remove it

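
One possible answer to the ACL questions can be drafted as a Tailscale policy file. The sketch below builds it as a Python dict and renders it as JSON, which the policy file format accepts. It assumes user devices get full access and cloud VMs are limited to the Lab subnet through a subnet router. The `tag:cloud` tag is a hypothetical name, and none of this reflects the current tailnet policy.

```python
import json
from typing import Any, Dict

LAB_SUBNET = "10.10.0.0/24"  # Lab network from the Context section

# Draft policy: one possible answer to the open ACL questions, not a decision.
policy: Dict[str, Any] = {
    "tagOwners": {
        # Hypothetical tag for cloud VMs; admins may assign it.
        "tag:cloud": ["autogroup:admin"],
    },
    "acls": [
        # User devices (phones, laptops, workstation) reach everything.
        {"action": "accept", "src": ["autogroup:member"], "dst": ["*:*"]},
        # Cloud VMs reach only the Lab subnet, on any port.
        {"action": "accept", "src": ["tag:cloud"], "dst": [f"{LAB_SUBNET}:*"]},
    ],
}


def render(policy_doc: Dict[str, Any]) -> str:
    """Serialize the policy as JSON for pasting into the admin console."""
    return json.dumps(policy_doc, indent=2)


print(render(policy))
```
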
**Remediation (sequential):**
- Install/configure Tailscale on chosen devices
- Set up subnet routes or direct mesh
- Configure Tailscale ACLs
- Integrate DNS (MagicDNS + Pi-hole)
- Test: home→cloud, cellular→lab, cloud→home
- Decommission OpenVPN if replaced
- Document mesh topology and ACLs

---

### Layer 6: Smart Home Foundation

**Goal:** IoT VLAN ready (from Layer 2), Home Assistant deployed, Matter/Thread infrastructure in place.

**Discovery (parallel):**
- Inventory smart devices and their protocols (WiFi, Zigbee, Z-Wave, Matter, Thread)
- Document HA hardware: what type of antenna is it (Zigbee coordinator? Thread border router? SkyConnect?)
- Document previous HomeKit/Matter attempts: what failed and why
- Identify devices for HA migration

**Analysis (parallel):**
- **Protocol strategy:**
  - Which devices support Matter (firmware update path)?
  - WiFi-only devices → IoT VLAN, managed through HA
  - Zigbee/Thread devices → HA radio, no VLAN needed
- **HA network placement:** Must reach the IoT VLAN, be reachable from the Home VLAN (UI), and handle mDNS. Options: dedicated VM, container on manticore, dedicated hardware.
- **Matter/Thread specifics:**
  - Thread border routers: same segment as HA coordinator
  - Matter commissioning uses BLE + WiFi: which VLAN?
  - Apple Home: HA HomeKit bridge vs replace HomeKit entirely
- **Migration path:** Phased, validate each batch

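
The protocol strategy maps each device's transport to a network placement, which can be sketched as a small lookup for the device inventory. The device names are hypothetical. Transports the strategy does not cover, such as Z-Wave, are returned as "needs review" instead of being guessed.

```python
from typing import Dict


def placement(transport: str) -> str:
    """Map a device transport to its planned placement per the protocol strategy."""
    kind = transport.strip().lower()
    if kind == "wifi":
        return "IoT VLAN (managed through HA)"
    if kind in {"zigbee", "thread"}:
        return "HA radio (no VLAN needed)"
    return "needs review"


def plan(inventory: Dict[str, str]) -> Dict[str, str]:
    """Apply the placement rule to a {device name: transport} inventory."""
    return {device: placement(transport) for device, transport in inventory.items()}


# Hypothetical inventory entries.
inventory = {
    "living-room-bulb": "Zigbee",
    "smart-plug": "WiFi",
    "door-lock": "Z-Wave",
}
for device, where in plan(inventory).items():
    print(f"{device}: {where}")
```
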
**Remediation (sequential):**
- Deploy Home Assistant (if not already running)
- Configure HA network access (IoT VLAN reach, Home VLAN UI)
- Set up Zigbee/Thread coordinator
- Migrate devices in phases
- Test Matter commissioning end-to-end
- Document device inventory, protocols, HA architecture

---

### Final Pass: Cross-Cutting Security Audit

**Goal:** Holistic review after all layers are complete, to catch anything missed or introduced.

**Agent:** `security-auditor` lead, `pentester` assist.

**Tasks:**
- Port scan from WAN: verify only intended services are reachable
- Inter-VLAN isolation verification: Guest can't reach Lab/Home/IoT, IoT can't reach the internet or Lab
- NPM proxy hosts: SSL + headers validated
- No default credentials on network gear or exposed services
- Tailscale ACLs match actual reachability
- Produce final network topology document

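
The header part of the NPM validation can be sketched as a check over a response's headers. The required set is taken from the Layer 4 exposure review (HSTS, X-Frame-Options, CSP). Fetching the headers from each proxy host is left out, and the sample response is hypothetical.

```python
from typing import Dict, List

# Headers named in the Layer 4 NPM exposure review.
REQUIRED_HEADERS = (
    "Strict-Transport-Security",  # HSTS
    "X-Frame-Options",
    "Content-Security-Policy",    # CSP
)


def missing_security_headers(response_headers: Dict[str, str]) -> List[str]:
    """Return required headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in response_headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


# Hypothetical response headers from one proxy host.
sample = {
    "strict-transport-security": "max-age=63072000",
    "X-Frame-Options": "SAMEORIGIN",
}
print(missing_security_headers(sample))
```
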
---

## Dependencies

```
Layer 1 (WiFi) ─────────────────────────────────────────────┐
    │                                                       │
Layer 2 (VLANs) ────────────────────────────────────────────┤
    │                                                       │
Layer 3 (DNS) ──────────────────────────────────────────────┤
    │                                                       │
Layer 4 (Firewall) ─────────────────────────────────────────┤
    │                                                       │
Layer 5 (Tailscale) ────────────────────────────────────────┤
    │                                                       │
Layer 6 (Smart Home) ───────────────────────────────────────┤
                                                            │
                                                       Final Pass
```

Layers are sequential: each builds on the one before it, and the Final Pass runs after all six. Within each layer, the discovery and analysis phases run parallel sub-agents. Remediation is sequential within a layer.

## Deliverables

Per layer:
- Baseline snapshot (current state before changes)
- Changes made (with rationale)
- Validation results
- Updated documentation

Final:
- Complete network topology document
- Firewall rule inventory
- NPM proxy host inventory with security status
- Tailscale mesh diagram and ACL policy
- Smart home device inventory and protocol map
- Security audit report