diff --git a/docs/superpowers/plans/2026-04-08-home-network-review.md b/docs/superpowers/plans/2026-04-08-home-network-review.md new file mode 100644 index 0000000..cff0ce6 --- /dev/null +++ b/docs/superpowers/plans/2026-04-08-home-network-review.md @@ -0,0 +1,1321 @@ +# Home Network Review — Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Comprehensive home network review — optimize WiFi performance, expand to 4 VLANs (Home, Lab, Guest, IoT), harden security, deploy Tailscale full mesh, and build smart home foundation. + +**Architecture:** Hybrid layer-by-layer approach — discover-then-fix per layer, bottom-up. Each layer builds on the previous. Sub-agents execute discovery/analysis in parallel within each layer; remediation is sequential. + +**Tech Stack:** UniFi (UDM Pro, US-24-PoE, 3x AC Lite), Pi-hole HA (Orbital Sync), Nginx Proxy Manager, Tailscale, Home Assistant, Proxmox + +**Spec:** `docs/superpowers/specs/2026-04-08-home-network-review-design.md` + +**Key references:** +- SSH aliases configured in `~/.ssh/config` — use `ssh `, never manual `ssh -i` +- UniFi controller is on the UDM Pro — access via web UI or API +- Pi-hole primary: `10.10.0.16` (npm-pihole), secondary: `10.10.0.226` (manticore) +- NPM handles reverse proxy for `*.manticorum.com` + +--- + +## Layer 1: WiFi & Physical + +### Task 1.1: Discovery — Export AP & Client Data + +**Agent:** `network-engineer` + +**Goal:** Capture current WiFi configuration and client statistics as baseline. + +- [ ] **Step 1: Export AP radio configuration from UniFi** + +SSH into the UDM Pro or use the UniFi API to pull AP radio settings. 
From Cal's workstation: + +```bash +# List all APs and their radio configs via UniFi API +# UDM Pro API base: https://10.0.0.1 +# Authenticate and pull device list +curl -k -X POST https://10.0.0.1/api/auth/login \ + -H 'Content-Type: application/json' \ + -d '{"username":"","password":""}' \ + -c /tmp/unifi-cookies.txt + +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/stat/device \ + | python3 -m json.tool > /tmp/unifi-ap-config.json +``` + +If API auth requires interactive credentials, ask Cal to provide them or export from the UniFi web UI: +- Settings > WiFi > each network's config +- Devices > each AP > Settings > Radios + +Save output to `.claude/tmp/network-review/layer1/ap-config.json` + +- [ ] **Step 2: Export client device list with wireless stats** + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/stat/sta \ + | python3 -m json.tool > .claude/tmp/network-review/layer1/client-stats.json +``` + +Key fields per client: `mac`, `hostname`, `ip`, `essid`, `channel`, `radio`, `signal`, `tx_rate`, `rx_rate`, `tx_retries`, `satisfaction`. + +- [ ] **Step 3: Document AP placement** + +Ask Cal to confirm or correct: +- AP - Office: ground floor / office room +- AP - First Floor: main living area +- AP - Upper Floor: upstairs (bedroom with Roku) + +Record mounting position (ceiling, wall, shelf), approximate distance between APs, and building material between floors. 
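The per-AP and per-band client counts needed for the baseline can be tallied from the Step 2 export with a short script. This is a minimal sketch under assumptions not confirmed by this plan: that the export wraps clients in a top-level `data` list, that wireless clients carry an `ap_mac` key, and that `radio` uses `ng` for 2.4GHz and `na` for 5GHz. Check these against the actual JSON before relying on the numbers.

```python
import json
from collections import Counter

# Assumed band codes in the `radio` field; verify against the real export.
BAND_LABELS = {"ng": "2.4GHz", "na": "5GHz"}


def summarize_clients(export):
    """Count wireless clients per AP and per band from a stat/sta export."""
    clients = export.get("data", [])  # assumed wrapper key
    per_ap = Counter()
    per_band = Counter()
    for client in clients:
        if "radio" not in client:
            # Entries without a radio key are treated as wired and skipped.
            continue
        per_ap[client.get("ap_mac", "unknown")] += 1
        per_band[BAND_LABELS.get(client["radio"], client["radio"])] += 1
    return per_ap, per_band


def load_and_summarize(path):
    """Read an exported JSON file and return (per_ap, per_band) counters."""
    with open(path) as handle:
        return summarize_clients(json.load(handle))
```

Calling `load_and_summarize(".claude/tmp/network-review/layer1/client-stats.json")` returns two counters that can be pasted into the baseline document.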
+ +- [ ] **Step 4: Save baseline document** + +Write findings to `.claude/tmp/network-review/layer1/baseline.md` with: +- Per-AP: model, MAC, channel (2.4 + 5GHz), channel width, TX power, band steering setting, minimum RSSI +- Per-client summary: count per AP, count per band, any clients with high retry rates or low satisfaction scores +- Known issue: Roku (`20:ef:bd:60:5e:ae`) on AP-Upper Floor, Ch60 5GHz 80MHz, -44 dBm signal, 1x1 MIMO, Rx 6 Mbps, TX 433 Mbps, AP/Client Signal Balance: Poor + +--- + +### Task 1.2: Analysis — WiFi Optimization Recommendations + +**Agent:** `network-engineer` + +**Depends on:** Task 1.1 + +**Goal:** Analyze AP configs and client data, produce optimization recommendations. + +- [ ] **Step 1: Analyze channel plan** + +Read `.claude/tmp/network-review/layer1/baseline.md` and check: +- Are all 3 APs on different 5GHz channels? With 80MHz width, only channels 36, 52 (DFS), 100 (DFS), 149 are non-overlapping. +- Are 2.4GHz channels on 1, 6, 11 (non-overlapping)? +- Are DFS channels available and enabled? DFS gives access to less congested spectrum. + +- [ ] **Step 2: Analyze TX power levels** + +AC Lite max TX power is 20 dBm (5GHz). Check if APs are set to Auto or a fixed value. +- High TX power + weak client radio (Roku 1x1) = asymmetric link = AP/Client Signal Balance: Poor +- Recommendation framework: for rooms where APs serve weak clients, lower TX power to Medium or Low + +- [ ] **Step 3: Analyze band steering and 2.4GHz availability** + +Check if band steering is forcing clients to 5GHz. The Roku's 1x1 5GHz radio is weak — 2.4GHz would give it better range and wall penetration at the cost of throughput (which it can't use anyway at 1x1). 
+ +- [ ] **Step 4: Identify all problematic clients** + +Beyond the Roku, scan client stats for: +- Any client with satisfaction < 50% +- Any client with TX retries > 10% +- Any client with Rx rate < 24 Mbps +- Any client with signal weaker than -75 dBm + +- [ ] **Step 5: Write recommendations** + +Save to `.claude/tmp/network-review/layer1/recommendations.md`: +- Recommended channel plan (specific channel per AP per band) +- Recommended TX power per AP per band +- Band steering recommendation +- Roku-specific fix (likely: lower AP-Upper Floor 5GHz TX power, or create separate 2.4GHz-only SSID) +- Minimum RSSI threshold recommendation +- Any other client-specific issues found + +Present recommendations to Cal for approval before remediation. + +--- + +### Task 1.3: Remediation — Apply WiFi Optimizations + +**Agent:** `network-engineer` + +**Depends on:** Task 1.2 + Cal's approval of recommendations + +**Goal:** Apply approved WiFi changes and validate. + +- [ ] **Step 1: Apply channel plan changes** + +Via UniFi web UI (Settings > WiFi or Devices > AP > Radios): +- Set each AP to the recommended channel and width +- Document exact changes made + +- [ ] **Step 2: Adjust TX power levels** + +Set TX power per AP per band as recommended. Note: changes take effect immediately and may briefly disconnect clients. + +- [ ] **Step 3: Configure band steering / minimum RSSI** + +Apply band steering and minimum RSSI settings as recommended. + +- [ ] **Step 4: Validate Roku improvement** + +After changes settle (wait 5-10 minutes for clients to reassociate): +- Check Roku's connection in UniFi: Rx rate, signal, AP/Client Signal Balance +- Ask Cal to test streaming at normal bitrate +- If Roku moved to 2.4GHz, verify adequate throughput (should see 50-70 Mbps link rate on 2.4GHz, plenty for streaming) + +- [ ] **Step 5: Validate all clients** + +Re-export client stats (same as Task 1.1 Step 2) and compare: +- Any clients that lost connectivity or degraded? 
+- Overall satisfaction scores improved? + +- [ ] **Step 6: Document results** + +Update `.claude/tmp/network-review/layer1/baseline.md` with "after" state. Save diff summary to `.claude/tmp/network-review/layer1/changes.md`. + +--- + +## Layer 2: Network Architecture + +### Task 2.1: Discovery — Current VLAN & Network Inventory + +**Agent:** `network-engineer` + +**Goal:** Document current VLAN config, device inventory, and switch port assignments. + +- [ ] **Step 1: Export VLAN configuration from UniFi** + +Via UniFi API or web UI (Settings > Networks): +- Document each network: name, VLAN ID, subnet, DHCP range, gateway, purpose +- Current known: Home (`10.0.0.0/23`), Lab (`10.10.0.0/24`) + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/rest/networkconf \ + | python3 -m json.tool > .claude/tmp/network-review/layer2/vlan-config.json +``` + +- [ ] **Step 2: Pull full device inventory** + +Export all known clients from UniFi with their network assignment: + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/stat/alluser \ + | python3 -m json.tool > .claude/tmp/network-review/layer2/all-devices.json +``` + +Categorize devices: which are personal (Home), which are infrastructure (Lab), which are IoT candidates, which are unknown. + +- [ ] **Step 3: Document switch port assignments** + +Check US-24-PoE port profiles: +- Which ports are tagged/untagged for which VLANs? +- Which ports have APs connected? +- Any trunk ports? 
+ +- [ ] **Step 4: Document inter-VLAN routing rules** + +Export current firewall rules that govern Home↔Lab traffic: + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/rest/firewallrule \ + | python3 -m json.tool > .claude/tmp/network-review/layer2/firewall-rules.json +``` + +- [ ] **Step 5: Save baseline** + +Write to `.claude/tmp/network-review/layer2/baseline.md`: +- Current VLAN table (ID, name, subnet, DHCP range, gateway) +- Device count per network +- IoT device candidates list +- Switch port map +- Inter-VLAN rules summary + +--- + +### Task 2.2: Analysis — VLAN Expansion Design + +**Agent:** `network-engineer` + `it-ops-orchestrator` + +**Depends on:** Task 2.1 + +**Goal:** Design the 4-VLAN topology with specific VLAN IDs, subnets, DHCP ranges, and WiFi SSIDs. + +- [ ] **Step 1: Assign VLAN IDs and subnets** + +Propose specific values based on existing config (avoid conflicts): +- Home: keep existing VLAN ID and `10.0.0.0/23` +- Lab: keep existing VLAN ID and `10.10.0.0/24` +- Guest: new VLAN ID, `10.20.0.0/24`, DHCP `10.20.0.100-10.20.0.254` +- IoT: new VLAN ID, `10.30.0.0/24`, DHCP `10.30.0.100-10.30.0.254` + +- [ ] **Step 2: Plan WiFi SSID strategy** + +Options: +- Separate SSIDs per VLAN: `Corum` (Home), `Corum-Lab` (Lab), `Corum-Guest` (Guest), `Corum-IoT` (IoT) +- Shared SSID with RADIUS VLAN assignment (more complex, not recommended for homelab) + +Recommend separate SSIDs. Determine which SSIDs broadcast on which APs (Guest probably all APs, IoT probably all APs, Lab maybe not needed on WiFi). 
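The subnet proposal in Step 1 can be sanity-checked before anything is created in UniFi. The sketch below uses Python's standard `ipaddress` module to confirm that no two subnets overlap and that each proposed DHCP range sits inside its subnet. The `VLANS` table is transcribed from Step 1. The existing Home and Lab DHCP ranges are not listed in this plan, so they are left as `None`.

```python
import ipaddress

# Subnets and DHCP ranges as proposed in Step 1.
VLANS = {
    "Home": ("10.0.0.0/23", None),
    "Lab": ("10.10.0.0/24", None),
    "Guest": ("10.20.0.0/24", ("10.20.0.100", "10.20.0.254")),
    "IoT": ("10.30.0.0/24", ("10.30.0.100", "10.30.0.254")),
}


def check_vlans(vlans):
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in vlans.items()}
    names = sorted(nets)
    # Pairwise overlap check between every two subnets.
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                problems.append(f"{first} overlaps {second}")
    # Both ends of each DHCP range must fall inside the VLAN's subnet.
    for name, (_, dhcp) in vlans.items():
        if dhcp is None:
            continue
        for addr in dhcp:
            if ipaddress.ip_address(addr) not in nets[name]:
                problems.append(f"{name} DHCP address {addr} is outside {nets[name]}")
    return problems
```

If the Task 2.1 export turns up additional networks, add them to `VLANS` and rerun the check before settling on VLAN IDs.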
+ +- [ ] **Step 3: Plan device migration** + +From Task 2.1 device inventory, list which devices move: +- To IoT: Roku, smart bulbs, smart switches, any sensors +- To Guest: none initially (new network for visitors) +- Stay on Home: PCs, phones, tablets, laptops +- Stay on Lab: servers, Proxmox, infrastructure + +- [ ] **Step 4: Plan inter-VLAN firewall rules** + +Design rules for all VLAN pairs: +- Guest → anywhere local: DENY (internet only) +- Guest → internet: ALLOW +- IoT → internet: DENY (default) +- IoT → Home Assistant IP: ALLOW (specific port) +- IoT → everything else: DENY +- Home → Lab: ALLOW (existing) +- Home → IoT: ALLOW (for HA web UI, device management) +- Lab → IoT: ALLOW (HA lives in Lab or Home) + +- [ ] **Step 5: Write migration plan** + +Save to `.claude/tmp/network-review/layer2/vlan-design.md`: +- Complete VLAN table with IDs, subnets, DHCP ranges, gateways +- SSID plan +- Device migration list +- Firewall rule matrix +- Migration sequence (create VLANs → create SSIDs → migrate devices → validate) + +Present to Cal for approval. + +--- + +### Task 2.3: Remediation — Create VLANs and Migrate Devices + +**Agent:** `network-engineer` + +**Depends on:** Task 2.2 + Cal's approval + +**Goal:** Implement the VLAN expansion. 
+ +- [ ] **Step 1: Create Guest VLAN in UniFi** + +Settings > Networks > Create New Network: +- Name: Guest +- VLAN ID: (as designed) +- Gateway/Subnet: `10.20.0.1/24` +- DHCP range: `10.20.0.100 - 10.20.0.254` +- DNS: Pi-hole IPs +- Purpose: mark as Guest network in UniFi (enables guest isolation features) + +- [ ] **Step 2: Create IoT VLAN in UniFi** + +Settings > Networks > Create New Network: +- Name: IoT +- VLAN ID: (as designed) +- Gateway/Subnet: `10.30.0.1/24` +- DHCP range: `10.30.0.100 - 10.30.0.254` +- DNS: Pi-hole IPs (or restricted DNS — determined in Layer 3) + +- [ ] **Step 3: Create Guest WiFi SSID** + +Settings > WiFi > Create New WiFi Network: +- Name: `Corum-Guest` (or as designed) +- Network: Guest +- Security: WPA2/WPA3 +- Password: generate strong password +- Enable on all APs + +- [ ] **Step 4: Create IoT WiFi SSID** + +Settings > WiFi > Create New WiFi Network: +- Name: `Corum-IoT` (or as designed) +- Network: IoT +- Security: WPA2 (some IoT devices don't support WPA3) +- Password: generate strong password +- Enable on all APs + +- [ ] **Step 5: Migrate IoT devices** + +Move devices one at a time to the IoT SSID. Start with the Roku as a test: +- Connect Roku to `Corum-IoT` +- Verify it gets a `10.30.0.x` address +- Verify it can stream (internet access temporarily allowed until Layer 4 firewall rules) +- Proceed with other IoT devices + +- [ ] **Step 6: Validate all networks** + +From a device on each VLAN, verify: +- DHCP gives correct IP range +- Default gateway responds +- DNS resolves (at least public domains) +- Internet access works (except IoT — will be blocked in Layer 4) + +- [ ] **Step 7: Document results** + +Update `.claude/tmp/network-review/layer2/baseline.md` with new state. Save changes to `.claude/tmp/network-review/layer2/changes.md`. 
+ +--- + +## Layer 3: DNS + +### Task 3.1: Discovery — DNS Infrastructure Validation + +**Agent:** `network-engineer` + +**Goal:** Validate Pi-hole HA setup, document DNS records, check resolution across all VLANs. + +- [ ] **Step 1: Validate Orbital Sync** + +```bash +# Check Orbital Sync logs on manticore +ssh manticore "docker logs orbital-sync --tail 50" + +# Compare blocklists between primary and secondary +ssh npm-pihole "docker exec pihole sqlite3 /etc/pihole/gravity.db 'SELECT count(*) FROM gravity;'" +ssh manticore "docker exec pihole sqlite3 /etc/pihole/gravity.db 'SELECT count(*) FROM gravity;'" +``` + +Counts should match (or be very close). + +- [ ] **Step 2: Validate NPM DNS sync** + +```bash +# Check custom.list on both Pi-holes +ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list" +ssh manticore "docker exec pihole cat /etc/pihole/custom.list" +``` + +Compare outputs — they should match. Check the cron job: + +```bash +ssh npm-pihole "crontab -l | grep -i npm" +``` + +- [ ] **Step 3: Test DNS resolution from each network** + +From Cal's workstation (Home network): +```bash +# Internal resolution +nslookup proxmox.homelab.local 10.10.0.16 +nslookup git.manticorum.com 10.10.0.16 + +# Same queries against secondary +nslookup proxmox.homelab.local 10.10.0.226 +nslookup git.manticorum.com 10.10.0.226 + +# Public resolution +nslookup google.com 10.10.0.16 +``` + +Repeat from a device on Lab network. Note: Guest and IoT VLANs are new — test DNS from those too once devices are connected. 
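The `custom.list` comparison in Step 2 is easier to read as a set difference than as two raw dumps. The sketch below assumes the files use the hosts-style `IP hostname` layout and that `#` starts a comment. The function names are illustrative.

```python
def parse_custom_list(text):
    """Return a set of (ip, hostname) pairs from custom.list contents."""
    records = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            ip, *hostnames = parts
            records.update((ip, host.lower()) for host in hostnames)
    return records


def diff_custom_lists(primary_text, secondary_text):
    """Return (only_on_primary, only_on_secondary) as sorted lists."""
    primary = parse_custom_list(primary_text)
    secondary = parse_custom_list(secondary_text)
    return sorted(primary - secondary), sorted(secondary - primary)
```

Redirect the output of the two `cat` commands to local files, read them in, and pass both strings to `diff_custom_lists`. Two empty lists mean the Pi-holes agree.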
+ +- [ ] **Step 4: Test failover** + +```bash +# Stop primary Pi-hole temporarily +ssh npm-pihole "docker stop pihole" + +# Test resolution from workstation (should fall back to secondary) +nslookup google.com +nslookup git.manticorum.com + +# Restart primary +ssh npm-pihole "docker start pihole" +``` + +- [ ] **Step 5: Test .homelab.local resolution** + +Cal flagged that internal `.homelab.local` domains may not work: + +```bash +# Test from workstation +nslookup homelab.local +ping proxmox.homelab.local +ping nas.homelab.local +ping tdarr.homelab.local + +# Check what's actually in Pi-hole custom.list for these +ssh npm-pihole "docker exec pihole cat /etc/pihole/custom.list | grep homelab" +``` + +- [ ] **Step 6: Save baseline** + +Write to `.claude/tmp/network-review/layer3/baseline.md`: +- Orbital Sync status (healthy/unhealthy, blocklist counts) +- NPM DNS sync status (matching/mismatched, cron schedule) +- DNS resolution results per network per Pi-hole +- Failover test results +- `.homelab.local` resolution status +- Full custom.list contents + +--- + +### Task 3.2: Analysis — DNS Architecture Recommendations + +**Agent:** `network-engineer` + +**Depends on:** Task 3.1 + +**Goal:** Recommend DNS changes for 4-VLAN setup, mDNS strategy, and internal domain fix. 
+ +- [ ] **Step 1: Assess internal domain strategy** + +Based on Task 3.1 findings on `.homelab.local`: +- If it works: document how, note mDNS collision risk for IoT +- If it doesn't work: recommend migration to `lab.manticorum.com` with split DNS + +For `lab.manticorum.com` split DNS: +- Pi-hole resolves `*.lab.manticorum.com` → internal IPs (via custom.list or local DNS records) +- Public DNS for `manticorum.com` does NOT have `lab` subdomain records +- Internal services get names like `proxmox.lab.manticorum.com`, `nas.lab.manticorum.com` + +- [ ] **Step 2: Plan DNS per VLAN** + +- Home: full Pi-hole (ad blocking + internal resolution) +- Lab: full Pi-hole (ad blocking + internal resolution) +- Guest: Pi-hole for ad blocking, but NO internal name resolution (guests shouldn't discover `proxmox.lab.manticorum.com`) +- IoT: Pi-hole for ad blocking, resolve Home Assistant only + +Assess whether per-VLAN DNS filtering requires separate Pi-hole configs or can be handled with Pi-hole groups/clients. + +- [ ] **Step 3: Plan mDNS for smart home** + +Matter/HomeKit devices use mDNS for discovery. mDNS is link-local and doesn't cross VLANs. + +Options: +- UniFi mDNS reflector: Settings > Networks > enable mDNS. Simple but reflects ALL mDNS across ALL VLANs. +- Avahi reflector on a host: more granular, can reflect only between specific VLANs (IoT ↔ Home) +- Skip for now, configure in Layer 6 when HA is deployed + +Recommend: enable UniFi mDNS reflector between IoT and Home VLANs only (if UniFi supports per-VLAN mDNS config). Otherwise, enable globally and restrict via firewall. + +- [ ] **Step 4: Write recommendations** + +Save to `.claude/tmp/network-review/layer3/recommendations.md`: +- Internal domain decision (keep `.homelab.local` or migrate to `lab.manticorum.com`) +- DNS records to create/migrate +- Per-VLAN DNS configuration +- mDNS strategy +- Any Orbital Sync or failover fixes needed + +Present to Cal for approval. 
+ +--- + +### Task 3.3: Remediation — DNS Changes + +**Agent:** `network-engineer` + +**Depends on:** Task 3.2 + Cal's approval + +**Goal:** Implement DNS changes. + +- [ ] **Step 1: Implement internal domain strategy** + +If migrating to `lab.manticorum.com`: +```bash +# Update custom.list on primary Pi-hole with new domain names +# Example entries: +# 10.10.0.10 proxmox.lab.manticorum.com +# 10.10.0.20 nas.lab.manticorum.com +# 10.10.0.226 manticore.lab.manticorum.com +# etc. + +ssh npm-pihole "docker exec pihole bash -c 'cat >> /etc/pihole/custom.list << EOF +10.10.0.10 proxmox.lab.manticorum.com +10.10.0.20 nas.lab.manticorum.com +EOF'" + +# Restart Pi-hole DNS +ssh npm-pihole "docker exec pihole pihole restartdns" +``` + +Orbital Sync will propagate to secondary. Verify after sync. + +- [ ] **Step 2: Configure DNS for Guest VLAN** + +In UniFi DHCP settings for Guest network, set DNS servers to Pi-hole IPs. + +If Pi-hole group-based filtering is used, create a "Guest" group that blocks internal domain resolution. + +- [ ] **Step 3: Configure DNS for IoT VLAN** + +In UniFi DHCP settings for IoT network, set DNS servers to Pi-hole IPs. + +If needed, create an "IoT" group in Pi-hole with restricted resolution. + +- [ ] **Step 4: Configure mDNS reflection** + +In UniFi: Settings > Networks > (each network) > check mDNS settings. +Or globally: Settings > Services > MDNS. + +Enable as recommended in Task 3.2. 
+ +- [ ] **Step 5: Validate DNS from all VLANs** + +From a device on each VLAN: +```bash +# Home: should resolve everything +nslookup proxmox.lab.manticorum.com +nslookup google.com + +# Lab: should resolve everything +nslookup proxmox.lab.manticorum.com +nslookup google.com + +# Guest: should resolve public only +nslookup google.com # should work +nslookup proxmox.lab.manticorum.com # should FAIL or return NXDOMAIN + +# IoT: should resolve HA + public (for now, until internet blocked in Layer 4) +nslookup google.com +``` + +- [ ] **Step 6: Validate failover still works** + +Repeat failover test from Task 3.1 Step 4 with new config. + +- [ ] **Step 7: Document results** + +Update `.claude/tmp/network-review/layer3/baseline.md` with new state. Save changes to `.claude/tmp/network-review/layer3/changes.md`. + +--- + +## Layer 4: Firewall & Security + +### Task 4.1: Discovery — Firewall Rules & WAN Exposure + +**Agent:** `security-engineer` + +**Goal:** Export and document all firewall rules, NPM proxy hosts, and WAN exposure. + +- [ ] **Step 1: Export all UniFi firewall rules** + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/rest/firewallrule \ + | python3 -m json.tool > .claude/tmp/network-review/layer4/firewall-rules.json +``` + +Document each rule: name, action (allow/deny), source network, destination network, protocol, port, enabled/disabled. + +- [ ] **Step 2: Inventory NPM proxy hosts** + +SSH to the NPM host and export proxy host configs: + +```bash +# NPM stores config in SQLite +ssh npm-pihole "docker exec nginx-proxy-manager sqlite3 /data/database.sqlite \ + 'SELECT id, domain_names, forward_host, forward_port, ssl_forced, access_list_id FROM proxy_host;'" +``` + +For each proxy host, document: +- Domain name +- Internal target (IP:port) +- SSL enabled/forced? +- Access list applied? +- Is this service intentionally internet-facing? 
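The proxy host checklist above can be pre-filled from a copy of the NPM database. The sketch below relies on the column names used in the query in this step. It also makes two assumptions to verify against the real data: that `domain_names` is stored as a JSON array, and that an `access_list_id` of `0` or `NULL` means no access list.

```python
import json
import sqlite3


def flag_proxy_hosts(conn):
    """Return proxy hosts lacking forced SSL or an access list."""
    rows = conn.execute(
        "SELECT domain_names, forward_host, forward_port, ssl_forced, access_list_id "
        "FROM proxy_host"
    )
    findings = []
    for domains, host, port, ssl_forced, access_list_id in rows:
        try:
            names = json.loads(domains)  # assumed: JSON array of domain strings
        except (TypeError, ValueError):
            names = [str(domains)]  # fall back to the raw column value
        issues = []
        if not ssl_forced:
            issues.append("ssl not forced")
        if not access_list_id:
            issues.append("no access list")
        if issues:
            findings.append((", ".join(names), f"{host}:{port}", issues))
    return findings
```

Work on a copy of the database file rather than the live one, opened with `sqlite3.connect(...)`. Whether each flagged host is intentionally internet-facing remains a question for Cal.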
+ +- [ ] **Step 3: Check port forwards on UDM Pro** + +```bash +curl -k -b /tmp/unifi-cookies.txt \ + https://10.0.0.1/proxy/network/api/s/default/rest/portforward \ + | python3 -m json.tool > .claude/tmp/network-review/layer4/port-forwards.json +``` + +- [ ] **Step 4: Check UDM Pro WAN services** + +Document: +- Is remote management enabled? (Settings > System > Controller) +- Is UPnP enabled? (Settings > Internet > WAN) +- STUN/TURN settings +- Any other WAN-facing features + +- [ ] **Step 5: Check NPM SSL cert status** + +```bash +ssh npm-pihole "docker exec nginx-proxy-manager sqlite3 /data/database.sqlite \ + 'SELECT id, domain_names, expires_on FROM certificate;'" +``` + +Identify any expired or soon-to-expire certs. + +- [ ] **Step 6: External port scan** + +From Cal's workstation, check what's visible from WAN perspective. Use an external port scanner or: + +```bash +# Get public IP +curl -s ifconfig.me + +# Use nmap from an external vantage point if available, or use an online scanner +# e.g., https://www.shodan.io/host/ +``` + +Ask Cal if he has an external server (cloud VM) that can run nmap against his public IP. + +- [ ] **Step 7: Save baseline** + +Write to `.claude/tmp/network-review/layer4/baseline.md`: +- Firewall rules table (all rules, annotated) +- NPM proxy host inventory +- Port forwards list +- WAN services status +- SSL cert status +- External scan results (if available) + +--- + +### Task 4.2: Analysis — Security Audit & Rule Design + +**Agent:** `security-auditor` + `security-engineer` + +**Depends on:** Task 4.1 + +**Goal:** Audit existing rules, design inter-VLAN rules for new VLANs, recommend hardening. + +- [ ] **Step 1: Audit existing firewall rules** + +Review each rule from Task 4.1: +- Is this rule still needed? (Ask Cal about any unclear rules) +- Is it overly broad? (e.g., allow ALL from Home to Lab vs specific ports) +- Are there conflicting rules? +- Are there rules that should exist but don't? 
+ +Categorize: KEEP, MODIFY, REMOVE, ADD. + +- [ ] **Step 2: Audit NPM proxy hosts** + +For each internet-facing proxy host: +- Does this need to be internet-facing? Could it be Tailscale-only (after Layer 5)? +- Does it have authentication? (Basic auth, OAuth, none) +- Are security headers configured? Check for: + - `Strict-Transport-Security` (HSTS) + - `X-Frame-Options` + - `X-Content-Type-Options` + - `Content-Security-Policy` + - `Referrer-Policy` +- Is SSL forced (HTTP → HTTPS redirect)? +- Is HTTP/2 enabled? + +Categorize: KEEP PUBLIC, MOVE BEHIND TAILSCALE, ADD AUTH, REMOVE. + +- [ ] **Step 3: Design inter-VLAN firewall rules** + +Create the complete rule matrix for all 4 VLANs: + +``` +Guest → Home: DENY ALL +Guest → Lab: DENY ALL +Guest → IoT: DENY ALL +Guest → Internet: ALLOW ALL + +IoT → Internet: DENY ALL (default) +IoT → Home: DENY ALL +IoT → Lab: ALLOW to HA IP on specific port (8123) +IoT → IoT: ALLOW (devices may need to discover each other) + +Home → Lab: ALLOW ALL (or specific ports — review) +Home → IoT: ALLOW ALL (device management, HA UI) +Home → Guest: DENY ALL +Home → Internet: ALLOW ALL + +Lab → IoT: ALLOW (HA reaching into IoT VLAN) +Lab → Home: ALLOW (or restrict — review) +Lab → Guest: DENY ALL +Lab → Internet: ALLOW ALL +``` + +Note: the HA IP and VLAN placement will be finalized in Layer 6. Use placeholder for now, update during Layer 6 remediation. 
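Writing the Step 3 matrix down as data makes gaps visible before any rule is created. The sketch below transcribes the matrix as written. `HA_ONLY` is an illustrative label for the single IoT → Lab exception, and the lookup simplifies it to a port check on 8123 because the HA IP is still a placeholder.

```python
ZONES = ("Home", "Lab", "Guest", "IoT", "Internet")

# Rule matrix transcribed from Step 3.
MATRIX = {
    ("Guest", "Home"): "DENY",
    ("Guest", "Lab"): "DENY",
    ("Guest", "IoT"): "DENY",
    ("Guest", "Internet"): "ALLOW",
    ("IoT", "Internet"): "DENY",
    ("IoT", "Home"): "DENY",
    ("IoT", "Lab"): "HA_ONLY",
    ("IoT", "IoT"): "ALLOW",
    ("Home", "Lab"): "ALLOW",
    ("Home", "IoT"): "ALLOW",
    ("Home", "Guest"): "DENY",
    ("Home", "Internet"): "ALLOW",
    ("Lab", "IoT"): "ALLOW",
    ("Lab", "Home"): "ALLOW",
    ("Lab", "Guest"): "DENY",
    ("Lab", "Internet"): "ALLOW",
}


def verdict(src, dst, port=None):
    """Look up the designed action for traffic from src to dst."""
    rule = MATRIX.get((src, dst))
    if rule is None:
        return "UNSPECIFIED"
    if rule == "HA_ONLY":
        # Simplification: only the port is checked, not the destination IP.
        return "ALLOW" if port == 8123 else "DENY"
    return rule


def unspecified_pairs():
    """List cross-zone source/destination pairs the matrix does not cover."""
    sources = [zone for zone in ZONES if zone != "Internet"]
    return [
        (src, dst)
        for src in sources
        for dst in ZONES
        if src != dst and (src, dst) not in MATRIX
    ]
```

Run against the matrix as written, `unspecified_pairs()` reports IoT → Guest as the one pair with no stated action, so decide that pair explicitly during review.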
+ +- [ ] **Step 4: Check for UPnP and other WAN risks** + +- If UPnP is enabled: recommend disabling (allows devices to open ports without firewall rules) +- If remote management is enabled: recommend disabling or restricting to Tailscale +- Check for DNS rebinding protection +- Check for IGMP snooping settings + +- [ ] **Step 5: Write recommendations** + +Save to `.claude/tmp/network-review/layer4/recommendations.md`: +- Firewall rule changes (table: rule, action, reason) +- NPM proxy host changes (table: host, action, reason) +- Inter-VLAN firewall rules (complete rule set) +- WAN hardening changes +- Internal domain implementation details (if `lab.manticorum.com` approved in Layer 3) + +Present to Cal for approval. + +--- + +### Task 4.3: Remediation — Firewall & Security Hardening + +**Agent:** `security-engineer` + +**Depends on:** Task 4.2 + Cal's approval + +**Goal:** Implement approved firewall and security changes. + +- [ ] **Step 1: Clean up existing firewall rules** + +In UniFi, for each rule marked MODIFY or REMOVE in Task 4.2: +- Remove stale rules +- Tighten overly broad rules +- Document each change + +- [ ] **Step 2: Create inter-VLAN firewall rules** + +In UniFi (Settings > Firewall & Security > Firewall Rules), create rules in this order (order matters — UniFi processes top-down): + +1. Allow established/related (should exist by default) +2. Allow IoT → HA IP:8123 (TCP) +3. Deny IoT → all RFC1918 +4. Deny IoT → internet (WAN OUT rule) +5. Allow IoT → IoT (same VLAN, if needed for device discovery) +6. Deny Guest → all RFC1918 +7. Allow Guest → internet (implicit, but explicit rule for clarity) +8. Allow Home → Lab +9. Allow Home → IoT +10. 
Allow Lab → IoT + +- [ ] **Step 3: Harden NPM proxy hosts** + +For each proxy host needing changes: +- Add security headers via NPM Advanced tab (custom Nginx config) +- Enable SSL forcing +- Add access lists where recommended +- Remove proxy hosts that should no longer be public + +Example NPM Advanced config for security headers: +```nginx +add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always; +add_header X-Frame-Options "SAMEORIGIN" always; +add_header X-Content-Type-Options "nosniff" always; +add_header Referrer-Policy "strict-origin-when-cross-origin" always; +``` + +- [ ] **Step 4: Disable UPnP and harden WAN** + +In UniFi: +- Disable UPnP if enabled (Settings > Internet > WAN) +- Disable remote management if enabled, or restrict access +- Enable DNS rebinding protection if available + +- [ ] **Step 5: Validate inter-VLAN isolation** + +Test from each VLAN: +```bash +# From Guest device: should NOT reach any internal IP +ping 10.0.0.1 # Home gateway — should fail +ping 10.10.0.10 # Proxmox — should fail +ping 10.30.0.1 # IoT gateway — should fail +ping 8.8.8.8 # Internet — should work + +# From IoT device: should reach HA only +ping # Should work +ping 10.10.0.10 # Proxmox — should fail +ping 8.8.8.8 # Internet — should fail (after IoT internet block applied) +``` + +- [ ] **Step 6: Validate NPM changes** + +For each modified proxy host: +```bash +# Test security headers +curl -sI https://git.manticorum.com | grep -iE 'strict-transport|x-frame|x-content-type|referrer-policy' + +# Test SSL forced +curl -sI http://git.manticorum.com | head -5 # Should 301/302 to HTTPS +``` + +- [ ] **Step 7: Document results** + +Update `.claude/tmp/network-review/layer4/baseline.md` with new state. Save changes to `.claude/tmp/network-review/layer4/changes.md`. 
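The header check in Step 6 can be made repeatable across proxy hosts by parsing saved `curl -sI` output rather than eyeballing the `grep` result. This is a minimal sketch. The required list mirrors the four headers in the example NPM config above, and the function name is illustrative.

```python
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Frame-Options",
    "X-Content-Type-Options",
    "Referrer-Policy",
)


def missing_security_headers(raw_headers):
    """Given the text output of `curl -sI`, return required headers not present."""
    present = set()
    for line in raw_headers.splitlines():
        name, sep, _ = line.partition(":")
        if sep:  # status lines such as "HTTP/2 200" contain no colon
            present.add(name.strip().lower())
    return [header for header in REQUIRED_HEADERS if header.lower() not in present]
```

An empty list for a host means all four headers were returned. Anything else names the headers still to be added in that host's Advanced tab.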
+ +--- + +## Layer 5: Overlay & Remote Access + +### Task 5.1: Discovery — Current Tailscale & VPN State + +**Agent:** `network-engineer` + +**Goal:** Document current Tailscale setup and identify all devices for the mesh. + +- [ ] **Step 1: Check current Tailscale status** + +```bash +# On workstation +tailscale status +tailscale debug prefs +``` + +Document: which devices are in the tailnet, their roles (exit node, subnet router, regular node), Tailscale IP assignments. + +- [ ] **Step 2: Check Tailscale admin console** + +Ask Cal to check https://login.tailscale.com/admin: +- ACL policy (Access Controls tab) +- DNS settings (DNS tab — MagicDNS, nameservers) +- Current device list with last seen timestamps + +- [ ] **Step 3: Check if OpenVPN is active** + +```bash +# Check if OpenVPN server is running anywhere +ssh manticore "systemctl status openvpn 2>/dev/null || docker ps | grep -i vpn" +ssh npm-pihole "systemctl status openvpn 2>/dev/null || docker ps | grep -i vpn" +``` + +- [ ] **Step 4: Identify target devices for full mesh** + +List all devices that should be on Tailscale: +- Workstation (likely already on) +- Phones (already on) +- Laptops +- Key servers: Proxmox, manticore, docker-sba, npm-pihole +- Any cloud VMs +- Any work devices that need home access + +- [ ] **Step 5: Save baseline** + +Write to `.claude/tmp/network-review/layer5/baseline.md`: +- Current tailnet members and roles +- Current ACL policy +- Current DNS config +- OpenVPN status +- Target device list for expansion + +--- + +### Task 5.2: Analysis — Tailscale Mesh Design + +**Agent:** `network-engineer` + +**Depends on:** Task 5.1 + +**Goal:** Design the full mesh architecture. 
+ +- [ ] **Step 1: Choose architecture pattern** + +Based on Cal's goal (all devices reach each other from anywhere): + +Recommend hybrid approach: +- **Tailscale on key servers** (Proxmox, manticore, workstation) — direct mesh nodes +- **Subnet router on one host** (e.g., manticore or UDM Pro) — advertises `10.0.0.0/23` (Home) and `10.10.0.0/24` (Lab) so remote devices can reach ALL local IPs without individual Tailscale installs +- **Phones/laptops** — direct mesh nodes, can reach everything via subnet routes +- **Exit nodes** — keep current setup (home + lab exit nodes for routing all traffic through home when on public WiFi) + +Note: UDM Pro may not support Tailscale natively. Manticore or a dedicated LXC/VM is likely the subnet router. + +- [ ] **Step 2: Design ACL policy** + +```json +{ + "acls": [ + // Cal's devices can access everything + {"action": "accept", "src": ["tag:personal"], "dst": ["*:*"]}, + // Servers can reach each other + {"action": "accept", "src": ["tag:server"], "dst": ["tag:server:*"]}, + // Cloud VMs can reach lab only + {"action": "accept", "src": ["tag:cloud"], "dst": ["10.10.0.0/24:*"]} + ], + "tagOwners": { + "tag:personal": ["cal@..."], + "tag:server": ["cal@..."], + "tag:cloud": ["cal@..."] + } +} +``` + +Adjust based on Cal's needs. + +- [ ] **Step 3: Plan DNS integration** + +Options: +- MagicDNS only: Tailscale assigns `.tailnet-name.ts.net` names +- MagicDNS + Pi-hole: set Pi-holes as custom nameservers in Tailscale admin, MagicDNS resolves Tailscale names, Pi-hole resolves internal names +- Split DNS in Tailscale: route `lab.manticorum.com` queries to Pi-hole, everything else to public DNS + +Recommend: Split DNS in Tailscale admin — `lab.manticorum.com` → Pi-hole IPs, everything else → default. MagicDNS enabled for Tailscale device names. 
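The split DNS recommendation in Step 3 comes down to a suffix match on label boundaries. The sketch below illustrates that routing decision only. It is not how Tailscale implements it, and `resolvers_for` is a hypothetical helper name. The zone and Pi-hole addresses come from this plan.

```python
PIHOLES = ("10.10.0.16", "10.10.0.226")
SPLIT_ZONE = "lab.manticorum.com"


def resolvers_for(hostname, zone=SPLIT_ZONE):
    """Return the Pi-hole IPs for names in the split zone, else None (default DNS)."""
    name = hostname.rstrip(".").lower()
    # Match the zone itself or any name ending in ".<zone>", so that
    # "notlab.manticorum.com" is not mistaken for a member of the zone.
    if name == zone or name.endswith("." + zone):
        return PIHOLES
    return None
```

With this rule, `proxmox.lab.manticorum.com` goes to the Pi-holes, while public names under `manticorum.com`, such as `git.manticorum.com`, keep using the default resolvers.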
+ +- [ ] **Step 4: Assess services to move behind Tailscale** + +From Task 4.2 NPM audit, identify services that could move from public NPM to Tailscale-only: +- Services only Cal accesses remotely → Tailscale only (remove from NPM public) +- Services others need → keep on NPM public +- This reduces WAN attack surface + +- [ ] **Step 5: Write recommendations** + +Save to `.claude/tmp/network-review/layer5/recommendations.md`: +- Architecture diagram (text) +- Device list with roles (mesh node, subnet router, exit node) +- ACL policy +- DNS integration plan +- Services to move behind Tailscale +- OpenVPN decommission plan (if applicable) + +Present to Cal for approval. + +--- + +### Task 5.3: Remediation — Deploy Tailscale Mesh + +**Agent:** `network-engineer` + +**Depends on:** Task 5.2 + Cal's approval + +**Goal:** Install Tailscale on target devices, configure mesh. + +- [ ] **Step 1: Install Tailscale on subnet router host** + +```bash +# On manticore (or chosen host) +ssh manticore +curl -fsSL https://tailscale.com/install.sh | sh +sudo tailscale up --advertise-routes=10.0.0.0/23,10.10.0.0/24 --accept-dns=false +``` + +Approve routes in Tailscale admin console. +Enable IP forwarding: +```bash +echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.d/99-tailscale.conf +sudo sysctl -p /etc/sysctl.d/99-tailscale.conf +``` + +- [ ] **Step 2: Install Tailscale on additional servers** + +For each server that should be a direct mesh node: +```bash +ssh +curl -fsSL https://tailscale.com/install.sh | sh +sudo tailscale up --accept-routes +``` + +Tag each device in Tailscale admin. + +- [ ] **Step 3: Configure ACL policy** + +In Tailscale admin (Access Controls tab), apply the designed ACL policy from Task 5.2. 
+ +- [ ] **Step 4: Configure DNS** + +In Tailscale admin (DNS tab): +- Enable MagicDNS +- Add split DNS: `lab.manticorum.com` → Pi-hole IPs (`10.10.0.16`, `10.10.0.226`) +- Set global nameservers if desired + +- [ ] **Step 5: Move services behind Tailscale** + +For each service identified in Task 5.2: +- Remove or disable the NPM proxy host +- Verify access works via Tailscale IP or MagicDNS name +- Update any bookmarks or saved URLs + +- [ ] **Step 6: Test reachability matrix** + +Test from each location type: + +```bash +# From phone on cellular (Tailscale active): +# Can reach workstation? +ping +# Can reach manticore? +ping +# Can reach internal services via subnet route? +# Proxmox web UI is HTTPS-only on 8006; -k tolerates a self-signed cert +curl -k https://10.10.0.10:8006 # Proxmox + +# From workstation at home: +# Can reach phone's Tailscale IP? +ping + +# From cloud VM (if applicable): +# Can reach lab subnet? +ping 10.10.0.226 +``` + +- [ ] **Step 7: Decommission OpenVPN (if applicable)** + +If OpenVPN is active and Tailscale covers all use cases: +```bash +ssh "sudo systemctl stop openvpn && sudo systemctl disable openvpn" +``` + +Remove any related port forwards from UDM Pro. + +- [ ] **Step 8: Document results** + +Save to `.claude/tmp/network-review/layer5/`: +- `baseline.md` updated with new state +- `changes.md` with all changes made +- `mesh-topology.md` with device list, IPs, roles, ACL summary + +--- + +## Layer 6: Smart Home Foundation + +### Task 6.1: Discovery — Smart Home Inventory & HA Status + +**Agent:** `iot-engineer` + +**Goal:** Document current smart home devices, HA hardware, and previous HomeKit/Matter attempts. + +- [ ] **Step 1: Inventory smart devices** + +Ask Cal to list all smart home devices: +- Smart bulbs (brand, model, protocol — WiFi? Zigbee? Thread?) +- Smart switches (brand, model, protocol) +- Sensors (motion, temp, door/window) +- Smart plugs +- Cameras +- Thermostats +- Any other connected devices + +For each: current control method (app, HomeKit, Alexa, Google Home, manual). 
+ +- [ ] **Step 2: Document HA hardware** + +Ask Cal about the Home Assistant antenna: +- What hardware? (SkyConnect? Sonoff Zigbee dongle? ConBee? Other?) +- Is it USB-connected to a specific host? +- Is Home Assistant OS/Container/Supervised installed anywhere, or just the antenna hardware? + +- [ ] **Step 3: Document previous HomeKit/Matter failures** + +Ask Cal: +- What devices were you trying to add via Matter? +- What was the failure mode? (Commissioning failed? Paired but unreliable? Couldn't discover?) +- Was this through Apple Home directly, or through HA's Matter integration? + +- [ ] **Step 4: Save baseline** + +Write to `.claude/tmp/network-review/layer6/baseline.md`: +- Device inventory table (device, brand/model, protocol, current controller, IoT VLAN candidate?) +- HA hardware details +- HomeKit/Matter failure notes +- Current HA installation status (not installed / installed but not configured / partially configured) + +--- + +### Task 6.2: Analysis — HA Architecture & Migration Plan + +**Agent:** `iot-engineer` + `network-engineer` + +**Depends on:** Task 6.1 + +**Goal:** Design HA deployment, network integration, and phased migration plan. + +- [ ] **Step 1: Determine HA deployment location** + +Options: +- **Dedicated VM on Proxmox** (recommended): isolated, snapshotable, dedicated resources. HAOS (Home Assistant OS) runs as a VM image. +- **Container on manticore**: lighter weight, but manticore already runs many services. Docker-based HA loses some features (add-ons). +- **Dedicated hardware** (Pi, NUC): only if USB radio has latency issues in a VM. + +Recommend: Proxmox VM running HAOS. The USB radio (Zigbee/Thread) can be passed through to the VM. 
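Passing the USB radio through to a VM usually starts with finding its vendor:product ID on the host. The sketch below extracts that ID from `lsusb` output; `usb_ids_matching` is a hypothetical helper, and the sample lines are illustrative only, since the actual radio model is still to be confirmed in Task 6.1.

```bash
#!/usr/bin/env bash
# Sketch: pull vendor:product IDs out of `lsusb` output so the radio can be
# identified for passthrough. usb_ids_matching is a hypothetical helper name.
usb_ids_matching() {
  # $1    = case-insensitive pattern to look for in the device description
  # stdin = output of `lsusb`
  grep -iE -- "$1" \
    | sed -nE 's/^Bus [0-9]+ Device [0-9]+: ID ([0-9a-f]{4}:[0-9a-f]{4}).*/\1/p'
}

# Sample input for illustration only; real output will differ per host.
sample='Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge'

printf '%s\n' "$sample" | usb_ids_matching 'cp210x|zigbee|skyconnect'
```

On the Proxmox host this would be run as `lsusb | usb_ids_matching '<pattern>'`. The resulting ID is the kind of value that `qm set <vmid> --usb0 host=<vendor:product>` expects, though the exact passthrough method should follow whatever the `proxmox` skill/agent recommends for the current setup.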
+ +- [ ] **Step 2: Plan HA network placement** + +HA VM needs network access to: +- IoT VLAN (`10.30.0.0/24`) — to manage IoT devices +- Home VLAN (`10.0.0.0/23`) — so Cal can access HA web UI +- Lab VLAN (`10.10.0.0/24`) — if HA lives here, it's already on Lab + +Options: +- HA on Lab VLAN with firewall rules allowing IoT↔HA and Home→HA:8123 +- HA with dual NICs (Lab + IoT) — more complex but direct IoT access +- HA on IoT VLAN with firewall rules allowing Home→HA:8123 and Lab→HA + +Recommend: HA on Lab VLAN (consistent with other infrastructure). Firewall rules from Layer 4 already allow Lab→IoT and IoT→HA. + +- [ ] **Step 3: Plan protocol strategy** + +Based on device inventory: +- **WiFi devices** → IoT VLAN, controlled by HA via WiFi (API/local polling) +- **Zigbee devices** → HA's Zigbee coordinator (USB radio), no VLAN needed (Zigbee is separate RF) +- **Thread/Matter devices** → HA's Thread border router (if SkyConnect or similar), commissioned via HA +- **Devices with no HA integration** → stay on current controller, evaluate over time + +- [ ] **Step 4: Plan mDNS configuration** + +From Layer 3, mDNS reflection should be configured. 
Verify it covers: +- HA discovering IoT VLAN devices via mDNS +- Matter commissioning (uses mDNS for device discovery after BLE pairing) +- HomeKit bridge (if HA exposes HomeKit bridge, Apple devices on Home VLAN need to discover it via mDNS) + +- [ ] **Step 5: Design phased migration** + +Phase 1: Deploy HA, configure Zigbee coordinator, add 2-3 Zigbee devices +Phase 2: Add WiFi devices from IoT VLAN +Phase 3: Attempt Matter commissioning with one device +Phase 4: HomeKit bridge (if desired — HA can expose devices to Apple Home) +Phase 5: Remaining devices, automations + +- [ ] **Step 6: Write recommendations** + +Save to `.claude/tmp/network-review/layer6/recommendations.md`: +- HA deployment plan (VM specs, network config) +- Protocol strategy per device +- mDNS requirements +- Migration phases with specific devices per phase +- Matter commissioning checklist + +Present to Cal for approval. + +--- + +### Task 6.3: Remediation — HA Deployment & Initial Setup + +**Agent:** `iot-engineer` + +**Depends on:** Task 6.2 + Cal's approval + +**Goal:** Deploy HA and complete Phase 1 of device migration. + +- [ ] **Step 1: Create HA VM on Proxmox** + +```bash +# Download HAOS image for Proxmox +# Create VM with recommended specs (2 vCPU, 4GB RAM, 32GB disk) +# Pass through USB radio device +``` + +Specific Proxmox commands will depend on the USB device and current Proxmox config. Use the `proxmox` skill/agent for VM creation. 
+ +- [ ] **Step 2: Configure HA network** + +- Assign Lab VLAN IP (static, within `10.10.0.0/24`) +- Add DNS record: `ha.lab.manticorum.com` → HA IP (in Pi-hole custom.list) +- Verify HA web UI accessible from Home VLAN at `http://ha.lab.manticorum.com:8123` + +- [ ] **Step 3: Configure Zigbee coordinator** + +In HA: +- Install ZHA (Zigbee Home Automation) or Zigbee2MQTT integration +- Configure USB radio as coordinator +- Verify radio is detected and functional + +- [ ] **Step 4: Add test devices (Phase 1)** + +- Pair 2-3 Zigbee devices as proof of concept +- Verify control through HA dashboard +- Verify automations can be created + +- [ ] **Step 5: Validate network integration** + +- Can HA reach IoT VLAN devices? (ping `10.30.0.x` from HA) +- Can Home VLAN reach HA web UI? (browse to `ha.lab.manticorum.com:8123`) +- Does mDNS work between IoT and Home VLANs? +- Can HA discover IoT VLAN WiFi devices? + +- [ ] **Step 6: Document results** + +Save to `.claude/tmp/network-review/layer6/`: +- `baseline.md` updated with HA deployment details +- `changes.md` with all changes +- `device-inventory.md` with migrated devices and their status +- `migration-remaining.md` with Phase 2-5 steps + +--- + +## Final Pass: Cross-Cutting Security Audit + +### Task 7.1: External Security Validation + +**Agent:** `security-auditor` + `pentester` + +**Depends on:** All previous layers complete + +**Goal:** Verify the complete network from an adversarial perspective. + +- [ ] **Step 1: External port scan** + +From an external vantage point (cloud VM, or online scanner): +```bash +nmap -sS -sV -p- -oN .claude/tmp/network-review/final/external-scan.txt +``` + +Only expected open ports should appear (80, 443 for NPM, plus any intentionally forwarded ports). 
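The "only expected open ports" check in Step 1 can be automated against the saved scan file. A minimal sketch follows, assuming nmap's normal (`-oN`) output format; `unexpected_open_ports` is a hypothetical helper and the sample scan lines are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: list open TCP ports in nmap normal-format (-oN) output that are not
# on an allowlist. unexpected_open_ports is a hypothetical helper name.
unexpected_open_ports() {
  # $1    = space-separated allowlist of TCP ports
  # stdin = nmap -oN output
  awk -v allowed="$1" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    $1 ~ /^[0-9]+\/tcp$/ && $2 == "open" {
      split($1, p, "/")
      if (!(p[1] in ok)) print p[1]
    }
  '
}

# Sample scan output for illustration only.
sample='PORT     STATE    SERVICE VERSION
22/tcp   filtered ssh
80/tcp   open     http    nginx
443/tcp  open     ssl/http nginx
8006/tcp open     unknown'

printf '%s\n' "$sample" | unexpected_open_ports "80 443"
```

Against the real scan this would be `unexpected_open_ports "80 443" < .claude/tmp/network-review/final/external-scan.txt`, with the allowlist extended for any intentionally forwarded ports. Empty output means nothing unexpected was found open.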
+ +- [ ] **Step 2: Verify inter-VLAN isolation** + +From each VLAN, attempt to reach resources that should be blocked: + +Guest → Home, Lab, IoT: all blocked +IoT → Internet: blocked +IoT → Lab (non-HA): blocked +IoT → Home: blocked + +Document results in a matrix. + +- [ ] **Step 3: Validate NPM security** + +For each remaining public proxy host: +```bash +# Test security headers (CSP is sent as Content-Security-Policy, so match on that name) +curl -sI https:// | grep -iE 'strict-transport|x-frame|x-content|referrer|content-security' + +# Test SSL grade (use external tool) +# e.g., https://www.ssllabs.com/ssltest/ +``` + +- [ ] **Step 4: Check for default credentials** + +Verify no network gear or exposed services use default passwords: +- UDM Pro admin +- Pi-hole admin +- NPM admin +- Proxmox root +- Any other web UIs + +- [ ] **Step 5: Validate Tailscale ACLs** + +From each Tailscale device, verify ACLs match design: +```bash +tailscale status +tailscale ping +``` + +Attempt connections that should be denied by ACL policy. + +- [ ] **Step 6: Produce final documentation** + +Create `.claude/tmp/network-review/final/network-review-report.md`: +- Executive summary (what was done, key findings, key improvements) +- Network topology diagram (text-based) +- VLAN table (final state) +- Firewall rule inventory (final state) +- NPM proxy host inventory with security status +- Tailscale mesh diagram and ACL policy +- Smart home device inventory and protocol map +- DNS architecture +- Remaining recommendations / future work +- Security audit findings and resolution status + +Save this report to the knowledge base via `/save-doc`. 
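For the NPM header validation in Step 3 of Task 7.1, it is often more useful to list what is missing than to eyeball what is present. The sketch below does that for a fixed header list; `missing_security_headers` is a hypothetical helper, and the list itself is an assumption derived from the headers this plan mentions.

```bash
#!/usr/bin/env bash
# Sketch: report which expected security headers are absent from a response.
# missing_security_headers is a hypothetical helper; the header list is an
# assumption based on the headers named in this plan.
missing_security_headers() {
  # stdin = raw response headers, e.g. from `curl -sI`
  local headers h
  # Strip CRs and lowercase everything so matching is case-insensitive
  headers="$(tr -d '\r' | tr '[:upper:]' '[:lower:]')"
  for h in strict-transport-security x-frame-options x-content-type-options \
           referrer-policy content-security-policy; do
    printf '%s\n' "$headers" | grep -q "^$h:" || echo "$h"
  done
}

# Sample response headers for illustration only.
sample='HTTP/2 200
Strict-Transport-Security: max-age=63072000
X-Frame-Options: SAMEORIGIN'

printf '%s\n' "$sample" | missing_security_headers
```

In practice the input would come from `curl -sI` against each remaining public proxy host, with one run per host and the output recorded in the NPM inventory section of the final report.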
+ +--- + +## Task Dependencies + +``` +Task 1.1 (WiFi Discovery) + → Task 1.2 (WiFi Analysis) + → Task 1.3 (WiFi Remediation) + → Task 2.1 (VLAN Discovery) + → Task 2.2 (VLAN Analysis) + → Task 2.3 (VLAN Remediation) + → Task 3.1 (DNS Discovery) + → Task 3.2 (DNS Analysis) + → Task 3.3 (DNS Remediation) + → Task 4.1 (Firewall Discovery) + → Task 4.2 (Firewall Analysis) + → Task 4.3 (Firewall Remediation) + → Task 5.1 (Tailscale Discovery) + → Task 5.2 (Tailscale Analysis) + → Task 5.3 (Tailscale Remediation) + → Task 6.1 (Smart Home Discovery) + → Task 6.2 (Smart Home Analysis) + → Task 6.3 (Smart Home Remediation) + → Task 7.1 (Final Security Audit) +``` + +Within each layer, discovery tasks can run parallel sub-agents for independent data collection. Analysis can run parallel sub-agents for independent review areas. Remediation is sequential within each layer. + +**Human approval gates:** Tasks 1.3, 2.3, 3.3, 4.3, 5.3, and 6.3 (all remediation tasks) require Cal's approval of the preceding analysis/recommendations before proceeding. 
diff --git a/docs/superpowers/specs/2026-04-08-home-network-review-design.md b/docs/superpowers/specs/2026-04-08-home-network-review-design.md new file mode 100644 index 0000000..1239cd4 --- /dev/null +++ b/docs/superpowers/specs/2026-04-08-home-network-review-design.md @@ -0,0 +1,297 @@ +# Home Network Review — Design Spec + +**Date:** 2026-04-08 +**Approach:** Hybrid Layer-by-Layer (discover-then-fix per layer, bottom-up) +**Execution model:** Sub-agent driven — parallel agents within each layer's discovery/analysis phases, sequential remediation + +## Context + +### Current Infrastructure +- **Router/Gateway:** UniFi UDM Pro +- **Switch:** US-24-PoE (250W) +- **Access Points:** 3x UAP-AC-Lite (Office, First Floor, Upper Floor) +- **Hypervisor:** Proxmox at `10.10.0.10` +- **Physical server:** ubuntu-manticore (`10.10.0.226`) — Pi-hole, Jellyfin, Tdarr, KB RAG stack +- **VM 115:** docker-sba (`10.10.0.88`) — Paper Dynasty, SBA services +- **NAS:** TrueNAS at `10.10.0.35` +- **Reverse proxy:** Nginx Proxy Manager — external access via `*.manticorum.com` +- **DNS:** Dual Pi-hole HA — primary `10.10.0.16` (npm-pihole LXC), secondary `10.10.0.226` (manticore), synced via Orbital Sync + NPM DNS sync cron + +### Current Network Topology +| Network | Subnet | Purpose | +|---------|--------|---------| +| Home | `10.0.0.0/23` | Personal devices | +| Lab | `10.10.0.0/24` | Homelab infrastructure | + +### Known Issues & Goals (Priority Order) +1. **Performance (C):** Roku on Upper Floor AP has 6 Mbps Rx rate despite -44 dBm signal. 1x1 MIMO, AP/Client Signal Balance: Poor. Likely AP TX power asymmetry with weak client radio. +2. **Cleanup (D):** Handful of custom firewall rules, need sanity check. Internal `.homelab.local` domain may not be functional — `.local` conflicts with mDNS (RFC 6762). +3. **Security (A):** Many services exposed via `*.manticorum.com` through NPM. Need WAN exposure audit. +4. 
**Reliability (B):** Validate Pi-hole HA failover, identify single points of failure. +5. **Expansion (E):** Add guest WiFi, expand Tailscale to full mesh, build smart home foundation. + +### Additional Requirements +- **Guest WiFi:** New VLAN, isolated, internet-only +- **Tailscale:** Currently on phones with exit nodes on both networks. Goal: universal reachability — all devices can reach each other whether on home/lab network, cellular, or cloud +- **Smart Home:** Home Assistant antenna installed, not migrated. Previous Matter/HomeKit attempts failed. Want solid network foundation (IoT VLAN, mDNS) before going deeper +- **IoT VLAN:** Default-deny internet access. Per-device exceptions if needed. + +## Design + +### Agent Assignments + +| Layer | Lead Agent(s) | Support | +|-------|---------------|---------| +| 1. WiFi & Physical | `network-engineer` | | +| 2. Network Architecture | `network-engineer` | `it-ops-orchestrator` | +| 3. DNS | `network-engineer` | | +| 4. Firewall & Security | `security-engineer`, `security-auditor` | | +| 5. Overlay & Remote Access | `network-engineer` | | +| 6. Smart Home Foundation | `iot-engineer` | `network-engineer` | +| Final Pass | `security-auditor` | `pentester` | + +### Per-Layer Workflow +Each layer follows the same three-phase cycle: +1. **Discover** — export configs, scan current state, document baseline (parallel sub-agents) +2. **Analyze** — review findings, identify issues, produce recommendations (parallel sub-agents) +3. **Remediate** — implement changes, validate, document new state (sequential) + +--- + +### Layer 1: WiFi & Physical + +**Goal:** Optimize wireless performance, diagnose Roku issue, establish baseline RF environment. 
+ +**Discovery (parallel):** +- Export AP configs from UniFi (channels, power levels, band steering, DTIM, minimum RSSI) +- Pull client device list with signal/rate/retry stats +- Document AP placement (floor, room, mounting) +- Check for channel conflicts — 3 APs on 5GHz 80MHz channels could overlap + +**Analysis (parallel):** +- Evaluate channel plan — non-overlapping channels? DFS channels available? +- Review AP power levels — high TX power on AC Lites causes asymmetry with weak client radios +- Assess band steering config — is 2.4GHz available as fallback? +- Roku-specific: determine if lowering AP-Upper Floor TX power or moving Roku to 2.4GHz improves Rx rate + +**Remediation (sequential):** +- Apply optimized channel plan +- Adjust TX power levels per AP +- Configure minimum RSSI thresholds if not set +- Validate Roku improvement +- Document new baseline + +**Key insight:** The Roku's 1x1 radio with 6 Mbps Rx rate at -44 dBm signal strongly suggests AP TX power is too high relative to what the Roku can transmit back. Lowering AP power or moving to 2.4GHz are the likely fixes. + +--- + +### Layer 2: Network Architecture + +**Goal:** Expand from 2 VLANs to 4, supporting guest WiFi and IoT isolation. 
+ +**Target VLAN layout:** + +| VLAN | Name | Subnet | Purpose | +|------|------|--------|---------| +| Existing | Home | `10.0.0.0/23` | Trusted personal devices | +| Existing | Lab | `10.10.0.0/24` | Homelab servers, Proxmox, infrastructure | +| New | Guest | TBD (e.g., `10.20.0.0/24`) | Guest WiFi — internet only, no local access | +| New | IoT | TBD (e.g., `10.30.0.0/24`) | Smart devices — no internet by default | + +**Discovery (parallel):** +- Export current VLAN config (VLAN IDs, DHCP scopes, assignments) +- Inventory all devices and current network placement +- Document inter-VLAN routing rules +- Check switch port VLAN assignments (tagged/untagged) + +**Analysis (parallel):** +- Determine which devices move to IoT VLAN (Roku, smart bulbs, switches, HA hub) +- Design DHCP scopes for new VLANs +- Plan inter-VLAN access: IoT reaches HA only, HA reaches into IoT, no IoT internet +- WiFi SSIDs: one per VLAN or shared SSID with VLAN assignment? + +**Remediation (sequential):** +- Create Guest and IoT VLANs in UniFi +- Configure DHCP for new VLANs +- Create WiFi networks (Guest SSID, IoT SSID) +- Migrate devices to appropriate VLANs +- Validate connectivity per VLAN +- Document new topology + +--- + +### Layer 3: DNS + +**Goal:** Validate Pi-hole HA, plan mDNS for smart home, ensure DNS works across all four VLANs. + +**Discovery (parallel):** +- Validate Orbital Sync (matching blocklists, custom entries on both Pi-holes) +- Check NPM DNS sync cron — is `custom.list` consistent? +- Document current DNS records in `homelab.local` zone +- Check DHCP DNS server advertisements on both existing VLANs + +**Analysis (parallel):** +- Verify failover: what happens when primary (`10.10.0.16`) goes down? +- DNS per VLAN: Guest gets Pi-hole (ad blocking) but NOT internal name resolution. IoT resolves HA only. +- mDNS for smart home — Matter/HomeKit use mDNS for discovery, doesn't cross VLANs. 
Options: + - UniFi mDNS reflector (built-in, simple, reflects everything) + - Avahi reflector on a host (more granular) + - Explicit HA configuration for IoT VLAN discovery +- Check if iOS DNS bypass issue (from KB) is still relevant + +**Remediation (sequential):** +- Configure DNS for Guest and IoT VLANs +- Set up mDNS reflection (method TBD) +- Fix any Orbital Sync or failover gaps +- Validate DNS resolution from each VLAN +- Document DNS architecture + +--- + +### Layer 4: Firewall & Security + +**Goal:** Clean up rules, audit WAN exposure, validate internal domain, harden perimeter. + +**Discovery (parallel):** +- Export all UniFi firewall rules (WAN/LAN/Guest, in/out/local) +- Inventory all NPM proxy hosts — which services exposed on `*.manticorum.com` +- Test internal domain resolution: does `.homelab.local` work from each network? +- Check NPM SSL cert status and auto-renewal +- Document port forwards on UDM Pro +- Check UDM Pro WAN-facing services (remote management, STUN, UPnP) + +**Analysis (parallel):** +- **Firewall rule audit:** Redundant, conflicting, or overly broad rules? Missing rules (e.g., IoT→Lab block)? +- **NPM exposure review:** Per proxy host — does it need to be internet-facing? Auth configured? Security headers (HSTS, X-Frame-Options, CSP)? +- **Internal domain strategy:** `.local` conflicts with mDNS. Options: + - Keep `.homelab.local` with Pi-hole handling (risk of mDNS collision) + - Switch to `lab.manticorum.com` with split DNS (recommended — you own the domain, no mDNS conflict, clean) + - Use `.home.arpa` (RFC 8375, purpose-built for home networks) +- **Inter-VLAN rules:** Guest = internet-only. IoT = no internet, HA access only. Lab = reachable from Home, not from Guest/IoT. 
+- **WAN hardening:** UPnP status, unnecessary exposure + +**Remediation (sequential):** +- Remove/consolidate stale firewall rules +- Harden NPM proxy hosts (auth, headers, prune unnecessary exposure) +- Implement chosen internal domain strategy (recommendation: `lab.manticorum.com` split DNS) +- Create inter-VLAN firewall rules for Guest and IoT +- Disable UPnP if enabled, close unnecessary WAN exposure +- External port scan validation +- Document final ruleset and NPM inventory + +--- + +### Layer 5: Overlay & Remote Access + +**Goal:** Tailscale full mesh — universal reachability across home, cellular, and cloud. + +**Discovery (parallel):** +- Document current Tailscale setup (devices, exit nodes, ACL policy) +- Check for subnet router usage vs exit-node-only +- Identify all devices for the mesh (workstation, phones, laptops, servers, cloud VMs) +- Check if OpenVPN is active or legacy + +**Analysis (parallel):** +- **Architecture options:** + - Subnet routers: Tailscale on 1-2 hosts advertising home + lab subnets. Simpler, fewer installs. + - Full mesh: Tailscale on every server. Direct reachability, no SPOF, more to manage. + - Hybrid (recommended): Tailscale on key servers + subnet router for the rest. +- **DNS integration:** Tailscale MagicDNS vs Pi-hole coexistence +- **ACL policy:** Which devices reach which? Phones get everything? Cloud VMs lab-only? +- **Exit node strategy:** Keep current phone exit nodes? Add workstation? +- **OpenVPN decommission:** If Tailscale covers all use cases, remove it + +**Remediation (sequential):** +- Install/configure Tailscale on chosen devices +- Set up subnet routes or direct mesh +- Configure Tailscale ACLs +- Integrate DNS (MagicDNS + Pi-hole) +- Test: home→cloud, cellular→lab, cloud→home +- Decommission OpenVPN if replaced +- Document mesh topology and ACLs + +--- + +### Layer 6: Smart Home Foundation + +**Goal:** IoT VLAN ready (from Layer 2), Home Assistant deployed, Matter/Thread infrastructure in place. 
+ +**Discovery (parallel):** +- Inventory smart devices — protocols (WiFi, Zigbee, Z-Wave, Matter, Thread) +- Document HA hardware (antenna type — Zigbee coordinator? Thread border router? SkyConnect?) +- Document previous HomeKit/Matter attempts — what failed and why +- Identify devices for HA migration + +**Analysis (parallel):** +- **Protocol strategy:** + - Which devices support Matter (firmware update path)? + - WiFi-only devices → IoT VLAN, managed through HA + - Zigbee/Thread devices → HA radio, no VLAN needed +- **HA network placement:** Must reach IoT VLAN, be reachable from Home VLAN (UI), handle mDNS. Options: dedicated VM, container on manticore, dedicated hardware. +- **Matter/Thread specifics:** + - Thread border routers: same segment as HA coordinator + - Matter commissioning uses BLE + WiFi — which VLAN? + - Apple Home: HA HomeKit bridge vs replace HomeKit entirely +- **Migration path:** Phased, validate each batch + +**Remediation (sequential):** +- Deploy Home Assistant (if not already running) +- Configure HA network access (IoT VLAN reach, Home VLAN UI) +- Set up Zigbee/Thread coordinator +- Migrate devices in phases +- Test Matter commissioning end-to-end +- Document device inventory, protocols, HA architecture + +--- + +### Final Pass: Cross-Cutting Security Audit + +**Goal:** Holistic review after all layers complete — catch anything missed or introduced. + +**Agent:** `security-auditor` lead, `pentester` assist. 
+ +**Tasks:** +- Port scan from WAN — verify only intended services reachable +- Inter-VLAN isolation verification — Guest can't reach Lab/Home/IoT, IoT can't reach internet or Lab +- NPM proxy hosts: SSL + headers validated +- No default credentials on network gear or exposed services +- Tailscale ACLs match actual reachability +- Produce final network topology document + +--- + +## Dependencies + +``` +Layer 1 (WiFi) ─────────────────────────────────────────────┐ + │ │ +Layer 2 (VLANs) ────────────────────────────────────────────┤ + │ │ +Layer 3 (DNS) ──────────────────────────────────────────────┤ + │ │ +Layer 4 (Firewall) ─────────────────────────────────────────┤ + │ │ +Layer 5 (Tailscale) ────────────────────────────────────────┤ + │ │ +Layer 6 (Smart Home) ───────────────────────────────────────┤ + │ + Final Pass +``` + +Layers are sequential — each builds on the one below. Within each layer, discovery and analysis phases run parallel sub-agents. Remediation is sequential within a layer. + +## Deliverables + +Per layer: +- Baseline snapshot (current state before changes) +- Changes made (with rationale) +- Validation results +- Updated documentation + +Final: +- Complete network topology document +- Firewall rule inventory +- NPM proxy host inventory with security status +- Tailscale mesh diagram and ACL policy +- Smart home device inventory and protocol map +- Security audit report