From bd49e9d61d339253d3934f1ee75f9a86ad5f65b0 Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Sun, 10 Aug 2025 16:21:28 -0500
Subject: [PATCH] CLAUDE: Add comprehensive home automation planning documents
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Add Home Assistant deployment guide with container architecture
- Document platform analysis comparing Home Assistant, Node-RED, and hybrid approaches
- Add voice automation architecture with local/cloud hybrid approach
- Include implementation details for Whisper + Claude Code + Home Assistant integration
- Provide step-by-step deployment guides and configuration templates
- Document privacy-focused voice processing with local wake word detection

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude
---
 .../plans/home-assistant-deployment-guide.md  |  185 +++
 .../home-automation-platform-analysis.md      |   94 ++
 .../plans/voice-automation-architecture.md    |  200 +++
 ...voice-automation-implementation-details.md | 1182 +++++++++++++++++
 4 files changed, 1661 insertions(+)
 create mode 100644 .claude/plans/home-assistant-deployment-guide.md
 create mode 100644 .claude/plans/home-automation-platform-analysis.md
 create mode 100644 .claude/plans/voice-automation-architecture.md
 create mode 100644 .claude/plans/voice-automation-implementation-details.md

diff --git a/.claude/plans/home-assistant-deployment-guide.md b/.claude/plans/home-assistant-deployment-guide.md
new file mode 100644
index 0000000..9e710cb
--- /dev/null
+++ b/.claude/plans/home-assistant-deployment-guide.md
@@ -0,0 +1,185 @@
+# Home Assistant Deployment Architecture
+
+## Recommended Deployment: Podman Container on Proxmox
+
+### Why Podman Container vs Home Assistant OS?
+- **Flexibility:** Full control over the host system, easier customization
+- **Integration:** Better integration with existing Proxmox infrastructure
+- **Backup:** Standard container backup/restore workflows
+- **Resources:** More efficient resource usage
+- **Updates:** Granular update control
+- **GPU Support:** Already proven working with your Tdarr setup
+
+### Container Architecture
+
+```bash
+# Core Home Assistant container (--device is only needed if you add a Zigbee/Z-Wave
+# USB stick; with --network=host the UI is served on port 8123 without a -p mapping)
+podman run -d --name homeassistant \
+  --restart=unless-stopped \
+  -v /path/to/config:/config \
+  -v /etc/localtime:/etc/localtime:ro \
+  --device /dev/ttyUSB0:/dev/ttyUSB0 \
+  --network=host \
+  ghcr.io/home-assistant/home-assistant:stable
+```
+
+### Supporting Services Architecture
+
+```yaml
+# docker-compose.yml for full stack
+version: '3.8'
+services:
+  homeassistant:
+    container_name: homeassistant
+    image: ghcr.io/home-assistant/home-assistant:stable
+    volumes:
+      - ./config:/config
+      - /etc/localtime:/etc/localtime:ro
+    restart: unless-stopped
+    privileged: true
+    network_mode: host
+
+  # Optional: Local database instead of SQLite
+  postgres:
+    container_name: ha-postgres
+    image: postgres:15
+    environment:
+      POSTGRES_DB: homeassistant
+      POSTGRES_USER: homeassistant
+      POSTGRES_PASSWORD: your-secure-password
+    volumes:
+      - ./postgres-data:/var/lib/postgresql/data
+    restart: unless-stopped
+    ports:
+      - "5432:5432"
+
+  # Optional: MQTT broker for device communication
+  mosquitto:
+    container_name: ha-mosquitto
+    image: eclipse-mosquitto:latest
+    restart: unless-stopped
+    ports:
+      - "1883:1883"
+      - "9001:9001"
+    volumes:
+      - ./mosquitto/config:/mosquitto/config
+      - ./mosquitto/data:/mosquitto/data
+      - ./mosquitto/log:/mosquitto/log
+
+  # Optional: Node-RED for visual automation development
+  node-red:
+    
container_name: ha-node-red + image: nodered/node-red:latest + restart: unless-stopped + ports: + - "1880:1880" + volumes: + - ./node-red-data:/data +``` + +## Migration Strategy + +### Phase 1: Parallel Setup (Recommended) +1. **Deploy HA alongside Apple Home** (don't disturb current setup) +2. **Start with 1-2 test devices** (re-pair a couple of sensors) +3. **Build basic automations** to validate functionality +4. **Test for 1-2 weeks** to ensure stability + +### Phase 2: Gradual Migration +1. **Re-pair devices in groups** (sensors first, then bulbs, etc.) +2. **Migrate automations one by one** +3. **Keep Apple Home as backup** until confident + +### Phase 3: Full Cutover +1. **Remove devices from Apple Home** +2. **Decommission Apple TV hub role** (keep as media device) +3. **Full automation implementation** + +## Device Integration Strategy + +### Matter Devices (Your Current Setup) +- **Direct Integration:** HA's Matter support is mature as of 2024 +- **No Hub Required:** HA can be the Matter controller +- **Thread Network:** Can share Thread network between platforms during migration + +### Philips Hue Bridge +- **Option 1:** Keep bridge, integrate via Hue integration (easier) +- **Option 2:** Direct Zigbee control (requires Zigbee coordinator) +- **Recommendation:** Keep bridge initially, migrate later if needed + +### Future Expansion Options +- **Zigbee:** Add Zigbee coordinator for non-Matter devices +- **Z-Wave:** Add Z-Wave stick for legacy devices +- **WiFi Devices:** Direct integration via HA's vast library +- **Custom Integrations:** HACS (Home Assistant Community Store) + +## Automation Examples You Can Build + +### Complex Scheduling +```yaml +# Advanced morning routine with conditions +automation: + - alias: "Smart Morning Routine" + trigger: + - platform: time + at: "06:30:00" + condition: + - condition: state + entity_id: binary_sensor.workday + state: 'on' + - condition: state + entity_id: person.your_name + state: 'home' + action: + - service: light.turn_on + target: + entity_id: light.bedroom_lights + data: + brightness_pct: 30 + color_temp: 400 + - delay: "00:15:00" + - service: light.turn_on + data: + brightness_pct: 80 +``` + +### Presence Detection +```yaml +# Multi-factor presence with phone + door sensors +automation: + - alias: "Arrival Detection" + trigger: + - platform: state + entity_id: person.your_name + to: 'home' + - platform: state + entity_id: binary_sensor.front_door + to: 'on' + condition: + - condition: state + entity_id: person.your_name + state: 'home' + action: + - service: scene.turn_on + target: + entity_id: scene.arrival_lights + - service: climate.set_temperature + target: + entity_id: climate.main_thermostat + data: + temperature: 72 +``` + +## Resource Requirements + +### Minimum Specs +- **CPU:** 2 cores +- **RAM:** 2GB +- **Storage:** 20GB +- **Network:** Gigabit recommended for media streaming integration + +### Your Proxmox Environment +- Should handle HA easily alongside existing containers +- Consider dedicating specific resources if running many integrations +- Network mode: host recommended for device discovery \ No newline at end of file diff --git a/.claude/plans/home-automation-platform-analysis.md b/.claude/plans/home-automation-platform-analysis.md new file mode 100644 index 0000000..9631944 --- /dev/null +++ b/.claude/plans/home-automation-platform-analysis.md @@ -0,0 +1,94 @@ +# Home Automation Platform Analysis + +## Current Setup Analysis +- **Strengths:** Matter compliance, local Apple TV hub, established device 
ecosystem (10+ sensors, bulbs, Hue bridge) +- **Limitations:** Automation complexity ceiling, no external integrations, limited scheduling options +- **Infrastructure:** Proxmox host, Docker/Podman capability, network-local preference + +## Platform Comparison + +### 1. Home Assistant (Recommended) +**Pros:** +- Exceptional automation engine with complex logic, scheduling, and templating +- Best-in-class local operation (no cloud required) +- Massive device integration library (2000+ integrations) +- Matter support has matured significantly since 2024 +- Strong community and documentation +- Supports both container and OS deployment + +**Cons:** +- Initial setup learning curve (but much improved) +- Device re-pairing required +- Ongoing configuration maintenance + +**Best For:** Users wanting maximum automation flexibility and local control + +### 2. Node-RED + Apple Home (Hybrid) +**Pros:** +- Keep existing device pairings intact +- Visual flow-based automation programming +- Can integrate with Apple Home via HomeKit Controller +- Good for complex orchestrations +- Local operation possible + +**Cons:** +- Limited by Apple Home's device exposure limitations +- Requires running alongside Apple Home ecosystem +- Less device integration options than HA +- May hit Apple Home API rate limits on complex automations + +**Best For:** Users wanting advanced automations without disrupting existing setup + +### 3. Home Assistant + Apple Home Bridge (Best of Both) +**Pros:** +- Keep Apple Home interface for family while gaining HA automation power +- All devices managed in HA, exposed to Apple Home selectively +- Maximum automation capabilities + familiar interface +- Local operation maintained + +**Cons:** +- Most complex initial setup +- Device re-pairing required +- Two systems to maintain + +## Architecture Recommendations + +### Option A: Full Home Assistant Migration +1. Deploy HA in Proxmox container/VM +2. Re-pair all devices to HA directly +3. Build automations in HA +4. Use HA mobile app + dashboards + +### Option B: HA + HomeKit Bridge Hybrid +1. Deploy HA in Proxmox container/VM +2. Re-pair devices to HA +3. Use HomeKit Controller integration to expose selected devices/automations back to Apple Home +4. Family continues using Apple Home interface, you get HA automation power + +### Option C: Node-RED Overlay (Conservative) +1. Deploy Node-RED in container +2. Keep all devices paired to Apple Home +3. Use Node-RED HomeKit Controller to read Apple Home state +4. Build complex automations in Node-RED +5. Control devices through Apple Home APIs + +## Technical Considerations + +### Deployment Method: Podman Container (Recommended) +- More flexible than Home Assistant OS +- Easier backup/restore +- Better resource control in Proxmox +- GPU passthrough compatibility you already have + +### Matter Device Strategy +- Modern Home Assistant has excellent Matter support +- Re-pairing is straightforward with Matter devices +- Thread network can be shared between platforms + +## Next Steps Recommendation +Start with **Option A (Full HA Migration)** because: +1. Your technical comfort level makes setup manageable +2. Device re-pairing is acceptable to you +3. Local-only operation is priority +4. You want maximum automation flexibility +5. 
Container deployment aligns with your infrastructure \ No newline at end of file diff --git a/.claude/plans/voice-automation-architecture.md b/.claude/plans/voice-automation-architecture.md new file mode 100644 index 0000000..a11b119 --- /dev/null +++ b/.claude/plans/voice-automation-architecture.md @@ -0,0 +1,200 @@ +# Voice-Controlled Automation Architecture + +## Vision: Speech → Claude Code → Home Assistant Pipeline + +### High-Level Flow +``` +[Microphone] → [STT Engine] → [Command Parser] → [Claude Code API] → [Home Assistant] → [Actions] +``` + +## Component Architecture + +### 1. Speech-to-Text (STT) Engine - Local Options +**Whisper (OpenAI) - Recommended** +- Excellent accuracy, runs locally +- Multiple model sizes (tiny to large) +- GPU acceleration available +- Container deployment: `openai/whisper` + +**Alternative: Vosk** +- Lighter weight, faster response +- Good for command recognition +- Multiple language models available + +### 2. Voice Activity Detection (VAD) +**Wake Word Detection** +- Porcupine (Picovoice) - local wake word detection +- Custom wake phrases: "Hey Claude", "Computer", etc. +- Always-listening with privacy protection + +**Push-to-Talk Alternative** +- Hardware button integration +- Mobile app trigger +- Keyboard shortcut + +### 3. Command Processing Pipeline +**Natural Language Parser** +- Claude Code interprets spoken commands +- Converts to Home Assistant service calls +- Handles context and ambiguity + +**Command Categories:** +- Direct device control: "Turn off living room lights" +- Scene activation: "Set movie mode" +- Status queries: "What's the temperature upstairs?" +- Complex automations: "Start my morning routine" + +### 4. Claude Code Integration +**API Bridge Service** +- Local service accepting STT output +- Formats requests to Claude Code API +- Maintains conversation context +- Returns structured HA commands + +**Command Translation Examples:** +``` +Speech: "Turn down the bedroom lights" +Claude: Interprets as light.turn_on service call +HA Command: {"service": "light.turn_on", "target": "light.bedroom", "brightness_pct": 30} +``` + +### 5. 
Home Assistant Integration +**RESTful API Integration** +- Direct API calls to HA instance +- WebSocket connection for real-time updates +- Authentication via long-lived access tokens + +**Voice Response Integration** +- HA TTS service for confirmations +- Status announcements +- Error handling feedback + +## Deployment Architecture + +### Container Stack Addition +```yaml +# Add to existing HA docker-compose.yml + + # STT Service + whisper-api: + container_name: ha-whisper + image: onerahmet/openai-whisper-asr-webservice:latest + ports: + - "9000:9000" + environment: + - ASR_MODEL=base # or small, medium, large + volumes: + - ./whisper-models:/root/.cache/whisper + deploy: + resources: + reservations: + devices: + - driver: nvidia # Optional GPU acceleration + count: 1 + capabilities: [gpu] + + # Voice Processing Bridge + voice-bridge: + container_name: ha-voice-bridge + build: ./voice-bridge # Custom service + ports: + - "8080:8080" + environment: + - CLAUDE_API_KEY=${CLAUDE_API_KEY} + - HA_URL=http://homeassistant:8123 + - HA_TOKEN=${HA_TOKEN} + - WHISPER_URL=http://whisper-api:9000 + volumes: + - ./voice-bridge-config:/config + depends_on: + - homeassistant + - whisper-api +``` + +### Hardware Requirements +**Microphone Setup:** +- USB microphone or audio interface +- Raspberry Pi with mic for remote rooms +- Existing smart speakers (if hackable) + +**Processing Power:** +- Whisper base model: ~1GB RAM, CPU sufficient +- Whisper large model: ~2GB RAM, GPU recommended +- Your Proxmox setup can easily handle this + +## Privacy & Security Considerations + +### Local-First Design +- All STT processing on local hardware +- No cloud APIs for voice recognition +- Claude Code API calls only for command interpretation +- HA commands never leave local network + +### Security Architecture +``` +Internet ← [Firewall] ← [Claude API calls only] ← [Voice Bridge] ← [Local STT] ← [Microphone] + ↓ + [Home Assistant] ← [Local Network Only] +``` + +### Data Flow +1. **Audio capture** - stays local +2. **STT processing** - stays local +3. **Text command** - sent to Claude Code API (text only) +4. **HA commands** - executed locally +5. **No audio data** ever leaves your network + +## Implementation Phases + +### Phase 1: Core STT Integration +- Deploy Whisper container +- Basic speech-to-text testing +- Integration with HA via simple commands + +### Phase 2: Claude Code Bridge +- Build voice-bridge service +- Integrate Claude Code API for command interpretation +- Basic natural language processing + +### Phase 3: Advanced Features +- Wake word detection +- Multi-room microphone setup +- Context-aware conversations +- Voice response integration + +### Phase 4: Optimization +- GPU acceleration for STT +- Custom wake words +- Conversation memory +- Advanced natural language understanding + +## Example Use Cases + +### Simple Commands +- "Turn off all lights" +- "Set temperature to 72 degrees" +- "Activate movie scene" + +### Complex Requests +- "Turn on the lights in rooms where people are detected" +- "Start my bedtime routine in 10 minutes" +- "If it's going to rain tomorrow, close the garage door" + +### Status Queries +- "What's the status of the security system?" +- "Are all the doors locked?" 
+- "Show me energy usage this month" + +## Integration with Existing Plans + +This voice system would layer on top of your planned HA deployment: +- **No changes** to core HA architecture +- **Additional containers** for voice processing +- **API integration** rather than HA core modifications +- **Gradual rollout** after HA migration is stable + +The voice system becomes another automation trigger alongside: +- Time-based automations +- Sensor-based automations +- Manual app/dashboard controls +- **Voice commands via Claude Code** \ No newline at end of file diff --git a/.claude/plans/voice-automation-implementation-details.md b/.claude/plans/voice-automation-implementation-details.md new file mode 100644 index 0000000..a80a255 --- /dev/null +++ b/.claude/plans/voice-automation-implementation-details.md @@ -0,0 +1,1182 @@ +# Voice Automation Implementation Details & Insights + +## Comprehensive Technical Analysis + +This document captures detailed implementation insights, technical considerations, and lessons learned from analyzing voice-controlled home automation architecture for integration with Home Assistant and Claude Code. + +## Core Architecture Deep Dive + +### Speech-to-Text Engine Comparison + +#### OpenAI Whisper (Primary Recommendation) +**Technical Specifications:** +- **Models Available:** tiny (39MB), base (74MB), small (244MB), medium (769MB), large (1550MB) +- **Languages:** 99+ languages with varying accuracy levels +- **Accuracy:** State-of-the-art, especially for English +- **Latency:** + - Tiny: ~100ms on CPU, ~50ms on GPU + - Base: ~200ms on CPU, ~100ms on GPU + - Large: ~1s on CPU, ~300ms on GPU +- **Resource Usage:** + - CPU: 1-4 cores depending on model size + - RAM: 1-4GB depending on model size + - GPU: Optional but significant speedup (2-10x faster) + +**Container Options:** +```bash +# Official Whisper in container +docker run -p 9000:9000 onerahmet/openai-whisper-asr-webservice:latest + +# Custom optimized version with GPU +docker run --gpus all -p 9000:9000 whisper-gpu:latest +``` + +**API Interface:** +```python +# RESTful API example +import requests +import json + +response = requests.post('http://localhost:9000/asr', + files={'audio_file': open('command.wav', 'rb')}, + data={'task': 'transcribe', 'language': 'english'} +) +text = response.json()['text'] +``` + +#### Vosk Alternative Analysis +**Pros:** +- Smaller memory footprint (100-200MB models) +- Faster real-time processing +- Better for streaming audio +- Multiple model sizes per language + +**Cons:** +- Lower accuracy than Whisper for natural speech +- Fewer supported languages +- Less robust with accents/noise + +**Use Case:** Better for command-word recognition, worse for natural language + +#### wav2vec2 Consideration +**Facebook's Model:** +- Excellent accuracy competitive with Whisper +- More complex setup and deployment +- Less containerized ecosystem +- **Recommendation:** Skip unless specific requirements + +### Voice Activity Detection (VAD) Deep Dive + +#### Wake Word Detection Systems + +**Porcupine by Picovoice (Recommended)** +```python +# Porcupine integration example +import pvporcupine + +porcupine = pvporcupine.create( + access_key='your-access-key', + keywords=['hey-claude', 'computer', 'assistant'] +) + +while True: + pcm = get_next_audio_frame() # 16kHz, 16-bit, mono + keyword_index = porcupine.process(pcm) + if keyword_index >= 0: + print(f"Wake word detected: {keywords[keyword_index]}") + # Trigger STT pipeline +``` + +**Technical Requirements:** +- Continuous audio 
monitoring +- Low CPU usage (< 1% on modern CPUs) +- Custom wake word training available +- Privacy: all processing local + +**Alternative: Snowboy (Open Source)** +- No longer actively maintained +- Still functional for basic wake words +- Completely free and local +- Lower accuracy than Porcupine + +#### Push-to-Talk Implementations + +**Hardware Button Integration:** +```python +# GPIO button on Raspberry Pi +import RPi.GPIO as GPIO + +BUTTON_PIN = 18 +GPIO.setup(BUTTON_PIN, GPIO.IN, pull_up_down=GPIO.PUD_UP) + +def button_callback(channel): + if GPIO.input(channel) == GPIO.LOW: + start_recording() + else: + stop_recording_and_process() + +GPIO.add_event_detect(BUTTON_PIN, GPIO.BOTH, callback=button_callback) +``` + +**Mobile App Integration:** +- Home Assistant mobile app can trigger automations +- Custom webhook endpoints +- WebSocket connections for real-time triggers + +### Claude Code Integration Architecture + +#### API Bridge Service Implementation + +**Service Architecture:** +```python +# voice-bridge service structure +from fastapi import FastAPI +from anthropic import Anthropic +import homeassistant_api +import whisper_client +import asyncio + +app = FastAPI() + +class VoiceBridge: + def __init__(self): + self.claude = Anthropic(api_key=os.getenv('CLAUDE_API_KEY')) + self.ha = homeassistant_api.Client( + url=os.getenv('HA_URL'), + token=os.getenv('HA_TOKEN') + ) + self.whisper = whisper_client.Client(os.getenv('WHISPER_URL')) + + async def process_audio(self, audio_data): + # Step 1: Convert audio to text + transcript = await self.whisper.transcribe(audio_data) + + # Step 2: Send to Claude for interpretation + ha_context = await self.get_ha_context() + claude_response = await self.claude.messages.create( + model="claude-3-5-sonnet-20241022", + messages=[{ + "role": "user", + "content": f""" + Convert this voice command to Home Assistant API calls: + Command: "{transcript}" + + Available entities: {ha_context} + + Return JSON format for HA API calls. + """ + }] + ) + + # Step 3: Execute HA commands + commands = json.loads(claude_response.content) + results = [] + for cmd in commands: + result = await self.ha.call_service(**cmd) + results.append(result) + + return { + 'transcript': transcript, + 'commands': commands, + 'results': results + } + + async def get_ha_context(self): + # Get current state of all entities + states = await self.ha.get_states() + return { + 'lights': [e for e in states if e['entity_id'].startswith('light.')], + 'sensors': [e for e in states if e['entity_id'].startswith('sensor.')], + 'switches': [e for e in states if e['entity_id'].startswith('switch.')], + # ... 
other entity types + } +``` + +#### Command Translation Patterns + +**Direct Device Commands:** +```json +{ + "speech": "Turn on the living room lights", + "claude_interpretation": { + "intent": "light_control", + "target": "light.living_room", + "action": "turn_on" + }, + "ha_api_call": { + "service": "light.turn_on", + "target": {"entity_id": "light.living_room"} + } +} +``` + +**Scene Activation:** +```json +{ + "speech": "Set movie mode", + "claude_interpretation": { + "intent": "scene_activation", + "scene": "movie_mode" + }, + "ha_api_call": { + "service": "scene.turn_on", + "target": {"entity_id": "scene.movie_mode"} + } +} +``` + +**Complex Logic:** +```json +{ + "speech": "Turn on lights in occupied rooms", + "claude_interpretation": { + "intent": "conditional_light_control", + "condition": "occupancy_detected", + "action": "turn_on_lights" + }, + "ha_api_calls": [ + { + "service": "light.turn_on", + "target": {"entity_id": "light.bedroom"}, + "condition": "binary_sensor.bedroom_occupancy == 'on'" + }, + { + "service": "light.turn_on", + "target": {"entity_id": "light.living_room"}, + "condition": "binary_sensor.living_room_occupancy == 'on'" + } + ] +} +``` + +#### Context Management Strategy + +**Conversation Memory:** +```python +class ConversationContext: + def __init__(self): + self.history = [] + self.context_window = 10 # Keep last 10 interactions + + def add_interaction(self, speech, response, timestamp): + self.history.append({ + 'speech': speech, + 'response': response, + 'timestamp': timestamp, + 'ha_state_snapshot': self.capture_ha_state() + }) + + # Maintain sliding window + if len(self.history) > self.context_window: + self.history.pop(0) + + def get_context_for_claude(self): + return { + 'recent_commands': self.history[-3:], + 'current_time': datetime.now(), + 'house_state': self.get_current_house_state() + } +``` + +**Ambiguity Resolution:** +```python +# Handle ambiguous commands +def resolve_ambiguity(transcript, available_entities): + if "lights" in transcript.lower() and not specific_room_mentioned: + return { + 'type': 'clarification_needed', + 'message': 'Which lights? 
I can control: living room, bedroom, kitchen', + 'options': ['light.living_room', 'light.bedroom', 'light.kitchen'] + } +``` + +### Home Assistant Integration Patterns + +#### API Authentication & Security +```python +# Secure API setup +HA_CONFIG = { + 'url': 'http://homeassistant:8123', + 'token': os.getenv('HA_LONG_LIVED_TOKEN'), # Never hardcode + 'ssl_verify': True, # In production + 'timeout': 10 +} + +# Create long-lived access token in HA: +# Settings -> People -> [Your User] -> Long-lived access tokens +``` + +#### WebSocket Integration for Real-time Updates +```python +import websockets +import json + +async def ha_websocket_listener(): + uri = "ws://homeassistant:8123/api/websocket" + + async with websockets.connect(uri) as websocket: + # Authenticate + await websocket.send(json.dumps({ + 'type': 'auth', + 'access_token': HA_TOKEN + })) + + # Subscribe to state changes + await websocket.send(json.dumps({ + 'id': 1, + 'type': 'subscribe_events', + 'event_type': 'state_changed' + })) + + async for message in websocket: + data = json.loads(message) + if data.get('type') == 'event': + # Process state changes for voice responses + await process_state_change(data['event']) +``` + +#### Voice Response Integration +```python +# HA TTS integration for voice feedback +async def speak_response(message, entity_id='media_player.living_room'): + await ha_client.call_service( + 'tts', 'speak', + target={'entity_id': entity_id}, + service_data={ + 'message': message, + 'language': 'en', + 'options': {'voice': 'neural'} + } + ) + +# Usage examples: +await speak_response("Living room lights turned on") +await speak_response("I couldn't find that device. Please be more specific.") +await speak_response("Movie mode activated. Enjoy your film!") +``` + +### Hardware & Deployment Considerations + +#### Microphone Hardware Analysis + +**USB Microphones (Recommended for testing):** +- Blue Yeti: Excellent quality, multiple pickup patterns +- Audio-Technica ATR2100x-USB: Professional quality +- Samson Go Mic: Compact, budget-friendly + +**Professional Audio Interfaces:** +- Focusrite Scarlett Solo: Single input, professional quality +- Behringer U-Phoria UM2: Budget 2-input interface +- PreSonus AudioBox USB 96: Mid-range option + +**Raspberry Pi Integration:** +```bash +# ReSpeaker HAT for Raspberry Pi +# Provides 2-4 microphone array with hardware VAD +# I2S connection, low latency +# Built-in LED ring for visual feedback + +# GPIO microphone setup +sudo apt install python3-pyaudio +# Configure ALSA for USB microphones +``` + +**Network Microphone Distribution:** +```python +# Distributed microphone system +MICROPHONE_NODES = { + 'living_room': 'http://pi-living:8080', + 'bedroom': 'http://pi-bedroom:8080', + 'kitchen': 'http://pi-kitchen:8080' +} + +# Each Pi runs lightweight audio capture service +# Sends audio to central Whisper processing +``` + +#### GPU Acceleration Setup + +**NVIDIA GPU Configuration:** +```yaml +# Docker Compose GPU configuration +whisper-gpu: + image: whisper-gpu:latest + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: 1 + capabilities: [gpu] + environment: + - NVIDIA_VISIBLE_DEVICES=all +``` + +**Performance Benchmarks (estimated):** +- **CPU Only (8-core):** + - Whisper base: ~500ms latency + - Whisper large: ~2000ms latency +- **With GPU (GTX 1660+):** + - Whisper base: ~150ms latency + - Whisper large: ~400ms latency + +#### Container Orchestration Strategy + +**Complete Docker Compose Stack:** +```yaml +version: '3.8' + +services: + # Core 
Home Assistant + homeassistant: + container_name: homeassistant + image: ghcr.io/home-assistant/home-assistant:stable + volumes: + - ./ha-config:/config + - /etc/localtime:/etc/localtime:ro + restart: unless-stopped + network_mode: host + + # Speech-to-Text Engine + whisper: + container_name: whisper-stt + image: onerahmet/openai-whisper-asr-webservice:latest + ports: + - "9000:9000" + environment: + - ASR_MODEL=base + - ASR_ENGINE=openai_whisper + volumes: + - whisper-models:/root/.cache/whisper + restart: unless-stopped + deploy: + resources: + reservations: + devices: + - driver: nvidia + count: 1 + capabilities: [gpu] + + # Voice Processing Bridge + voice-bridge: + container_name: voice-bridge + build: + context: ./voice-bridge + dockerfile: Dockerfile + ports: + - "8080:8080" + environment: + - CLAUDE_API_KEY=${CLAUDE_API_KEY} + - HA_URL=http://homeassistant:8123 + - HA_TOKEN=${HA_LONG_LIVED_TOKEN} + - WHISPER_URL=http://whisper:9000 + - PORCUPINE_ACCESS_KEY=${PORCUPINE_ACCESS_KEY} + volumes: + - ./voice-bridge-config:/app/config + - /dev/snd:/dev/snd # Audio device access + depends_on: + - homeassistant + - whisper + restart: unless-stopped + privileged: true # For audio device access + + # Optional: MQTT for device communication + mosquitto: + container_name: mqtt-broker + image: eclipse-mosquitto:latest + ports: + - "1883:1883" + - "9001:9001" + volumes: + - ./mosquitto:/mosquitto + restart: unless-stopped + + # Optional: Node-RED for visual automation + node-red: + container_name: node-red + image: nodered/node-red:latest + ports: + - "1880:1880" + volumes: + - node-red-data:/data + restart: unless-stopped + +volumes: + whisper-models: + node-red-data: +``` + +### Privacy & Security Deep Analysis + +#### Data Flow Security Model + +**Audio Data Privacy:** +``` +[Microphone] → [Local VAD] → [Local STT] → [Text Only] → [Claude API] + ↓ ↓ ↓ ↓ + Never leaves Never leaves Never leaves Encrypted + local net local net local net HTTPS only +``` + +**Security Boundaries:** +1. **Audio Capture Layer:** Hardware → Local processing only +2. **Speech Recognition:** Local Whisper → No cloud STT +3. **Command Interpretation:** Text-only to Claude Code API +4. 
**Automation Execution:** Local Home Assistant only + +#### Network Security Configuration + +**Firewall Rules:** +```bash +# Only allow outbound HTTPS for Claude API +iptables -A OUTPUT -p tcp --dport 443 -d anthropic.com -j ACCEPT +iptables -A OUTPUT -p tcp --dport 443 -j DROP # Block other HTTPS + +# Block all other outbound traffic from voice containers +iptables -A OUTPUT -s voice-bridge-ip -j DROP +``` + +**API Key Security:** +```bash +# Environment variable best practices +echo "CLAUDE_API_KEY=your-key-here" >> .env +echo "HA_LONG_LIVED_TOKEN=your-token-here" >> .env +chmod 600 .env + +# Container secrets mounting +docker run --env-file .env voice-bridge:latest +``` + +#### Privacy Controls Implementation + +**Audio Retention Policy:** +```python +class AudioPrivacyManager: + def __init__(self): + self.max_retention_seconds = 5 # Keep audio only during processing + self.transcript_retention_days = 7 # Keep transcripts short-term + + async def process_audio(self, audio_data): + try: + transcript = await self.stt_engine.transcribe(audio_data) + # Process immediately + result = await self.process_command(transcript) + + # Store transcript with expiration + await self.store_transcript(transcript, expires_in=7*24*3600) + + return result + finally: + # Always delete audio data immediately + del audio_data + gc.collect() +``` + +**User Consent & Controls:** +```python +# Voice system controls in Home Assistant +VOICE_CONTROLS = { + 'input_boolean.voice_system_enabled': 'Global voice control toggle', + 'input_boolean.voice_learning_mode': 'Allow transcript storage for improvement', + 'input_select.voice_privacy_level': ['minimal', 'standard', 'enhanced'], + 'button.clear_voice_history': 'Clear all stored transcripts' +} +``` + +### Advanced Features & Future Expansion + +#### Multi-Room Microphone Network + +**Distributed Audio Architecture:** +```python +# Central coordinator service +class MultiRoomVoiceCoordinator: + def __init__(self): + self.microphone_nodes = { + 'living_room': MicrophoneNode('192.168.1.101'), + 'bedroom': MicrophoneNode('192.168.1.102'), + 'kitchen': MicrophoneNode('192.168.1.103') + } + + async def listen_all_rooms(self): + # Simultaneous listening across all nodes + tasks = [node.listen() for node in self.microphone_nodes.values()] + winner = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED) + + # Process audio from first responding room + audio_data, source_room = winner.result() + return await self.process_with_context(audio_data, source_room) + + async def process_with_context(self, audio_data, room): + # Add room context to Claude processing + transcript = await self.stt.transcribe(audio_data) + + claude_prompt = f""" + Voice command from {room}: "{transcript}" + + Room-specific devices available: + {self.get_room_devices(room)} + + Convert to Home Assistant API calls. 
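+        Respond only with a JSON array of Home Assistant service calls; if the command
+        does not clearly match a device in this room, ask a single clarifying question instead.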
+ """ +``` + +**Room-Aware Processing:** +```python +def get_room_devices(self, room): + """Return devices specific to the source room""" + room_entities = { + 'living_room': [ + 'light.living_room_ceiling', + 'media_player.living_room_tv', + 'climate.living_room_thermostat' + ], + 'bedroom': [ + 'light.bedroom_bedside', + 'switch.bedroom_fan', + 'binary_sensor.bedroom_window' + ] + } + return room_entities.get(room, []) +``` + +#### Context-Aware Conversations + +**Advanced Context Management:** +```python +class AdvancedContextManager: + def __init__(self): + self.conversation_sessions = {} + self.house_state_history = [] + + def create_claude_context(self, user_id, transcript): + session = self.get_or_create_session(user_id) + + context = { + 'transcript': transcript, + 'conversation_history': session.history[-5:], + 'current_time': datetime.now().isoformat(), + 'house_state': { + 'lights_on': self.get_lights_status(), + 'occupancy': self.get_occupancy_status(), + 'weather': self.get_weather(), + 'recent_events': self.get_recent_ha_events(minutes=15) + }, + 'user_preferences': self.get_user_preferences(user_id), + 'location_context': self.get_location_context() + } + + return self.format_claude_prompt(context) + + def format_claude_prompt(self, context): + return f""" + You are controlling a Home Assistant smart home system via voice commands. + + Current situation: + - Time: {context['current_time']} + - House state: {context['house_state']} + - Recent conversation: {context['conversation_history']} + + User said: "{context['transcript']}" + + Convert this to Home Assistant API calls. Consider: + 1. Current device states (don't turn on lights that are already on) + 2. Time of day (different responses for morning vs night) + 3. Recent conversation context + 4. User's typical preferences + + Respond with JSON array of Home Assistant service calls. 
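+        Use only entity_ids that appear in the house state above; do not invent entities,
+        and omit calls whose target is already in the requested state.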
+ """ +``` + +#### Voice Response & Feedback Systems + +**Advanced TTS Integration:** +```python +class VoiceResponseManager: + def __init__(self): + self.tts_engines = { + 'neural': 'tts.cloud_say', # High quality + 'local': 'tts.piper_say', # Local processing + 'espeak': 'tts.espeak_say' # Fallback + } + + async def respond_with_voice(self, message, room=None, urgency='normal'): + # Select appropriate TTS based on context + tts_engine = self.select_tts_engine(urgency) + + # Select speakers based on room or system state + speakers = self.select_speakers(room) + + # Format message for natural speech + speech_message = self.format_for_speech(message) + + # Send to appropriate speakers + for speaker in speakers: + await self.ha_client.call_service( + 'tts', tts_engine, + target={'entity_id': speaker}, + service_data={ + 'message': speech_message, + 'options': { + 'voice': 'neural2-en-us-standard-a', + 'speed': 1.0, + 'pitch': 0.0 + } + } + ) + + def format_for_speech(self, message): + """Convert technical responses to natural speech""" + replacements = { + 'light.living_room': 'living room lights', + 'switch.bedroom_fan': 'bedroom fan', + 'climate.main_thermostat': 'thermostat', + 'scene.movie_mode': 'movie mode' + } + + for tech_term, natural_term in replacements.items(): + message = message.replace(tech_term, natural_term) + + return message +``` + +**Visual Feedback Integration:** +```python +# LED ring feedback on microphone nodes +class MicrophoneVisualFeedback: + def __init__(self, led_pin_count=12): + self.leds = neopixel.NeoPixel(board.D18, led_pin_count) + + def show_listening(self): + # Blue pulsing pattern + self.animate_pulse(color=(0, 0, 255)) + + def show_processing(self): + # Spinning orange pattern + self.animate_spin(color=(255, 165, 0)) + + def show_success(self): + # Green flash + self.flash(color=(0, 255, 0), duration=1.0) + + def show_error(self): + # Red flash + self.flash(color=(255, 0, 0), duration=2.0) +``` + +### Performance Optimization Strategies + +#### Caching & Response Time Optimization + +**STT Model Caching:** +```python +class WhisperModelCache: + def __init__(self): + self.models = {} + self.model_locks = {} + + async def get_model(self, model_size='base'): + if model_size not in self.models: + if model_size not in self.model_locks: + self.model_locks[model_size] = asyncio.Lock() + + async with self.model_locks[model_size]: + if model_size not in self.models: + self.models[model_size] = whisper.load_model(model_size) + + return self.models[model_size] +``` + +**Command Pattern Caching:** +```python +class CommandCache: + def __init__(self, ttl_seconds=300): # 5 minute TTL + self.cache = {} + self.ttl = ttl_seconds + + def get_cached_response(self, transcript_hash): + if transcript_hash in self.cache: + entry = self.cache[transcript_hash] + if time.time() - entry['timestamp'] < self.ttl: + return entry['response'] + return None + + def cache_response(self, transcript_hash, response): + self.cache[transcript_hash] = { + 'response': response, + 'timestamp': time.time() + } +``` + +#### Resource Management + +**Memory Management:** +```python +class ResourceManager: + def __init__(self): + self.max_memory_usage = 4 * 1024**3 # 4GB limit + + async def process_with_memory_management(self, audio_data): + initial_memory = self.get_memory_usage() + + try: + if initial_memory > self.max_memory_usage * 0.8: + await self.cleanup_memory() + + result = await self.process_audio(audio_data) + return result + + finally: + # Force garbage collection after each request + 
gc.collect() + + async def cleanup_memory(self): + # Clear caches, unload unused models + self.command_cache.clear() + self.whisper_cache.clear_unused() + gc.collect() +``` + +### Error Handling & Reliability + +#### Comprehensive Error Recovery + +**STT Failure Handling:** +```python +class RobustSTTProcessor: + def __init__(self): + self.stt_engines = [ + WhisperSTT(model='base'), + VoskSTT(), + DeepSpeechSTT() # Fallback options + ] + + async def transcribe_with_fallbacks(self, audio_data): + for i, engine in enumerate(self.stt_engines): + try: + transcript = await engine.transcribe(audio_data) + if self.validate_transcript(transcript): + return transcript + except Exception as e: + logger.warning(f"STT engine {i} failed: {e}") + if i == len(self.stt_engines) - 1: + raise + continue + + def validate_transcript(self, transcript): + # Basic validation rules + if len(transcript.strip()) < 3: + return False + if transcript.count('?') > len(transcript) / 4: # Too much uncertainty + return False + return True +``` + +**Claude API Failure Handling:** +```python +class RobustClaudeClient: + def __init__(self): + self.client = anthropic.Anthropic() + self.fallback_patterns = self.load_fallback_patterns() + + async def process_command_with_fallback(self, transcript): + try: + # Attempt Claude processing + response = await self.client.messages.create( + model="claude-3-5-sonnet-20241022", + messages=[{"role": "user", "content": self.create_prompt(transcript)}], + timeout=10.0 + ) + return json.loads(response.content) + + except (anthropic.APIError, asyncio.TimeoutError) as e: + logger.warning(f"Claude API failed: {e}") + + # Attempt local pattern matching as fallback + return self.fallback_command_processing(transcript) + + def fallback_command_processing(self, transcript): + """Simple pattern matching for basic commands when Claude is unavailable""" + transcript = transcript.lower() + + # Basic light controls + if 'turn on' in transcript and 'light' in transcript: + room = self.extract_room(transcript) + return [{ + 'service': 'light.turn_on', + 'target': {'entity_id': f'light.{room or "all"}'} + }] + + # Basic switch controls + if 'turn off' in transcript and ('light' in transcript or 'switch' in transcript): + room = self.extract_room(transcript) + return [{ + 'service': 'light.turn_off', + 'target': {'entity_id': f'light.{room or "all"}'} + }] + + # Scene activation + scenes = ['movie', 'bedtime', 'morning', 'evening'] + for scene in scenes: + if scene in transcript: + return [{ + 'service': 'scene.turn_on', + 'target': {'entity_id': f'scene.{scene}'} + }] + + # If no patterns match, return error + return [{'error': 'Command not recognized in offline mode'}] +``` + +#### Health Monitoring & Diagnostics + +**System Health Monitoring:** +```python +class VoiceSystemHealthMonitor: + def __init__(self): + self.health_checks = { + 'whisper_api': self.check_whisper_health, + 'claude_api': self.check_claude_health, + 'homeassistant_api': self.check_ha_health, + 'microphone_nodes': self.check_microphone_health + } + + async def run_health_checks(self): + results = {} + + for service, check_func in self.health_checks.items(): + try: + results[service] = await check_func() + except Exception as e: + results[service] = { + 'status': 'unhealthy', + 'error': str(e), + 'timestamp': datetime.now().isoformat() + } + + return results + + async def check_whisper_health(self): + start_time = time.time() + response = await aiohttp.get('http://whisper:9000/health') + latency = time.time() - start_time + + return { 
+ 'status': 'healthy' if response.status == 200 else 'unhealthy', + 'latency_ms': int(latency * 1000), + 'timestamp': datetime.now().isoformat() + } + + # Similar checks for other services... +``` + +**Automated Recovery Actions:** +```python +class AutoRecoveryManager: + def __init__(self): + self.recovery_actions = { + 'whisper_unhealthy': self.restart_whisper_service, + 'high_memory_usage': self.cleanup_resources, + 'claude_rate_limited': self.enable_fallback_mode, + 'microphone_disconnected': self.reinitialize_audio + } + + async def handle_health_issue(self, issue_type, details): + if issue_type in self.recovery_actions: + logger.info(f"Attempting recovery for {issue_type}") + await self.recovery_actions[issue_type](details) + else: + logger.error(f"No recovery action for {issue_type}") + await self.alert_administrators(issue_type, details) +``` + +### Testing & Validation Strategies + +#### Audio Processing Testing + +**STT Accuracy Testing:** +```python +class STTAccuracyTester: + def __init__(self): + self.test_phrases = [ + "Turn on the living room lights", + "Set the temperature to 72 degrees", + "Activate movie mode", + "What's the weather like outside", + "Turn off all lights", + "Lock all doors" + ] + + async def run_accuracy_tests(self, stt_engine): + results = [] + + for phrase in self.test_phrases: + # Generate synthetic audio from phrase + audio_data = await self.text_to_speech(phrase) + + # Test STT accuracy + transcript = await stt_engine.transcribe(audio_data) + + accuracy = self.calculate_word_accuracy(phrase, transcript) + results.append({ + 'original': phrase, + 'transcript': transcript, + 'accuracy': accuracy + }) + + return results + + def calculate_word_accuracy(self, reference, hypothesis): + ref_words = reference.lower().split() + hyp_words = hypothesis.lower().split() + + # Simple word error rate calculation + correct = sum(1 for r, h in zip(ref_words, hyp_words) if r == h) + return correct / len(ref_words) if ref_words else 0 +``` + +#### End-to-End Integration Testing + +**Complete Pipeline Testing:** +```python +class E2ETestSuite: + def __init__(self): + self.test_scenarios = [ + { + 'name': 'basic_light_control', + 'audio_file': 'tests/audio/turn_on_lights.wav', + 'expected_ha_calls': [ + {'service': 'light.turn_on', 'target': {'entity_id': 'light.living_room'}} + ] + }, + { + 'name': 'complex_scene_activation', + 'audio_file': 'tests/audio/movie_mode.wav', + 'expected_ha_calls': [ + {'service': 'scene.turn_on', 'target': {'entity_id': 'scene.movie'}} + ] + } + ] + + async def run_full_pipeline_tests(self): + results = [] + + for scenario in self.test_scenarios: + result = await self.test_scenario(scenario) + results.append(result) + + return results + + async def test_scenario(self, scenario): + # Load test audio + with open(scenario['audio_file'], 'rb') as f: + audio_data = f.read() + + # Run through complete pipeline + try: + actual_calls = await self.voice_bridge.process_audio(audio_data) + + # Compare with expected results + match = self.compare_ha_calls( + scenario['expected_ha_calls'], + actual_calls['commands'] + ) + + return { + 'scenario': scenario['name'], + 'success': match, + 'expected': scenario['expected_ha_calls'], + 'actual': actual_calls['commands'] + } + + except Exception as e: + return { + 'scenario': scenario['name'], + 'success': False, + 'error': str(e) + } +``` + +## Implementation Timeline & Milestones + +### Phase 1: Foundation (Weeks 1-2) +**Goals:** +- Home Assistant stable deployment +- Basic container infrastructure +- 
Initial device integration + +**Success Criteria:** +- HA accessible and controlling existing devices +- Container stack running reliably +- Basic automations working + +### Phase 2: Core Voice System (Weeks 3-4) +**Goals:** +- Whisper STT deployment +- Basic voice-bridge service +- Simple command processing + +**Success Criteria:** +- Speech-to-text working with test audio files +- Claude Code API integration functional +- Basic "turn on lights" commands working + +### Phase 3: Production Features (Weeks 5-6) +**Goals:** +- Wake word detection +- Multi-room microphone support +- Advanced error handling + +**Success Criteria:** +- Hands-free operation with wake words +- Reliable operation across multiple rooms +- Graceful failure modes working + +### Phase 4: Optimization & Polish (Weeks 7-8) +**Goals:** +- Performance optimization +- Advanced context awareness +- Visual/audio feedback systems + +**Success Criteria:** +- Sub-500ms response times +- Context-aware conversations +- Family-friendly operation + +## Cost Analysis + +### Hardware Costs +- **Microphones:** $50-200 per room +- **Processing Hardware:** Covered by existing Proxmox setup +- **Additional Storage:** ~50GB for models and logs + +### Service Costs +- **Claude Code API:** ~$0.01-0.10 per command (depending on context size) +- **Porcupine Wake Words:** $0.50-2.00 per month per wake word +- **No cloud STT costs** (fully local) + +### Estimated Monthly Operating Costs +- **Light Usage (10 commands/day):** ~$3-10/month +- **Heavy Usage (50 commands/day):** ~$15-50/month +- **Wake word licensing:** ~$2-5/month + +## Conclusion & Next Steps + +This voice automation system represents a cutting-edge approach to local smart home control, combining the latest in speech recognition with advanced AI interpretation. The architecture prioritizes privacy, reliability, and extensibility while maintaining the local-only operation you desire. + +**Key Success Factors:** +1. **Proven Technology Stack:** Whisper + Claude Code + Home Assistant +2. **Privacy-First Design:** Audio never leaves local network +3. **Flexible Architecture:** Easy to extend and customize +4. **Reliable Fallbacks:** Multiple failure recovery mechanisms + +**Recommended Implementation Approach:** +1. Start with Home Assistant foundation +2. Add voice components incrementally +3. Test thoroughly at each phase +4. Optimize for your specific use patterns + +The combination of your technical expertise, existing infrastructure, and this comprehensive architecture plan sets you up for success in creating a truly advanced, private, and powerful voice-controlled smart home system. + +This system will provide the advanced automation capabilities that Apple Home lacks while maintaining the local control and privacy that drove your original Home Assistant interest. The addition of Claude Code as the natural language processing layer bridges the gap between human speech and technical automation in a way that would be extremely difficult to achieve with traditional rule-based systems. \ No newline at end of file