claude-home/.claude/plans/voice-automation-architecture.md
Cal Corum bd49e9d61d CLAUDE: Add comprehensive home automation planning documents
2025-08-10 16:21:28 -05:00

Voice-Controlled Automation Architecture

Vision: Speech → Claude Code → Home Assistant Pipeline

High-Level Flow

[Microphone] → [STT Engine] → [Command Parser] → [Claude Code API] → [Home Assistant] → [Actions]
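The flow above can be sketched as three pluggable stages. This is a minimal illustration, not the planned implementation: the `VoicePipeline` class and its stage names are hypothetical, and the stages are stubbed so the sketch runs without a microphone, STT engine, or Home Assistant instance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical wiring of the stages in the high-level flow."""
    transcribe: Callable[[bytes], str]  # STT engine: audio -> text
    interpret: Callable[[str], dict]    # command parser + Claude: text -> HA command
    execute: Callable[[dict], str]      # Home Assistant: command -> confirmation text

    def handle(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        command = self.interpret(text)
        return self.execute(command)


# Stub stages stand in for the real services.
pipeline = VoicePipeline(
    transcribe=lambda audio: "turn off living room lights",
    interpret=lambda text: {
        "service": "light.turn_off",
        "target": {"entity_id": "light.living_room"},
    },
    execute=lambda cmd: f"Called {cmd['service']} on {cmd['target']['entity_id']}",
)

print(pipeline.handle(b"\x00\x01"))
```

Keeping each stage behind a plain callable makes it easy to swap Whisper for Vosk, or to test the bridge without audio hardware.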

Component Architecture

1. Speech-to-Text (STT) Engine - Local Options

Whisper (OpenAI) - Recommended

  • Excellent accuracy, runs locally
  • Multiple model sizes (tiny to large)
  • GPU acceleration available
  • Container deployment: community image onerahmet/openai-whisper-asr-webservice (OpenAI does not publish an official Whisper image)

Alternative: Vosk

  • Lighter weight, faster response
  • Good for command recognition
  • Multiple language models available
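As a rough sketch of how the bridge might hand audio to a containerized Whisper service, the helper below builds (but does not send) an HTTP upload using only the standard library. The `/asr` path, the `audio_file` form field, and the `task`/`language`/`output` query parameters are assumptions based on the openai-whisper-asr-webservice image; check them against that service's own API docs. `build_asr_request` is a hypothetical helper name.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_asr_request(base_url: str, wav_bytes: bytes, language: str = "en") -> Request:
    """Build a multipart/form-data POST carrying one WAV file (assumed API shape)."""
    query = urlencode({"task": "transcribe", "language": language, "output": "json"})
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="audio_file"; filename="command.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        f"{base_url.rstrip('/')}/asr?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


req = build_asr_request("http://whisper-api:9000", b"RIFF....WAVE")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON response.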

2. Voice Activation (Wake Word or Push-to-Talk)

Wake Word Detection

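  • Porcupine (Picovoice) - local wake word detection
  • Wake phrases: "Hey Claude", "Computer", etc. (custom phrases are trained through Picovoice's tooling)
  • Always listening, but audio is processed on-device and nothing is passed on until the wake word is heard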

Push-to-Talk Alternative

  • Hardware button integration
  • Mobile app trigger
  • Keyboard shortcut
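Before a real wake word engine is in place, a text-level gate can approximate the same behaviour during testing: only transcripts that begin with a wake phrase are passed on. This is a stand-in under stated assumptions, not how Porcupine works (Porcupine matches on audio, not text); `strip_wake_phrase` and `WAKE_PHRASES` are hypothetical names.

```python
from typing import Optional, Sequence

WAKE_PHRASES = ("hey claude", "computer")  # phrases taken from the list above


def strip_wake_phrase(transcript: str, phrases: Sequence[str] = WAKE_PHRASES) -> Optional[str]:
    """Return the command following a wake phrase, or None if none was spoken."""
    normalized = transcript.strip()
    lowered = normalized.lower()
    for phrase in phrases:
        if lowered.startswith(phrase):
            rest = normalized[len(phrase):]
            # Require a word boundary so "Computerize ..." does not trigger.
            if rest and rest[0].isalnum():
                continue
            return rest.lstrip(" ,.!:;").strip()
    return None


print(strip_wake_phrase("Hey Claude, turn off the lights"))
print(strip_wake_phrase("turn off the lights"))
```

An empty string means the wake phrase was heard with no command after it, which the bridge could treat as "start listening".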

3. Command Processing Pipeline

Natural Language Parser

  • Claude Code interprets spoken commands
  • Converts to Home Assistant service calls
  • Handles context and ambiguity

Command Categories:

  • Direct device control: "Turn off living room lights"
  • Scene activation: "Set movie mode"
  • Status queries: "What's the temperature upstairs?"
  • Complex automations: "Start my morning routine"
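The four categories can be represented explicitly so the bridge can route or log commands by type. The keyword heuristic below is only a crude pre-filter for illustration; in the design above the actual interpretation is done by Claude. `Category` and `rough_category` are hypothetical names.

```python
from enum import Enum


class Category(Enum):
    DEVICE = "direct_device_control"
    SCENE = "scene_activation"
    QUERY = "status_query"
    ROUTINE = "complex_automation"


def rough_category(text: str) -> Category:
    """Guess a category from surface keywords (illustrative only)."""
    t = text.lower().strip()
    # STT output often drops the question mark, so also check the first word.
    if t.endswith("?") or t.startswith(("what", "is ", "are ", "how ")):
        return Category.QUERY
    if "routine" in t:
        return Category.ROUTINE
    if "mode" in t or "scene" in t:
        return Category.SCENE
    return Category.DEVICE


for phrase in (
    "Turn off living room lights",
    "Set movie mode",
    "What's the temperature upstairs?",
    "Start my morning routine",
):
    print(f"{phrase!r} -> {rough_category(phrase).value}")
```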

4. Claude Code Integration

API Bridge Service

  • Local service accepting STT output
  • Formats requests to Claude Code API
  • Maintains conversation context
  • Returns structured HA commands
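One simple way the bridge could maintain conversation context is a bounded history of recent turns that is prepended to each new request. This is a sketch under that assumption; `ConversationContext` and its prompt format are hypothetical, and the right history length would need tuning.

```python
from collections import deque
from typing import Deque, Dict, List


class ConversationContext:
    """Keeps only the most recent turns so requests stay small."""

    def __init__(self, max_turns: int = 5) -> None:
        self._turns: Deque[Dict[str, str]] = deque(maxlen=max_turns)

    def add(self, user_text: str, ha_service: str) -> None:
        self._turns.append({"user": user_text, "service": ha_service})

    def as_prompt_lines(self) -> List[str]:
        return [f"User said: {t['user']} -> called {t['service']}" for t in self._turns]


ctx = ConversationContext(max_turns=2)
ctx.add("turn on the bedroom lights", "light.turn_on")
ctx.add("make them dimmer", "light.turn_on")
ctx.add("now turn them off", "light.turn_off")
print(ctx.as_prompt_lines())  # the oldest turn has been dropped
```

With history available, a follow-up like "make them dimmer" can be resolved against the previous command's target.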

Command Translation Examples:

Speech: "Turn down the bedroom lights"
Claude: Interprets as a light.turn_on service call with reduced brightness
HA Command: {"service": "light.turn_on", "target": {"entity_id": "light.bedroom"}, "data": {"brightness_pct": 30}}
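Because the model's output ends up driving real devices, the bridge should validate it before acting. The sketch below assumes the model returns JSON with a `service` string, a `target.entity_id`, and optional `data`; the domain allowlist and the `parse_service_call` helper are hypothetical.

```python
import json

ALLOWED_DOMAINS = {"light", "switch", "scene", "climate"}  # hypothetical allowlist


def parse_service_call(raw: str) -> dict:
    """Parse and validate a structured command before it reaches Home Assistant."""
    call = json.loads(raw)
    if not isinstance(call, dict):
        raise ValueError("command must be a JSON object")
    domain, _, service = str(call.get("service", "")).partition(".")
    if domain not in ALLOWED_DOMAINS or not service:
        raise ValueError(f"service not allowed: {call.get('service')!r}")
    target = call.get("target")
    entity_id = target.get("entity_id") if isinstance(target, dict) else None
    if not entity_id:
        raise ValueError("missing target.entity_id")
    return {
        "domain": domain,
        "service": service,
        "entity_id": entity_id,
        "data": call.get("data", {}),
    }


raw = '{"service": "light.turn_on", "target": {"entity_id": "light.bedroom"}, "data": {"brightness_pct": 30}}'
print(parse_service_call(raw))
```

Rejecting anything outside the allowlist keeps a misheard or misinterpreted command from reaching, say, a lock or alarm domain.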

5. Home Assistant Integration

RESTful API Integration

  • Direct API calls to HA instance
  • WebSocket connection for real-time updates
  • Authentication via long-lived access tokens
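A service call over the REST API is a POST to `/api/services/<domain>/<service>` with a bearer token and a JSON body. The sketch below only builds the request so it can be inspected without a running instance; `build_service_request` is a hypothetical helper, and the URL and token shown are placeholders.

```python
import json
from urllib.request import Request


def build_service_request(ha_url: str, token: str, domain: str, service: str,
                          entity_id: str, **data) -> Request:
    """Build (but do not send) a Home Assistant service call."""
    payload = {"entity_id": entity_id, **data}
    return Request(
        f"{ha_url.rstrip('/')}/api/services/{domain}/{service}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_service_request(
    "http://homeassistant:8123", "PLACEHOLDER_TOKEN",
    "light", "turn_on", "light.bedroom", brightness_pct=30,
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Passing `req` to `urllib.request.urlopen` would execute the call; the WebSocket API is the better fit when the bridge needs state updates pushed to it.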

Voice Response Integration

  • HA TTS service for confirmations
  • Status announcements
  • Error handling feedback

Deployment Architecture

Container Stack Addition

# Add to existing HA docker-compose.yml

  # STT Service
  whisper-api:
    container_name: ha-whisper
    image: onerahmet/openai-whisper-asr-webservice:latest
    ports:
      - "9000:9000"
    environment:
      - ASR_MODEL=base  # or small, medium, large
    volumes:
      - ./whisper-models:/root/.cache/whisper
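    # Optional GPU acceleration: remove this deploy block on hosts without an
    # NVIDIA GPU. GPU use also needs the image's GPU tag (latest-gpu).
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]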

  # Voice Processing Bridge
  voice-bridge:
    container_name: ha-voice-bridge
    build: ./voice-bridge  # Custom service
    ports:
      - "8080:8080"
    environment:
      - CLAUDE_API_KEY=${CLAUDE_API_KEY}
      - HA_URL=http://homeassistant:8123
      - HA_TOKEN=${HA_TOKEN}
      - WHISPER_URL=http://whisper-api:9000
    volumes:
      - ./voice-bridge-config:/config
    depends_on:
      - homeassistant
      - whisper-api
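Inside the custom voice-bridge service, the environment variables from the compose file could be read once at startup and validated. This is a minimal sketch, assuming the variable names above; the `BridgeConfig` class, `load_config` function, and the fallback URLs are illustrative.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BridgeConfig:
    claude_api_key: str
    ha_url: str
    ha_token: str
    whisper_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> BridgeConfig:
    """Read bridge settings, failing fast when a secret is missing."""
    env = os.environ if env is None else env
    missing = [name for name in ("CLAUDE_API_KEY", "HA_TOKEN") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return BridgeConfig(
        claude_api_key=env["CLAUDE_API_KEY"],
        ha_url=env.get("HA_URL", "http://homeassistant:8123").rstrip("/"),
        ha_token=env["HA_TOKEN"],
        whisper_url=env.get("WHISPER_URL", "http://whisper-api:9000").rstrip("/"),
    )


config = load_config({"CLAUDE_API_KEY": "example-key", "HA_TOKEN": "example-token"})
print(config.ha_url, config.whisper_url)
```

Failing at startup when a secret is absent is easier to debug than a 401 from Home Assistant on the first voice command.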

Hardware Requirements

Microphone Setup:

  • USB microphone or audio interface
  • Raspberry Pi with mic for remote rooms
  • Existing smart speakers (only if they can be repurposed with custom firmware)

Processing Power:

  • Whisper base model: ~1GB RAM, CPU sufficient
  • Whisper large model: ~10GB of memory (VRAM when run on GPU), GPU strongly recommended
  • Your Proxmox setup should handle the base or small model easily; plan on GPU passthrough for large
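To pick a model size for a given host, a small lookup is enough. The memory figures below are the approximate requirements listed in the OpenAI Whisper README (roughly 1, 1, 2, 5, and 10 GB for tiny through large); treat them as ballpark values, and note that `largest_model_that_fits` and the headroom default are hypothetical.

```python
from typing import Optional

# Approximate memory needs in GB, per the Whisper README (ballpark values).
APPROX_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest model whose estimate fits, leaving some headroom."""
    budget = available_gb - headroom_gb
    choice = None
    for name, needed in APPROX_MEMORY_GB.items():  # ordered smallest to largest
        if needed <= budget:
            choice = name
    return choice


for gb in (1.5, 4, 16):
    print(f"{gb} GB available -> {largest_model_that_fits(gb)}")
```

The result can be fed straight into the `ASR_MODEL` environment variable of the Whisper container.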

Privacy & Security Considerations

Local-First Design

  • All STT processing on local hardware
  • No cloud APIs for voice recognition
  • Only the transcribed text is sent to the Claude Code API, and only for command interpretation
  • HA service calls are issued over the local network only

Security Architecture

Internet ← [Firewall] ← [Claude API calls only] ← [Voice Bridge] ← [Local STT] ← [Microphone]
                                                        ↓
                                              [Home Assistant] ← [Local Network Only]

Data Flow

  1. Audio capture - stays local
  2. STT processing - stays local
  3. Text command - sent to Claude Code API (text only)
  4. HA commands - executed locally
  5. No audio data ever leaves your network
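The "text only" rule in the data flow can be enforced in code at the single point where the bridge talks to the outside world. This is a sketch of that idea; `build_outbound_payload` and the length cap are hypothetical choices.

```python
MAX_COMMAND_CHARS = 500  # hypothetical cap; spoken commands are short


def build_outbound_payload(transcript: object) -> dict:
    """Build the only payload allowed to leave the local network."""
    # Raw audio arrives as bytes; refuse anything that is not plain text.
    if not isinstance(transcript, str):
        raise TypeError("only transcribed text may leave the local network")
    text = transcript.strip()
    if not text:
        raise ValueError("empty transcript")
    return {"text": text[:MAX_COMMAND_CHARS]}


print(build_outbound_payload("  turn off all lights "))
```

Routing every external request through one function like this makes the privacy claim something that can be checked in a unit test rather than only asserted in documentation.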

Implementation Phases

Phase 1: Core STT Integration

  • Deploy Whisper container
  • Basic speech-to-text testing
  • Integration with HA via simple commands

Phase 2: Claude Code Bridge

  • Build voice-bridge service
  • Integrate Claude Code API for command interpretation
  • Basic natural language processing

Phase 3: Advanced Features

  • Wake word detection
  • Multi-room microphone setup
  • Context-aware conversations
  • Voice response integration

Phase 4: Optimization

  • GPU acceleration for STT
  • Custom wake words
  • Conversation memory
  • Advanced natural language understanding

Example Use Cases

Simple Commands

  • "Turn off all lights"
  • "Set temperature to 72 degrees"
  • "Activate movie scene"

Complex Requests

  • "Turn on the lights in rooms where people are detected"
  • "Start my bedtime routine in 10 minutes"
  • "If it's going to rain tomorrow, close the garage door"
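A request like "turn on the lights in rooms where people are detected" needs current state, not just a phrase-to-service mapping. The sketch below works on a list shaped like the `GET /api/states` response (objects with `entity_id` and `state`) and assumes a hypothetical naming convention of `binary_sensor.<room>_occupancy` paired with `light.<room>`; your entity names will differ.

```python
from typing import Dict, List


def lights_for_occupied_rooms(states: List[Dict[str, str]]) -> List[dict]:
    """Return light.turn_on calls for rooms whose occupancy sensor is on."""
    known = {s["entity_id"] for s in states}
    prefix, suffix = "binary_sensor.", "_occupancy"  # assumed naming convention
    calls = []
    for s in states:
        eid = s["entity_id"]
        if eid.startswith(prefix) and eid.endswith(suffix) and s["state"] == "on":
            room = eid[len(prefix):-len(suffix)]
            light = f"light.{room}"
            if light in known:  # skip rooms that have no matching light
                calls.append({"service": "light.turn_on", "target": {"entity_id": light}})
    return calls


states = [
    {"entity_id": "binary_sensor.kitchen_occupancy", "state": "on"},
    {"entity_id": "binary_sensor.office_occupancy", "state": "off"},
    {"entity_id": "light.kitchen", "state": "off"},
    {"entity_id": "light.office", "state": "off"},
]
print(lights_for_occupied_rooms(states))
```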

Status Queries

  • "What's the status of the security system?"
  • "Are all the doors locked?"
  • "Show me energy usage this month"
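Status queries reduce to reading entity states and phrasing an answer for TTS. As a sketch for "Are all the doors locked?", the function below takes a list shaped like the `GET /api/states` response and relies on lock entities reporting `locked` or `unlocked`; the function name and the spoken wording are illustrative.

```python
from typing import Dict, List


def answer_doors_locked(states: List[Dict[str, str]]) -> str:
    """Summarize lock entities as a sentence suitable for a voice response."""
    locks = [s for s in states if s["entity_id"].startswith("lock.")]
    if not locks:
        return "I could not find any locks."
    not_locked = sorted(s["entity_id"] for s in locks if s["state"] != "locked")
    if not not_locked:
        return "All doors are locked."
    return "Not locked: " + ", ".join(not_locked)


states = [
    {"entity_id": "lock.front_door", "state": "locked"},
    {"entity_id": "lock.back_door", "state": "unlocked"},
    {"entity_id": "light.kitchen", "state": "on"},
]
print(answer_doors_locked(states))
```

Treating any state other than `locked` (including `unavailable`) as not locked errs on the side of telling you to go check.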

Integration with Existing Plans

This voice system would layer on top of your planned HA deployment:

  • No changes to core HA architecture
  • Additional containers for voice processing
  • API integration rather than HA core modifications
  • Gradual rollout after HA migration is stable

The voice system becomes another automation trigger alongside:

  • Time-based automations
  • Sensor-based automations
  • Manual app/dashboard controls
  • Voice commands via Claude Code