# Voice-Controlled Automation Architecture

## Vision: Speech → Claude Code → Home Assistant Pipeline

### High-Level Flow

```
[Microphone] → [STT Engine] → [Command Parser] → [Claude Code API] → [Home Assistant] → [Actions]
```
## Component Architecture

### 1. Speech-to-Text (STT) Engine - Local Options

**Whisper (OpenAI) - Recommended**

- Excellent accuracy, runs locally
- Multiple model sizes (tiny to large)
- GPU acceleration available
- Container deployment: e.g. `onerahmet/openai-whisper-asr-webservice` (used in the compose stack below)

**Alternative: Vosk**

- Lighter weight, faster response
- Good for command recognition
- Multiple language models available
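As a concrete sketch, a minimal bridge-side client for the Whisper ASR webservice might look like the following. The `/asr` endpoint, its query parameters, and the `audio_file` form field are assumptions based on the `onerahmet/openai-whisper-asr-webservice` image; verify them against that project's documentation before relying on this.

```python
# Sketch: send a WAV clip to the local Whisper ASR webservice and read
# back the transcript. Endpoint and field names are assumptions based
# on onerahmet/openai-whisper-asr-webservice.
import json
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "audio/wav"):
    """Build a multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_bytes: bytes, base_url: str = "http://localhost:9000") -> str:
    """POST audio to the local STT container and return the text."""
    body, ctype = build_multipart("audio_file", "command.wav", wav_bytes)
    req = urllib.request.Request(
        f"{base_url}/asr?task=transcribe&output=json",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["text"].strip()
```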
### 2. Voice Activity Detection (VAD)

**Wake Word Detection**

- Porcupine (Picovoice) - local wake word detection
- Custom wake phrases: "Hey Claude", "Computer", etc.
- Always-listening with privacy protection

**Push-to-Talk Alternative**

- Hardware button integration
- Mobile app trigger
- Keyboard shortcut
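In either mode the gating logic is the same: discard audio frames until the trigger fires, then buffer the frames that follow as the spoken command. A minimal sketch, where the `detector` callback stands in for e.g. Porcupine's per-frame processing (all names here are illustrative, not a real API):

```python
# Sketch of the wake-word gating loop: frames are dropped until the
# detector fires, then the following frames are buffered as the command.
def capture_command(frames, detector, command_frames: int = 50):
    """Return the frames spoken after the wake word, or None if it never fires."""
    frames = iter(frames)
    for frame in frames:        # always-listening: frames stay local
        if detector(frame):     # wake word heard
            break
    else:
        return None             # stream ended without a wake word
    command = []
    for frame in frames:        # buffer the command that follows
        command.append(frame)
        if len(command) >= command_frames:
            break
    return command
```

A push-to-talk trigger simply replaces `detector` with a check on the button or shortcut state.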
### 3. Command Processing Pipeline

**Natural Language Parser**

- Claude Code interprets spoken commands
- Converts to Home Assistant service calls
- Handles context and ambiguity

**Command Categories:**

- Direct device control: "Turn off living room lights"
- Scene activation: "Set movie mode"
- Status queries: "What's the temperature upstairs?"
- Complex automations: "Start my morning routine"
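Because the parser's output drives real devices, the bridge should validate Claude's reply against these categories before executing anything. A minimal sketch, assuming the bridge asks Claude for JSON with a `kind` field (field names are illustrative, not a fixed schema):

```python
# Sketch: accept only well-formed commands of a known category before
# forwarding them to Home Assistant. Schema is an assumption.
ALLOWED_KINDS = {"device_control", "scene", "status_query", "automation"}


def validate_command(cmd: dict) -> bool:
    """Reject anything outside the expected command shapes."""
    if cmd.get("kind") not in ALLOWED_KINDS:
        return False
    if cmd["kind"] == "status_query":
        # Queries name an entity to read, not a service to call
        return isinstance(cmd.get("entity_id"), str)
    # All other kinds must name a domain.service to call
    return isinstance(cmd.get("service"), str) and "." in cmd["service"]
```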
### 4. Claude Code Integration

**API Bridge Service**

- Local service accepting STT output
- Formats requests to the Claude Code API
- Maintains conversation context
- Returns structured HA commands

**Command Translation Example:**

```
Speech: "Turn down the bedroom lights"
Claude: Interprets this as a light.turn_on service call with reduced brightness
HA Command: {"service": "light.turn_on", "target": {"entity_id": "light.bedroom"}, "data": {"brightness_pct": 30}}
```
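Home Assistant's REST API exposes service calls at `POST /api/services/<domain>/<service>` with a long-lived token as a bearer header, so executing the structured command above is a small translation step. A sketch (the URL and token are placeholders):

```python
# Sketch: turn a structured command like the example above into a
# Home Assistant REST API request (POST /api/services/<domain>/<service>).
import json
import urllib.request


def service_request(command: dict, ha_url: str, token: str) -> urllib.request.Request:
    """Build the HA REST request for a {"service": ..., "target": ...} command."""
    domain, service = command["service"].split(".", 1)
    # HA accepts entity_id alongside service data in the POST body
    payload = {"entity_id": command["target"]["entity_id"], **command.get("data", {})}
    return urllib.request.Request(
        f"{ha_url}/api/services/{domain}/{service}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The bridge would send this with `urllib.request.urlopen(req)`; keeping request construction separate makes it easy to log or dry-run commands before execution.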
### 5. Home Assistant Integration

**RESTful API Integration**

- Direct API calls to the HA instance
- WebSocket connection for real-time updates
- Authentication via long-lived access tokens

**Voice Response Integration**

- HA TTS service for confirmations
- Status announcements
- Error handling feedback
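Spoken confirmations go through the same REST service mechanism. A sketch, assuming a TTS platform service such as `tts.google_translate_say` and a `media_player.kitchen` speaker (both assumptions; substitute whatever TTS integration and player your HA instance actually has):

```python
# Sketch: speak a confirmation through HA's TTS service. Service name
# and media_player entity are assumptions about the HA setup.
import json
import urllib.request


def tts_payload(message: str, player: str = "media_player.kitchen") -> bytes:
    """JSON body for a TTS service call."""
    return json.dumps({"entity_id": player, "message": message}).encode()


def announce(message: str, ha_url: str, token: str) -> None:
    """POST the confirmation to /api/services/tts/google_translate_say."""
    req = urllib.request.Request(
        f"{ha_url}/api/services/tts/google_translate_say",
        data=tts_payload(message),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10).close()
```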
## Deployment Architecture

### Container Stack Addition

```yaml
# Add to the services: section of the existing HA docker-compose.yml

# STT Service
whisper-api:
  container_name: ha-whisper
  image: onerahmet/openai-whisper-asr-webservice:latest
  ports:
    - "9000:9000"
  environment:
    - ASR_MODEL=base  # or small, medium, large
  volumes:
    - ./whisper-models:/root/.cache/whisper
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia  # Optional GPU acceleration
            count: 1
            capabilities: [gpu]

# Voice Processing Bridge
voice-bridge:
  container_name: ha-voice-bridge
  build: ./voice-bridge  # Custom service
  ports:
    - "8080:8080"
  environment:
    - CLAUDE_API_KEY=${CLAUDE_API_KEY}
    - HA_URL=http://homeassistant:8123
    - HA_TOKEN=${HA_TOKEN}
    - WHISPER_URL=http://whisper-api:9000
  volumes:
    - ./voice-bridge-config:/config
  depends_on:
    - homeassistant
    - whisper-api
```

### Hardware Requirements

**Microphone Setup:**

- USB microphone or audio interface
- Raspberry Pi with mic for remote rooms
- Existing smart speakers (if hackable)

**Processing Power:**

- Whisper base model: ~1GB RAM, CPU sufficient
- Whisper large model: ~10GB RAM/VRAM, GPU strongly recommended
- Your Proxmox setup can easily handle this
## Privacy & Security Considerations

### Local-First Design

- All STT processing on local hardware
- No cloud APIs for voice recognition
- Claude Code API calls only for command interpretation
- HA commands never leave the local network

### Security Architecture

```
Internet ← [Firewall] ← [Claude API calls only] ← [Voice Bridge] ← [Local STT] ← [Microphone]
                                                        ↓
                                                 [Home Assistant] ← [Local Network Only]
```

### Data Flow

1. **Audio capture** - stays local
2. **STT processing** - stays local
3. **Text command** - sent to the Claude Code API (text only)
4. **HA commands** - executed locally
5. **No audio data** ever leaves your network
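The five steps above can be sketched as a single loop, with each stage injected as a callable so the privacy boundary is explicit: only the interpretation step ever leaves the box, and it only sees text (all names here are illustrative):

```python
# End-to-end sketch of the data flow above. Only `interpret` (the
# Claude Code call) sends anything off-box, and only text.
def handle_utterance(audio: bytes, transcribe, interpret, execute):
    """Run one voice command through the local-first pipeline."""
    text = transcribe(audio)     # steps 1-2: local STT, audio stays here
    command = interpret(text)    # step 3: text-only Claude Code API call
    return execute(command)      # step 4: HA service call on the local network
```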
## Implementation Phases

### Phase 1: Core STT Integration

- Deploy Whisper container
- Basic speech-to-text testing
- Integration with HA via simple commands

### Phase 2: Claude Code Bridge

- Build the voice-bridge service
- Integrate the Claude Code API for command interpretation
- Basic natural language processing

### Phase 3: Advanced Features

- Wake word detection
- Multi-room microphone setup
- Context-aware conversations
- Voice response integration

### Phase 4: Optimization

- GPU acceleration for STT
- Custom wake words
- Conversation memory
- Advanced natural language understanding
## Example Use Cases

### Simple Commands

- "Turn off all lights"
- "Set temperature to 72 degrees"
- "Activate movie scene"

### Complex Requests

- "Turn on the lights in rooms where people are detected"
- "Start my bedtime routine in 10 minutes"
- "If it's going to rain tomorrow, close the garage door"

### Status Queries

- "What's the status of the security system?"
- "Are all the doors locked?"
- "Show me energy usage this month"
## Integration with Existing Plans

This voice system would layer on top of your planned HA deployment:

- **No changes** to core HA architecture
- **Additional containers** for voice processing
- **API integration** rather than HA core modifications
- **Gradual rollout** after the HA migration is stable

The voice system becomes another automation trigger alongside:

- Time-based automations
- Sensor-based automations
- Manual app/dashboard controls
- **Voice commands via Claude Code**