claude-home/.claude/plans/voice-automation-architecture.md
Cal Corum bd49e9d61d CLAUDE: Add comprehensive home automation planning documents
- Add Home Assistant deployment guide with container architecture
- Document platform analysis comparing Home Assistant, OpenHAB, and Node-RED
- Add voice automation architecture with local/cloud hybrid approach
- Include implementation details for Rhasspy + Home Assistant integration
- Provide step-by-step deployment guides and configuration templates
- Document privacy-focused voice processing with local wake word detection

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 16:21:28 -05:00

# Voice-Controlled Automation Architecture
## Vision: Speech → Claude Code → Home Assistant Pipeline
### High-Level Flow
```
[Microphone] → [STT Engine] → [Command Parser] → [Claude Code API] → [Home Assistant] → [Actions]
```
## Component Architecture
### 1. Speech-to-Text (STT) Engine - Local Options
**Whisper (OpenAI) - Recommended**
- Excellent accuracy, runs locally
- Multiple model sizes (tiny to large)
- GPU acceleration available
- Container deployment: e.g. `onerahmet/openai-whisper-asr-webservice`
**Alternative: Vosk**
- Lighter weight, faster response
- Good for command recognition
- Multiple language models available
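Either STT engine ends up behind an HTTP endpoint in this design. As a hedged sketch, assuming the `onerahmet/openai-whisper-asr-webservice` container from the deployment section (which exposes a `POST /asr` endpoint taking an `audio_file` form field), the client side could look like this; the URL and the normalization rules are assumptions to adapt:

```python
# Hedged client sketch for a local Whisper ASR webservice container.
# WHISPER_URL is a hypothetical local deployment address.
WHISPER_URL = "http://localhost:9000"

def transcribe(wav_path: str, base_url: str = WHISPER_URL) -> str:
    """Send a WAV file to the local STT container and return plain text."""
    import requests  # imported lazily so the pure helper below runs without it

    with open(wav_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/asr",
            params={"task": "transcribe", "output": "txt"},
            files={"audio_file": f},
        )
    resp.raise_for_status()
    return resp.text.strip()

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so downstream command matching is uniform."""
    return "".join(c for c in text.lower() if c.isalnum() or c.isspace()).strip()
```

Normalizing the transcript before it reaches the parser keeps "Turn OFF the lights!" and "turn off the lights" identical downstream.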
### 2. Voice Activity Detection (VAD)
**Wake Word Detection**
- Porcupine (Picovoice) - local wake word detection
- Custom wake phrases: "Hey Claude", "Computer", etc.
- Always-listening with privacy protection
**Push-to-Talk Alternative**
- Hardware button integration
- Mobile app trigger
- Keyboard shortcut
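Porcupine handles the actual wake word matching, but a cheap energy gate in front of it (or in front of a push-to-talk capture) avoids streaming silence into the pipeline. This is a simple energy-based VAD sketch, not the Porcupine API; the threshold is an assumption to tune per microphone:

```python
# Hedged sketch: energy-based voice activity gate for 16-bit mono PCM frames.
# This is a pre-filter, not a replacement for Porcupine's wake word matching.
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square energy of a frame of 16-bit little-endian PCM."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return (sum(s * s for s in samples) / max(len(samples), 1)) ** 0.5

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Only forward frames with enough energy to the STT pipeline."""
    return rms(frame) > threshold
```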
### 3. Command Processing Pipeline
**Natural Language Parser**
- Claude Code interprets spoken commands
- Converts to Home Assistant service calls
- Handles context and ambiguity
**Command Categories:**
- Direct device control: "Turn off living room lights"
- Scene activation: "Set movie mode"
- Status queries: "What's the temperature upstairs?"
- Complex automations: "Start my morning routine"
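For the first category, a rule-based fast path can skip the Claude API round trip entirely and fall back to Claude only for anything it doesn't recognize. A minimal sketch, with hypothetical entity IDs:

```python
# Hedged sketch: fast path for the "direct device control" category.
# Phrases and entity IDs are illustrative assumptions.
DIRECT_COMMANDS = {
    "turn off living room lights": ("light.turn_off", "light.living_room"),
    "set movie mode": ("scene.turn_on", "scene.movie_mode"),
}

def fast_path(text: str):
    """Return (service, entity_id) for a known phrase, else None to defer to Claude."""
    return DIRECT_COMMANDS.get(text.lower().strip())
```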
### 4. Claude Code Integration
**API Bridge Service**
- Local service accepting STT output
- Formats requests to Claude Code API
- Maintains conversation context
- Returns structured HA commands
**Command Translation Examples:**
```
Speech: "Turn down the bedroom lights"
Claude: Interprets as a light.turn_on call with reduced brightness
HA Command: {"service": "light.turn_on", "target": "light.bedroom", "data": {"brightness_pct": 30}}
```
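The structured command above still has to be mapped onto the Home Assistant REST shape, where the service becomes the URL path (`POST /api/services/<domain>/<service>`) and the target entity plus service data become the JSON body. A sketch of that translation step:

```python
# Translate a structured command dict into the HA REST path and JSON body.
def to_ha_request(command: dict) -> tuple[str, dict]:
    """Split 'light.turn_on' into /api/services/light/turn_on and merge target + data."""
    domain, service = command["service"].split(".")
    path = f"/api/services/{domain}/{service}"
    body = {"entity_id": command["target"], **command.get("data", {})}
    return path, body
```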
### 5. Home Assistant Integration
**RESTful API Integration**
- Direct API calls to HA instance
- WebSocket connection for real-time updates
- Authentication via long-lived access tokens
**Voice Response Integration**
- HA TTS service for confirmations
- Status announcements
- Error handling feedback
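Both the service calls and the TTS confirmations go through the same authenticated REST endpoint. A minimal stdlib sketch, assuming the container-network `HA_URL` from the compose file and a long-lived access token; the TTS service name in the docstring is an example, not a guarantee of your installed integrations:

```python
# Hedged sketch: authenticated POST to the Home Assistant REST API
# using a long-lived access token. HA_URL is deployment-specific.
import json
import urllib.request

HA_URL = "http://homeassistant:8123"

def build_request(path: str, body: dict, token: str) -> urllib.request.Request:
    """Construct the POST request; split out so it can be inspected without a live HA."""
    return urllib.request.Request(
        HA_URL + path,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def call_service(path: str, body: dict, token: str) -> None:
    """Fire a service call, e.g. path='/api/services/tts/google_translate_say'."""
    urllib.request.urlopen(build_request(path, body, token))
```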
## Deployment Architecture
### Container Stack Addition
```yaml
# Add to the services: section of the existing HA docker-compose.yml

  # STT Service
  whisper-api:
    container_name: ha-whisper
    image: onerahmet/openai-whisper-asr-webservice:latest
    ports:
      - "9000:9000"
    environment:
      - ASR_MODEL=base  # or small, medium, large
    volumes:
      - ./whisper-models:/root/.cache/whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia  # Optional GPU acceleration
              count: 1
              capabilities: [gpu]

  # Voice Processing Bridge
  voice-bridge:
    container_name: ha-voice-bridge
    build: ./voice-bridge  # Custom service
    ports:
      - "8080:8080"
    environment:
      - CLAUDE_API_KEY=${CLAUDE_API_KEY}
      - HA_URL=http://homeassistant:8123
      - HA_TOKEN=${HA_TOKEN}
      - WHISPER_URL=http://whisper-api:9000
    volumes:
      - ./voice-bridge-config:/config
    depends_on:
      - homeassistant
      - whisper-api
```
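The core of the custom `voice-bridge` service is a small orchestration loop over the three downstream services in the compose file. Sketched with the concrete STT/Claude/HA calls injected as callables (so the pipeline is testable without any of them running); the parameter names are assumptions, not a fixed interface:

```python
# Hedged sketch of the voice-bridge orchestration implied by the compose stack.
# The concrete network calls are injected; only the pipeline shape is shown.
from typing import Callable

def handle_utterance(
    audio: bytes,
    stt: Callable[[bytes], str],        # e.g. POST audio to WHISPER_URL/asr
    interpret: Callable[[str], dict],   # e.g. Claude API call on the transcript
    execute: Callable[[dict], None],    # e.g. HA REST service call
) -> dict:
    """Run one audio clip through the STT -> Claude -> HA pipeline."""
    text = stt(audio)
    command = interpret(text)
    execute(command)
    return command
```

Keeping the pipeline pure like this also makes it easy to log each stage for debugging misheard commands.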
### Hardware Requirements
**Microphone Setup:**
- USB microphone or audio interface
- Raspberry Pi with mic for remote rooms
- Existing smart speakers (if hackable)
**Processing Power:**
- Whisper base model: ~1GB RAM, CPU sufficient
- Whisper large model: ~10GB VRAM, GPU required
- Your Proxmox setup can easily handle this
## Privacy & Security Considerations
### Local-First Design
- All STT processing on local hardware
- No cloud APIs for voice recognition
- Claude Code API calls only for command interpretation
- HA commands never leave local network
### Security Architecture
```
Internet ← [Firewall] ← [Claude API calls only (text)] ← [Voice Bridge] ← [Local STT] ← [Microphone]
[Voice Bridge] → [Home Assistant]  (local network only)
```
### Data Flow
1. **Audio capture** - stays local
2. **STT processing** - stays local
3. **Text command** - sent to Claude Code API (text only)
4. **HA commands** - executed locally
5. **No audio data** ever leaves your network
## Implementation Phases
### Phase 1: Core STT Integration
- Deploy Whisper container
- Basic speech-to-text testing
- Integration with HA via simple commands
### Phase 2: Claude Code Bridge
- Build voice-bridge service
- Integrate Claude Code API for command interpretation
- Basic natural language processing
### Phase 3: Advanced Features
- Wake word detection
- Multi-room microphone setup
- Context-aware conversations
- Voice response integration
### Phase 4: Optimization
- GPU acceleration for STT
- Custom wake words
- Conversation memory
- Advanced natural language understanding
## Example Use Cases
### Simple Commands
- "Turn off all lights"
- "Set temperature to 72 degrees"
- "Activate movie scene"
### Complex Requests
- "Turn on the lights in rooms where people are detected"
- "Start my bedtime routine in 10 minutes"
- "If it's going to rain tomorrow, close the garage door"
### Status Queries
- "What's the status of the security system?"
- "Are all the doors locked?"
- "Show me energy usage this month"
## Integration with Existing Plans
This voice system would layer on top of your planned HA deployment:
- **No changes** to core HA architecture
- **Additional containers** for voice processing
- **API integration** rather than HA core modifications
- **Gradual rollout** after HA migration is stable
The voice system becomes another automation trigger alongside:
- Time-based automations
- Sensor-based automations
- Manual app/dashboard controls
- **Voice commands via Claude Code**