# Voice-Controlled Automation Architecture

## Vision: Speech → Claude Code → Home Assistant Pipeline

### High-Level Flow

```
[Microphone] → [STT Engine] → [Command Parser] → [Claude Code API] → [Home Assistant] → [Actions]
```
## Component Architecture

### 1. Speech-to-Text (STT) Engine - Local Options

**Whisper (OpenAI) - Recommended**

- Excellent accuracy, runs locally
- Multiple model sizes (tiny to large)
- GPU acceleration available
- Container deployment: e.g. `onerahmet/openai-whisper-asr-webservice` (used in the compose stack below)

**Alternative: Vosk**

- Lighter weight, faster response
- Good for command recognition
- Multiple language models available
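As a concrete sketch, a minimal bridge-side client for the Whisper ASR webservice might look like the following. The `/asr` endpoint, its query parameters, and the `audio_file` form field are assumptions based on the `onerahmet/openai-whisper-asr-webservice` image; verify them against that project's documentation before relying on this.

```python
# Sketch: send a WAV clip to the local Whisper ASR webservice and read
# back the transcript. Endpoint and field names are assumptions based
# on onerahmet/openai-whisper-asr-webservice.
import json
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "audio/wav"):
    """Build a multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"


def transcribe(wav_bytes: bytes, base_url: str = "http://localhost:9000") -> str:
    """POST audio to the local STT container and return the text."""
    body, ctype = build_multipart("audio_file", "command.wav", wav_bytes)
    req = urllib.request.Request(
        f"{base_url}/asr?task=transcribe&output=json",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["text"].strip()
```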
### 2. Voice Activity Detection (VAD)

**Wake Word Detection**

- Porcupine (Picovoice) - local wake word detection
- Custom wake phrases: "Hey Claude", "Computer", etc.
- Always-listening with privacy protection

**Push-to-Talk Alternative**

- Hardware button integration
- Mobile app trigger
- Keyboard shortcut
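In either mode the gating logic is the same: discard audio frames until the trigger fires, then buffer the frames that follow as the spoken command. A minimal sketch, where the `detector` callback stands in for e.g. Porcupine's per-frame processing (all names here are illustrative, not a real API):

```python
# Sketch of the wake-word gating loop: frames are dropped until the
# detector fires, then the following frames are buffered as the command.
def capture_command(frames, detector, command_frames: int = 50):
    """Return the frames spoken after the wake word, or None if it never fires."""
    frames = iter(frames)
    for frame in frames:        # always-listening: frames stay local
        if detector(frame):     # wake word heard
            break
    else:
        return None             # stream ended without a wake word
    command = []
    for frame in frames:        # buffer the command that follows
        command.append(frame)
        if len(command) >= command_frames:
            break
    return command
```

A push-to-talk trigger simply replaces `detector` with a check on the button or shortcut state.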
### 3. Command Processing Pipeline

**Natural Language Parser**

- Claude Code interprets spoken commands
- Converts to Home Assistant service calls
- Handles context and ambiguity

**Command Categories:**

- Direct device control: "Turn off living room lights"
- Scene activation: "Set movie mode"
- Status queries: "What's the temperature upstairs?"
- Complex automations: "Start my morning routine"
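Because the parser's output drives real devices, the bridge should validate Claude's reply against these categories before executing anything. A minimal sketch, assuming the bridge asks Claude for JSON with a `kind` field (field names are illustrative, not a fixed schema):

```python
# Sketch: accept only well-formed commands of a known category before
# forwarding them to Home Assistant. Schema is an assumption.
ALLOWED_KINDS = {"device_control", "scene", "status_query", "automation"}


def validate_command(cmd: dict) -> bool:
    """Reject anything outside the expected command shapes."""
    if cmd.get("kind") not in ALLOWED_KINDS:
        return False
    if cmd["kind"] == "status_query":
        # Queries name an entity to read, not a service to call
        return isinstance(cmd.get("entity_id"), str)
    # All other kinds must name a domain.service to call
    return isinstance(cmd.get("service"), str) and "." in cmd["service"]
```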
### 4. Claude Code Integration

**API Bridge Service**

- Local service accepting STT output
- Formats requests to the Claude Code API
- Maintains conversation context
- Returns structured HA commands

**Command Translation Example:**

```
Speech: "Turn down the bedroom lights"
Claude: Interprets this as a light.turn_on service call with reduced brightness
HA Command: {"service": "light.turn_on", "target": {"entity_id": "light.bedroom"}, "data": {"brightness_pct": 30}}
```
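Home Assistant's REST API exposes service calls at `POST /api/services/<domain>/<service>` with a long-lived token as a bearer header, so executing the structured command above is a small translation step. A sketch (the URL and token are placeholders):

```python
# Sketch: turn a structured command like the example above into a
# Home Assistant REST API request (POST /api/services/<domain>/<service>).
import json
import urllib.request


def service_request(command: dict, ha_url: str, token: str) -> urllib.request.Request:
    """Build the HA REST request for a {"service": ..., "target": ...} command."""
    domain, service = command["service"].split(".", 1)
    # HA accepts entity_id alongside service data in the POST body
    payload = {"entity_id": command["target"]["entity_id"], **command.get("data", {})}
    return urllib.request.Request(
        f"{ha_url}/api/services/{domain}/{service}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The bridge would send this with `urllib.request.urlopen(req)`; keeping request construction separate makes it easy to log or dry-run commands before execution.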
### 5. Home Assistant Integration

**RESTful API Integration**

- Direct API calls to the HA instance
- WebSocket connection for real-time updates
- Authentication via long-lived access tokens

**Voice Response Integration**

- HA TTS service for confirmations
- Status announcements
- Error handling feedback
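Spoken confirmations go through the same REST service mechanism. A sketch, assuming a TTS platform service such as `tts.google_translate_say` and a `media_player.kitchen` speaker (both assumptions; substitute whatever TTS integration and player your HA instance actually has):

```python
# Sketch: speak a confirmation through HA's TTS service. Service name
# and media_player entity are assumptions about the HA setup.
import json
import urllib.request


def tts_payload(message: str, player: str = "media_player.kitchen") -> bytes:
    """JSON body for a TTS service call."""
    return json.dumps({"entity_id": player, "message": message}).encode()


def announce(message: str, ha_url: str, token: str) -> None:
    """POST the confirmation to /api/services/tts/google_translate_say."""
    req = urllib.request.Request(
        f"{ha_url}/api/services/tts/google_translate_say",
        data=tts_payload(message),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    urllib.request.urlopen(req, timeout=10).close()
```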
## Deployment Architecture

### Container Stack Addition

```yaml
# Add to the services: section of the existing HA docker-compose.yml

# STT Service
whisper-api:
  container_name: ha-whisper
  image: onerahmet/openai-whisper-asr-webservice:latest
  ports:
    - "9000:9000"
  environment:
    - ASR_MODEL=base  # or small, medium, large
  volumes:
    - ./whisper-models:/root/.cache/whisper
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia  # Optional GPU acceleration
            count: 1
            capabilities: [gpu]

# Voice Processing Bridge
voice-bridge:
  container_name: ha-voice-bridge
  build: ./voice-bridge  # Custom service
  ports:
    - "8080:8080"
  environment:
    - CLAUDE_API_KEY=${CLAUDE_API_KEY}
    - HA_URL=http://homeassistant:8123
    - HA_TOKEN=${HA_TOKEN}
    - WHISPER_URL=http://whisper-api:9000
  volumes:
    - ./voice-bridge-config:/config
  depends_on:
    - homeassistant
    - whisper-api
```

### Hardware Requirements

**Microphone Setup:**

- USB microphone or audio interface
- Raspberry Pi with mic for remote rooms
- Existing smart speakers (if hackable)

**Processing Power:**

- Whisper base model: ~1GB RAM, CPU sufficient
- Whisper large model: ~10GB RAM/VRAM, GPU strongly recommended
- Your Proxmox setup can easily handle this
## Privacy & Security Considerations

### Local-First Design

- All STT processing on local hardware
- No cloud APIs for voice recognition
- Claude Code API calls only for command interpretation
- HA commands never leave the local network

### Security Architecture

```
Internet ← [Firewall] ← [Claude API calls only] ← [Voice Bridge] ← [Local STT] ← [Microphone]
                                                        ↓
                                                 [Home Assistant] ← [Local Network Only]
```

### Data Flow

1. **Audio capture** - stays local
2. **STT processing** - stays local
3. **Text command** - sent to the Claude Code API (text only)
4. **HA commands** - executed locally
5. **No audio data** ever leaves your network
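The five steps above can be sketched as a single loop, with each stage injected as a callable so the privacy boundary is explicit: only the interpretation step ever leaves the box, and it only sees text (all names here are illustrative):

```python
# End-to-end sketch of the data flow above. Only `interpret` (the
# Claude Code call) sends anything off-box, and only text.
def handle_utterance(audio: bytes, transcribe, interpret, execute):
    """Run one voice command through the local-first pipeline."""
    text = transcribe(audio)     # steps 1-2: local STT, audio stays here
    command = interpret(text)    # step 3: text-only Claude Code API call
    return execute(command)      # step 4: HA service call on the local network
```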
## Implementation Phases

### Phase 1: Core STT Integration

- Deploy Whisper container
- Basic speech-to-text testing
- Integration with HA via simple commands

### Phase 2: Claude Code Bridge

- Build the voice-bridge service
- Integrate the Claude Code API for command interpretation
- Basic natural language processing

### Phase 3: Advanced Features

- Wake word detection
- Multi-room microphone setup
- Context-aware conversations
- Voice response integration

### Phase 4: Optimization

- GPU acceleration for STT
- Custom wake words
- Conversation memory
- Advanced natural language understanding
## Example Use Cases

### Simple Commands

- "Turn off all lights"
- "Set temperature to 72 degrees"
- "Activate movie scene"

### Complex Requests

- "Turn on the lights in rooms where people are detected"
- "Start my bedtime routine in 10 minutes"
- "If it's going to rain tomorrow, close the garage door"

### Status Queries

- "What's the status of the security system?"
- "Are all the doors locked?"
- "Show me energy usage this month"
## Integration with Existing Plans

This voice system would layer on top of your planned HA deployment:

- **No changes** to core HA architecture
- **Additional containers** for voice processing
- **API integration** rather than HA core modifications
- **Gradual rollout** after the HA migration is stable

The voice system becomes another automation trigger alongside:

- Time-based automations
- Sensor-based automations
- Manual app/dashboard controls
- **Voice commands via Claude Code**