# Prometheus + Grafana Home Lab Monitoring Setup

## Overview

This document provides a complete setup for monitoring a Proxmox home lab with 8 Ubuntu Server VMs running Docker applications. The solution uses the Prometheus + Grafana + Alertmanager stack to provide comprehensive monitoring with custom metrics, alerting, and visualization.

## Architecture

### Components

- **Prometheus**: Time-series database for metrics collection (pull-based; see the quick check below)
- **Grafana**: Web-based visualization and dashboards
- **Alertmanager**: Alert routing and notifications
- **Node Exporter**: System metrics (CPU, memory, disk, network)
- **cAdvisor**: Docker container metrics
- **Custom Exporters**: Application-specific metrics (transcodes, web server stats, etc.)

### Deployment Strategy

- **Main Monitoring VM**: Runs Prometheus, Grafana, and Alertmanager
- **Each Monitored VM**: Runs Node Exporter, cAdvisor, and any custom exporters
- **Proxmox Host**: Runs the Proxmox VE Exporter for hypervisor metrics

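Since every component here is scraped over plain HTTP, an exporter can be spot-checked with `curl` before it is wired into Prometheus. A quick sanity check, assuming the example addresses used in the scrape configuration later in this document:

```bash
# Node Exporter on a monitored VM (first sample lines only)
curl -s http://192.168.1.101:9100/metrics | head -n 5

# cAdvisor on the same VM
curl -s http://192.168.1.101:8080/metrics | head -n 5
```
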
## Main Monitoring Stack Deployment

### Directory Structure

```
monitoring/
├── docker-compose.yml
├── prometheus/
│   ├── prometheus.yml
│   └── alerts.yml
├── alertmanager/
│   └── alertmanager.yml
├── grafana/
│   └── provisioning/
│       ├── datasources/
│       │   └── prometheus.yml
│       └── dashboards/
│           └── dashboard.yml
└── data/
    ├── prometheus/
    ├── grafana/
    └── alertmanager/
```

### Docker Compose Configuration

```yaml
version: '3.8'

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml
      - prometheus_data:/prometheus
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--web.external-url=http://localhost:9093'
    networks:
      - monitoring
```
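With the files in place, bring the stack up and confirm each service is healthy; Prometheus and Alertmanager expose a `/-/healthy` endpoint, and Grafana exposes `/api/health`:

```bash
docker-compose up -d

# Both should answer with a short healthy/OK message
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9093/-/healthy

# Grafana reports its database status as JSON
curl -s http://localhost:3000/api/health
```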

## Configuration Files

### Prometheus Configuration (prometheus/prometheus.yml)

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Node Exporters - update with your VM IPs
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '192.168.1.101:9100'  # VM1
          - '192.168.1.102:9100'  # VM2
          - '192.168.1.103:9100'  # VM3
          - '192.168.1.104:9100'  # VM4
          - '192.168.1.105:9100'  # VM5
          - '192.168.1.106:9100'  # VM6
          - '192.168.1.107:9100'  # VM7
          - '192.168.1.108:9100'  # VM8

  # cAdvisor for Docker metrics
  - job_name: 'cadvisor'
    static_configs:
      - targets:
          - '192.168.1.101:8080'  # VM1
          - '192.168.1.102:8080'  # VM2
          - '192.168.1.103:8080'  # VM3
          - '192.168.1.104:8080'  # VM4
          - '192.168.1.105:8080'  # VM5
          - '192.168.1.106:8080'  # VM6
          - '192.168.1.107:8080'  # VM7
          - '192.168.1.108:8080'  # VM8

  # Proxmox VE Exporter - update with your Proxmox host IP
  - job_name: 'proxmox'
    metrics_path: /pve  # pve_exporter serves metrics on /pve, not /metrics
    static_configs:
      - targets: ['192.168.1.100:9221']  # Proxmox host

  # Custom application exporters
  - job_name: 'custom-apps'
    static_configs:
      - targets:
          - '192.168.1.101:9999'  # Custom app metrics
          - '192.168.1.102:9999'  # Media server metrics

  # Home Assistant Prometheus integration
  - job_name: 'homeassistant'
    scrape_interval: 30s
    metrics_path: /api/prometheus
    bearer_token: 'YOUR_LONG_LIVED_ACCESS_TOKEN'  # Generate in HA Profile settings
    static_configs:
      - targets: ['192.168.1.XXX:8123']  # Your Home Assistant IP

  # Home Assistant API exporter (alternative)
  - job_name: 'homeassistant-api'
    static_configs:
      - targets: ['192.168.1.XXX:9998']  # Custom HA exporter

  # HomeKit via MQTT bridge (if using)
  - job_name: 'homekit'
    static_configs:
      - targets: ['192.168.1.XXX:9997']  # Custom HomeKit exporter
```
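Because the stack starts Prometheus with `--web.enable-lifecycle`, edits to this file can be applied without restarting the container, and the HTTP API shows which targets are actually being scraped:

```bash
# Apply configuration changes after editing prometheus.yml
curl -X POST http://localhost:9090/-/reload

# List scrape targets and their health
curl -s http://localhost:9090/api/v1/targets | python3 -m json.tool | grep -E '"scrapeUrl"|"health"'
```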

### Alert Rules (prometheus/alerts.yml)

```yaml
groups:
  - name: system-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been down for more than 1 minute."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 2 minutes."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 80% for more than 2 minutes."

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 20% on the root filesystem."

  - name: docker-alerts
    rules:
      - alert: ContainerDown
        expr: absent(container_last_seen) or time() - container_last_seen > 60
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} is down"
          description: "Container has been down for more than 1 minute."
```
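A stray indent silently breaks a rule file, so it is worth validating with `promtool` (bundled in the `prom/prometheus` image) before reloading:

```bash
# Run from the monitoring/ directory
docker run --rm -v "$(pwd)/prometheus:/etc/prometheus" \
  --entrypoint promtool prom/prometheus:latest \
  check rules /etc/prometheus/alerts.yml
```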

### Alertmanager Configuration (alertmanager/alertmanager.yml)

Note that `email_configs` has no `subject:` or `body:` fields; the subject goes in the `headers` map and the body in `text` (or `html`).

```yaml
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@yourdomain.com'
  # Configure with your email settings

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'warning-alerts'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'

  - name: 'critical-alerts'
    email_configs:
      - to: 'admin@yourdomain.com'
        headers:
          Subject: 'CRITICAL: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
    # Uncomment and configure for Discord/Slack webhooks
    # webhook_configs:
    #   - url: 'YOUR_DISCORD_WEBHOOK_URL'

  - name: 'warning-alerts'
    email_configs:
      - to: 'admin@yourdomain.com'
        headers:
          Subject: 'WARNING: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
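Likewise, `amtool` (bundled in the `prom/alertmanager` image) can lint this file, and posting a synthetic alert to the v2 API exercises the routing tree end to end:

```bash
# Lint the configuration
docker run --rm -v "$(pwd)/alertmanager:/etc/alertmanager" \
  --entrypoint amtool prom/alertmanager:latest \
  check-config /etc/alertmanager/alertmanager.yml

# Fire a synthetic critical alert to test routing and delivery
curl -s -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{"labels":{"alertname":"RoutingTest","severity":"critical"},"annotations":{"summary":"Synthetic test alert"}}]'
```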

### Grafana Datasource Provisioning (grafana/provisioning/datasources/prometheus.yml)

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```
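The directory tree above also lists `grafana/provisioning/dashboards/dashboard.yml`, which is not shown elsewhere in this document. A minimal provider sketch, assuming dashboard JSON files are mounted into the container at `/var/lib/grafana/dashboards` (that path would need an extra volume on the grafana service):

```yaml
# grafana/provisioning/dashboards/dashboard.yml (minimal sketch)
apiVersion: 1

providers:
  - name: 'default'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    options:
      path: /var/lib/grafana/dashboards
```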

## VM Configuration

### Node Exporter and cAdvisor Deployment

Deploy the following on each monitored VM:

```yaml
# docker-compose.yml for each VM
version: '3.8'

services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    ports:
      - "9100:9100"
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
```
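After `docker-compose up -d` on a VM, both exporters can be checked locally before the VM is added to the Prometheus scrape configuration:

```bash
# Node Exporter: system metrics
curl -s http://localhost:9100/metrics | grep -m 3 '^node_'

# cAdvisor: per-container metrics
curl -s http://localhost:8080/metrics | grep -m 3 '^container_'
```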

### Custom Metrics Example

For media server transcode monitoring:

```python
#!/usr/bin/env python3
# custom_exporter.py
from http.server import HTTPServer, BaseHTTPRequestHandler

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            metrics = self.get_metrics()
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(metrics.encode())
        else:
            self.send_response(404)
            self.end_headers()

    def get_metrics(self):
        metrics = []

        # Example: count completed and failed transcodes from a log file.
        # Read the file once; iterating the file object twice would leave
        # the second pass with an exhausted iterator and a count of zero.
        try:
            with open('/var/log/transcodes.log', 'r') as f:
                lines = f.readlines()
            completed = sum(1 for line in lines if 'COMPLETED' in line)
            failed = sum(1 for line in lines if 'FAILED' in line)

            metrics.append(f'transcodes_completed_total {completed}')
            metrics.append(f'transcodes_failed_total {failed}')
        except FileNotFoundError:
            metrics.append('transcodes_completed_total 0')
            metrics.append('transcodes_failed_total 0')

        # Add more custom metrics as needed
        return '\n'.join(metrics) + '\n'

if __name__ == '__main__':
    server = HTTPServer(('0.0.0.0', 9999), MetricsHandler)
    print("Custom metrics server running on port 9999")
    server.serve_forever()
```

Deploy it as a systemd service or a Docker container; a minimal systemd sketch follows.

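A minimal unit file sketch, assuming the script is installed at `/opt/exporters/custom_exporter.py` (paths and user are illustrative):

```bash
sudo tee /etc/systemd/system/custom-exporter.service > /dev/null << 'EOF'
[Unit]
Description=Custom Prometheus exporter
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/exporters/custom_exporter.py
Restart=on-failure
User=nobody

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now custom-exporter.service
```
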
### Home Assistant Metrics Exporter

```python
#!/usr/bin/env python3
# homeassistant_exporter.py
import requests
from http.server import HTTPServer, BaseHTTPRequestHandler

HA_URL = "http://192.168.1.XXX:8123"  # Your HA IP
HA_TOKEN = "your-long-lived-access-token"  # Generate in HA Profile

class HAMetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            metrics = self.get_ha_metrics()
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(metrics.encode())
        else:
            self.send_response(404)
            self.end_headers()

    def get_ha_metrics(self):
        headers = {
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json"
        }

        try:
            response = requests.get(f"{HA_URL}/api/states", headers=headers, timeout=10)
            entities = response.json()
        except Exception as e:
            return f"# Error fetching HA data: {e}\n"

        metrics = []
        metrics.append("# HELP homeassistant_entity_state Home Assistant entity states")
        metrics.append("# TYPE homeassistant_entity_state gauge")

        for entity in entities:
            entity_id = entity['entity_id']
            domain = entity_id.split('.')[0]
            state = entity['state']

            # Add labels for better organization
            labels = f'domain="{domain}",entity_id="{entity_id}"'

            try:
                # Try to convert to a numeric value
                value = float(state)
                metrics.append(f'homeassistant_entity_state{{{labels}}} {value}')
            except (ValueError, TypeError):
                # Map common boolean-like states to 0/1
                if state.lower() in ['on', 'true', 'open', 'home']:
                    metrics.append(f'homeassistant_entity_state{{{labels}}} 1')
                elif state.lower() in ['off', 'false', 'closed', 'away']:
                    metrics.append(f'homeassistant_entity_state{{{labels}}} 0')
                else:
                    # For text states, emit an info-style metric instead
                    info_labels = f'domain="{domain}",entity_id="{entity_id}",state="{state}"'
                    metrics.append(f'homeassistant_entity_info{{{info_labels}}} 1')

        return '\n'.join(metrics) + '\n'

if __name__ == '__main__':
    server = HTTPServer(('0.0.0.0', 9998), HAMetricsHandler)
    print("Home Assistant metrics exporter running on port 9998")
    server.serve_forever()
```
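The exporter needs only the `requests` library; once it is running, its output can be checked the same way Prometheus will scrape it:

```bash
pip3 install requests
python3 homeassistant_exporter.py &

# Should print homeassistant_entity_state samples
curl -s http://localhost:9998/metrics | head -n 10
```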

### HomeKit Bridge Exporter (Optional)

```python
#!/usr/bin/env python3
# homekit_exporter.py - Requires homekit2mqtt or a similar bridge
import json
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

import paho.mqtt.client as mqtt

MQTT_BROKER = "192.168.1.XXX"  # Your MQTT broker
MQTT_PORT = 1883
HOMEKIT_TOPIC = "homekit/#"

class HomeKitData:
    def __init__(self):
        self.devices = {}
        self.client = mqtt.Client()
        self.client.on_connect = self.on_connect
        self.client.on_message = self.on_message

    def on_connect(self, client, userdata, flags, rc):
        print(f"Connected to MQTT broker with result code {rc}")
        client.subscribe(HOMEKIT_TOPIC)

    def on_message(self, client, userdata, msg):
        try:
            topic_parts = msg.topic.split('/')
            device_id = topic_parts[1] if len(topic_parts) > 1 else "unknown"
            characteristic = topic_parts[2] if len(topic_parts) > 2 else "state"

            value = json.loads(msg.payload.decode())

            if device_id not in self.devices:
                self.devices[device_id] = {}
            self.devices[device_id][characteristic] = value
        except Exception as e:
            print(f"Error processing MQTT message: {e}")

homekit_data = HomeKitData()

class HomeKitMetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/metrics':
            metrics = self.get_homekit_metrics()
            self.send_response(200)
            self.send_header('Content-type', 'text/plain')
            self.end_headers()
            self.wfile.write(metrics.encode())
        else:
            self.send_response(404)
            self.end_headers()

    def get_homekit_metrics(self):
        metrics = []
        metrics.append("# HELP homekit_device_state HomeKit device states")
        metrics.append("# TYPE homekit_device_state gauge")

        for device_id, characteristics in homekit_data.devices.items():
            for char_name, value in characteristics.items():
                labels = f'device_id="{device_id}",characteristic="{char_name}"'
                try:
                    numeric_value = float(value)
                    metrics.append(f'homekit_device_state{{{labels}}} {numeric_value}')
                except (ValueError, TypeError):
                    # Map boolean-like values to 0/1
                    if str(value).lower() in ['true', 'on']:
                        metrics.append(f'homekit_device_state{{{labels}}} 1')
                    elif str(value).lower() in ['false', 'off']:
                        metrics.append(f'homekit_device_state{{{labels}}} 0')

        return '\n'.join(metrics) + '\n'

def start_mqtt_client():
    homekit_data.client.connect(MQTT_BROKER, MQTT_PORT, 60)
    homekit_data.client.loop_forever()

if __name__ == '__main__':
    # Run the MQTT client in a background thread
    mqtt_thread = threading.Thread(target=start_mqtt_client)
    mqtt_thread.daemon = True
    mqtt_thread.start()

    # Serve metrics over HTTP
    server = HTTPServer(('0.0.0.0', 9997), HomeKitMetricsHandler)
    print("HomeKit metrics exporter running on port 9997")
    server.serve_forever()
```

## Proxmox Host Configuration

Install the community Proxmox VE exporter (prometheus-pve-exporter) on your Proxmox host:

```bash
# On the Proxmox host: the exporter is a Python package installed via pip
# (there is no standalone binary release to download)
apt install python3-pip
pip3 install prometheus-pve-exporter

# Create a read-only monitoring user in Proxmox first, e.g.:
#   pveum user add monitoring@pve --password <password>
#   pveum acl modify / --users monitoring@pve --roles PVEAuditor

# Create the config file
cat > pve.yml << EOF
default:
  user: monitoring@pve
  password: your-password
  verify_ssl: false
EOF

# Run the exporter (listens on port 9221 by default; older versions
# take the config path as a positional argument instead of a flag)
pve_exporter --config.file pve.yml

# Verify metrics (served on the /pve path)
curl -s http://localhost:9221/pve | head
```

## Installation Steps

1. **Create the monitoring VM** with adequate resources (4 GB RAM and 20 GB disk recommended)

2. **Set up the main monitoring stack:**
   ```bash
   mkdir -p monitoring/{prometheus,alertmanager,grafana/provisioning/{datasources,dashboards}}
   cd monitoring
   # Copy all configuration files from above
   docker-compose up -d
   ```

3. **Deploy exporters on each VM:**
   ```bash
   # On each VM
   docker-compose -f node-cadvisor-compose.yml up -d
   ```

4. **Configure the Proxmox exporter** on the host

5. **Set up the Home Assistant integration:**

   **Option A: Enable Prometheus in Home Assistant**
   ```yaml
   # Add to Home Assistant configuration.yaml
   prometheus:
     namespace: homeassistant
     filter:
       include_domains:
         - sensor
         - binary_sensor
         - switch
         - light
         - climate
         - weather
   ```

   **Option B: Deploy the custom HA exporter**
   ```bash
   # Create and run the Home Assistant exporter
   python3 homeassistant_exporter.py
   ```

6. **Optional: Set up the HomeKit integration**
   ```bash
   # If using the homekit2mqtt bridge
   npm install -g homekit2mqtt
   homekit2mqtt --mqtt-url mqtt://your-mqtt-broker

   # Then run the HomeKit exporter
   python3 homekit_exporter.py
   ```

7. **Access the interfaces:**
   - Grafana: http://monitoring-vm-ip:3000 (admin/admin123)
   - Prometheus: http://monitoring-vm-ip:9090
   - Alertmanager: http://monitoring-vm-ip:9093

8. **Import dashboards** in Grafana:
   - Node Exporter Full (Dashboard ID: 1860)
   - Docker and system monitoring (Dashboard ID: 893)
   - Proxmox VE (Dashboard ID: 10347)
   - Home Assistant (Dashboard ID: 11021)

9. **Generate a Home Assistant Long-Lived Access Token:**
   - Go to HA Profile → Long-Lived Access Tokens
   - Create a new token and update the exporter configs

## Customization Notes

- Update all IP addresses in prometheus.yml to match your network
- Configure email settings in alertmanager.yml
- Adjust alert thresholds in alerts.yml based on your requirements
- Add custom exporters for application-specific monitoring
- Set up proper authentication and SSL/TLS for production use

## Maintenance

- Monitor disk space used by the time-series data (see the check below)
- Back up configuration files regularly
- Update container images periodically
- Review and tune alert rules based on false-positive rates
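
Prometheus exports its own TSDB metrics, so disk growth can be watched from the same stack, and the configuration directories are small enough to archive with plain `tar`:

```bash
# Current on-disk size of TSDB blocks, in bytes
curl -s 'http://localhost:9090/api/v1/query?query=prometheus_tsdb_storage_blocks_bytes' \
  | python3 -m json.tool

# Snapshot the configuration (data volumes are excluded deliberately)
tar czf monitoring-config-$(date +%F).tar.gz prometheus/ alertmanager/ grafana/provisioning/
```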

This setup provides comprehensive monitoring for your home lab, with room to expand as your infrastructure grows.