- Add recovered LXC 300/302 server-diagnostics configs as reference (headless Claude permission patterns, health check client) - Archive decommissioned tdarr monitoring scripts - Gitignore rpg-art/ directory - Delete stray temp files and swarm-test/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
178 lines
4.9 KiB
Markdown
---
name: server-diagnostics
description: |
  Automated server troubleshooting for Docker containers and system health.
  Provides SSH-based diagnostics, log reading, metrics collection, and low-risk
  remediation. USE WHEN N8N triggers troubleshooting, container issues detected,
  or system health checks needed.
---

# Server Diagnostics - Automated Troubleshooting

## When to Activate This Skill
- N8N triggers with error context
- "diagnose container X", "check docker status"
- "read logs from server", "check disk usage"
- "troubleshoot server issue"
- Any automated health check response

## Quick Start

### Check All Containers
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-status paper-dynasty
```

### Quick Health Check (Docker + System Metrics)
```bash
python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
```

### Get Container Logs
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty paper-dynasty_discord-app_1 --lines 200
```

### Restart a Container
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty paper-dynasty_discord-app_1
```

### System Metrics
```bash
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type all
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type disk
```

### Run Diagnostic Command
```bash
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty disk_usage
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty memory_usage
```

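The Quick Start invocations all follow the shape `client.py <action> <server> [args] [--flags]`. When calling the client from another script, the argument list could be assembled along these lines. This is a minimal sketch: `build_client_command` is a hypothetical helper, not part of the skill, and it assumes the client path used in the examples above.

```python
import shlex
from pathlib import Path

# Path taken from the Quick Start examples; adjust if the skill lives elsewhere.
CLIENT = Path("~/.claude/skills/server-diagnostics/client.py").expanduser()


def build_client_command(action, server, *args, **options):
    """Return an argv list for one client call.

    Hypothetical helper: positional args follow the server name, and keyword
    options are rendered as `--name value` flags (e.g. lines=200 -> --lines 200).
    """
    argv = ["python", str(CLIENT), action, server, *map(str, args)]
    for name, value in options.items():
        argv += [f"--{name.replace('_', '-')}", str(value)]
    return argv


cmd = build_client_command(
    "docker-logs", "paper-dynasty", "paper-dynasty_discord-app_1", lines=200
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Building a list rather than a single string means the command can be passed to `subprocess.run` without going through a shell.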
## Troubleshooting Workflow

When an issue is reported:

1. **Quick Health Check** - Get overview of containers and system state
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
   ```

2. **Check MemoryGraph** - Recall similar issues
   ```bash
   python ~/.claude/skills/memorygraph/client.py recall "docker container error"
   ```

3. **Get Container Logs** - Look for errors
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty <container> --lines 500 --filter error
   ```

4. **Remediate if Safe** - Restart if allowed
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty <container>
   ```

5. **Store Solution** - Save to MemoryGraph if resolved
   ```bash
   python ~/.claude/skills/memorygraph/client.py store \
     --type solution \
     --title "Fixed <container> issue" \
     --content "Description of problem and solution" \
     --tags "docker,paper-dynasty,troubleshooting" \
     --importance 0.7
   ```

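The server-diagnostics steps of the workflow (1, 3, and 4) can be chained in a script. The sketch below is illustrative only: `troubleshoot` and `fake_run` are hypothetical names, the MemoryGraph steps are left out, and the rule "restart only when the filtered logs contain output" is an assumption rather than documented behaviour. The runner is injected so the flow can be exercised without touching a server.

```python
import json


def troubleshoot(server, container, run, restart_allowed=False):
    """Run health check, error-log fetch, and an optional restart.

    `run` is any callable that takes the client arguments as a list and
    returns the client's JSON output as text.
    """
    steps = []

    def call(*argv):
        result = json.loads(run(list(argv)))
        steps.append((argv[0], result.get("success", False)))
        return result

    call("health", server)
    logs = call("docker-logs", server, container, "--lines", "500", "--filter", "error")
    # Assumed policy: only restart when errors were found and the caller permits it.
    if logs.get("stdout", "").strip() and restart_allowed:
        call("docker-restart", server, container)
    return steps


def fake_run(argv):
    # Stand-in for invoking client.py; returns a canned result in the documented JSON shape.
    stdout = "ERROR: connection reset" if argv[0] == "docker-logs" else ""
    return json.dumps({"success": True, "stdout": stdout, "stderr": "", "returncode": 0})


steps = troubleshoot(
    "paper-dynasty", "paper-dynasty_discord-app_1", fake_run, restart_allowed=True
)
print(steps)
```

In real use, `run` would wrap `subprocess.run` around the client command and return its standard output.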
## Server Inventory

| Server | IP | SSH User | Description |
|--------|-----|----------|-------------|
| paper-dynasty | 10.10.0.88 | cal | Paper Dynasty Discord bots and services |

## Monitored Containers

| Container | Critical | Restart Allowed | Description |
|-----------|----------|-----------------|-------------|
| paper-dynasty_discord-app_1 | Yes | Yes | Paper Dynasty Discord bot |
| paper-dynasty_db_1 | Yes | Yes | PostgreSQL database |
| paper-dynasty_adminer_1 | No | Yes | Database admin UI |
| sba-website_sba-web_1 | Yes | Yes | SBA website |
| sba-ghost_sba-ghost_1 | No | Yes | Ghost CMS |

## Available Diagnostic Commands

- `disk_usage` - `df -h`
- `memory_usage` - `free -h`
- `cpu_usage` - `top -bn1 | head -20`
- `cpu_load` - `uptime`
- `process_list` - `ps aux --sort=-%mem | head -20`
- `network_status` - `ss -tuln`
- `docker_ps` - `docker ps -a` (formatted)
- `docker_stats` - `docker stats --no-stream`
- `journal_errors` - `journalctl -p err -n 50`

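Named diagnostics work as an allowlist: the caller supplies a name, never a raw shell command. A minimal sketch of that lookup follows, assuming a plain dictionary. `DIAGNOSTIC_COMMANDS` and `resolve_diagnostic` are hypothetical names, and `docker_ps` is omitted because its exact format string is not given here.

```python
# Names and commands copied from the list above; the real client may store them differently.
DIAGNOSTIC_COMMANDS = {
    "disk_usage": "df -h",
    "memory_usage": "free -h",
    "cpu_usage": "top -bn1 | head -20",
    "cpu_load": "uptime",
    "process_list": "ps aux --sort=-%mem | head -20",
    "network_status": "ss -tuln",
    "docker_stats": "docker stats --no-stream",
    "journal_errors": "journalctl -p err -n 50",
}


def resolve_diagnostic(name):
    """Return the shell command for a named diagnostic, refusing anything not listed."""
    try:
        return DIAGNOSTIC_COMMANDS[name]
    except KeyError:
        raise ValueError(f"unknown diagnostic: {name!r}") from None


print(resolve_diagnostic("disk_usage"))
```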
## Security Constraints

### DENIED Patterns (Will Be Rejected)
- `rm -rf`, `rm -r /`
- `dd if=`, `mkfs`
- `shutdown`, `reboot`
- `systemctl stop`
- `chmod 777`
- `wget|sh`, `curl|sh`

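A deny check like the list above could be written as a set of regular expressions tested against each command before it is sent over SSH. The patterns below are approximations written for illustration; the skill's actual matching rules are not shown in this document, and `is_denied` is a hypothetical name.

```python
import re

# Regexes approximating the denied list above (assumed, not the skill's real patterns).
DENIED_PATTERNS = [
    r"\brm\s+-\w*r",                      # rm -r, rm -rf, rm -fr
    r"\bdd\s+if=",                        # raw disk writes
    r"\bmkfs\b",                          # filesystem creation
    r"\b(shutdown|reboot)\b",
    r"\bsystemctl\s+stop\b",
    r"\bchmod\s+777\b",
    r"\b(wget|curl)\b.*\|\s*(ba)?sh\b",   # download piped into a shell
]
_DENIED = [re.compile(p) for p in DENIED_PATTERNS]


def is_denied(command):
    """Return True when the command matches any denied pattern."""
    return any(p.search(command) for p in _DENIED)


for candidate in ("df -h", "rm -rf /var/lib/docker", "curl https://example.com/x.sh | sh"):
    print(candidate, "->", "DENIED" if is_denied(candidate) else "ok")
```

A pattern denylist is easy to sidestep with quoting or aliases, so it works best as a backstop behind the named-diagnostic allowlist, not as the only control.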
### Container Restart Rules
- Only containers listed in `config.yaml` with `restart_allowed: true` may be restarted
- Restarting the N8N container is NEVER allowed (N8N is what triggers this skill)

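The two restart rules reduce to a short predicate. In the sketch below a dictionary stands in for the parsed `config.yaml`; the file's real schema is not shown in this document, so the `containers` layout, the substring check for N8N, and the name `may_restart` are all assumptions.

```python
# Stand-in for the parsed config.yaml; the layout is assumed for illustration.
CONFIG = {
    "containers": {
        "paper-dynasty_discord-app_1": {"critical": True, "restart_allowed": True},
        "paper-dynasty_adminer_1": {"critical": False, "restart_allowed": True},
    }
}


def may_restart(container, config=CONFIG):
    """Allow a restart only for configured containers flagged restart_allowed; never for N8N."""
    if "n8n" in container.lower():
        return False  # N8N triggers the skill, so restarting it is always refused
    entry = config.get("containers", {}).get(container)
    return bool(entry and entry.get("restart_allowed") is True)


print(may_restart("paper-dynasty_discord-app_1"))
print(may_restart("n8n"))
```

Checking N8N first means a misconfigured `restart_allowed: true` entry for it still cannot enable a restart.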
## MemoryGraph Integration

Before troubleshooting, check for known solutions:
```bash
python ~/.claude/skills/memorygraph/client.py recall "docker paper-dynasty"
```

After resolving, store the pattern:
```bash
python ~/.claude/skills/memorygraph/client.py store \
  --type solution \
  --title "Brief description" \
  --content "Full explanation..." \
  --tags "docker,paper-dynasty,fix" \
  --importance 0.7
```

## Common Issues and Solutions

### Container Not Running
1. Check logs for crash reason
2. Check disk space and memory
3. Attempt restart if allowed
4. Escalate if restart fails

### High Memory Usage
1. Check which container is consuming the memory
2. Review `docker stats` output
3. Check for memory leaks in logs
4. Consider container restart

### Disk Space Low
1. Run the `disk_usage` diagnostic
2. Check `docker system df`
3. Consider log rotation
4. Alert user for cleanup

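For the disk-space case, the `df -h` text returned by the `disk_usage` diagnostic can be scanned for filesystems above a usage threshold. A minimal sketch, assuming the standard six-column `df -h` layout; the 90% threshold, the sample output, and the name `filesystems_over` are invented for illustration.

```python
def filesystems_over(df_output, threshold=90):
    """Return (mount point, use%) pairs from `df -h` text whose usage meets the threshold."""
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue  # ignore wrapped or malformed rows
        used = int(fields[4].rstrip("%"))
        if used >= threshold:
            flagged.append((" ".join(fields[5:]), used))
    return flagged


# Invented sample in the usual `df -h` format.
SAMPLE = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   46G  1.5G  97% /
tmpfs           2.0G     0  2.0G   0% /run
"""
print(filesystems_over(SAMPLE))
```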
## Output Format

All commands return JSON:
```json
{
  "success": true,
  "stdout": "...",
  "stderr": "...",
  "returncode": 0,
  "data": {...}
}
```

The `data` field holds parsed data where applicable.
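
A caller can load this JSON into a small typed structure before acting on it. The sketch below assumes only the five keys documented above; `CommandResult` and `parse_result` are hypothetical names, and the defaults for missing keys are a design choice of this sketch, not documented client behaviour.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CommandResult:
    success: bool
    stdout: str = ""
    stderr: str = ""
    returncode: int = 0
    data: dict = field(default_factory=dict)


def parse_result(text):
    """Parse client output, keeping only the documented keys."""
    raw = json.loads(text)
    return CommandResult(
        success=bool(raw.get("success", False)),
        stdout=raw.get("stdout", ""),
        stderr=raw.get("stderr", ""),
        returncode=int(raw.get("returncode", 0)),
        data=raw.get("data") or {},  # absent or null data becomes an empty dict
    )


result = parse_result('{"success": true, "stdout": "ok", "stderr": "", "returncode": 0}')
print(result.success, result.data)
```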