chore: add recovered CT 302 configs, archive tdarr scripts, clean up repo
- Add recovered LXC 300/302 server-diagnostics configs as reference (headless Claude permission patterns, health check client)
- Archive decommissioned tdarr monitoring scripts
- Gitignore rpg-art/ directory
- Delete stray temp files and swarm-test/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent 64f9662f25
commit 28abde7c9f

.gitignore (vendored, 3 changed lines)
@@ -17,3 +17,6 @@ __pycache__
 # Large binary files
 *.zip
 
+# Art assets (managed separately)
+rpg-art/
monitoring/recovered-lxc300/server-diagnostics/SKILL.md (new file, 177 lines)
@@ -0,0 +1,177 @@
---
name: server-diagnostics
description: |
  Automated server troubleshooting for Docker containers and system health.
  Provides SSH-based diagnostics, log reading, metrics collection, and low-risk
  remediation. USE WHEN N8N triggers troubleshooting, container issues detected,
  or system health checks needed.
---

# Server Diagnostics - Automated Troubleshooting

## When to Activate This Skill
- N8N triggers with error context
- "diagnose container X", "check docker status"
- "read logs from server", "check disk usage"
- "troubleshoot server issue"
- Any automated health check response

## Quick Start

### Check All Containers
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-status paper-dynasty
```

### Quick Health Check (Docker + System Metrics)
```bash
python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
```

### Get Container Logs
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty paper-dynasty_discord-app_1 --lines 200
```

### Restart a Container
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty paper-dynasty_discord-app_1
```

### System Metrics
```bash
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type all
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type disk
```

### Run Diagnostic Command
```bash
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty disk_usage
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty memory_usage
```

## Troubleshooting Workflow

When an issue is reported:

1. **Quick Health Check** - Get overview of containers and system state
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
   ```

2. **Check MemoryGraph** - Recall similar issues
   ```bash
   python ~/.claude/skills/memorygraph/client.py recall "docker container error"
   ```

3. **Get Container Logs** - Look for errors
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty <container> --lines 500 --filter error
   ```

4. **Remediate if Safe** - Restart if allowed
   ```bash
   python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty <container>
   ```

5. **Store Solution** - Save to MemoryGraph if resolved
   ```bash
   python ~/.claude/skills/memorygraph/client.py store \
     --type solution \
     --title "Fixed <container> issue" \
     --content "Description of problem and solution" \
     --tags "docker,paper-dynasty,troubleshooting" \
     --importance 0.7
   ```

## Server Inventory

| Server | IP | SSH User | Description |
|--------|-----|----------|-------------|
| paper-dynasty | 10.10.0.88 | cal | Paper Dynasty Discord bots and services |

## Monitored Containers

| Container | Critical | Restart Allowed | Description |
|-----------|----------|-----------------|-------------|
| paper-dynasty_discord-app_1 | Yes | Yes | Paper Dynasty Discord bot |
| paper-dynasty_db_1 | Yes | Yes | PostgreSQL database |
| paper-dynasty_adminer_1 | No | Yes | Database admin UI |
| sba-website_sba-web_1 | Yes | Yes | SBA website |
| sba-ghost_sba-ghost_1 | No | Yes | Ghost CMS |

## Available Diagnostic Commands

- `disk_usage` - df -h
- `memory_usage` - free -h
- `cpu_usage` - top -bn1 | head -20
- `cpu_load` - uptime
- `process_list` - ps aux --sort=-%mem | head -20
- `network_status` - ss -tuln
- `docker_ps` - docker ps -a (formatted)
- `docker_stats` - docker stats --no-stream
- `journal_errors` - journalctl -p err -n 50

## Security Constraints

### DENIED Patterns (Will Be Rejected)
- rm -rf, rm -r /
- dd if=, mkfs
- shutdown, reboot
- systemctl stop
- chmod 777
- wget|sh, curl|sh

### Container Restart Rules
- Only containers in config.yaml with restart_allowed: true
- N8N container restart is NEVER allowed (it triggers us)

## MemoryGraph Integration

Before troubleshooting, check for known solutions:
```bash
python ~/.claude/skills/memorygraph/client.py recall "docker paper-dynasty"
```

After resolving, store the pattern:
```bash
python ~/.claude/skills/memorygraph/client.py store \
  --type solution \
  --title "Brief description" \
  --content "Full explanation..." \
  --tags "docker,paper-dynasty,fix" \
  --importance 0.7
```

## Common Issues and Solutions

### Container Not Running
1. Check logs for crash reason
2. Check disk space and memory
3. Attempt restart if allowed
4. Escalate if restart fails

### High Memory Usage
1. Check which container is consuming memory
2. Review docker stats
3. Check for memory leaks in logs
4. Consider container restart

### Disk Space Low
1. Run disk_usage diagnostic
2. Check docker system df
3. Consider log rotation
4. Alert user for cleanup

## Output Format

All commands return JSON:
```json
{
  "success": true,
  "stdout": "...",
  "stderr": "...",
  "returncode": 0,
  "data": {...}  // Parsed data if applicable
}
```
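Since every subcommand emits this JSON envelope, downstream automation (an N8N step, for instance) only needs one small parser. A minimal sketch, assuming the envelope above; `summarize` is a hypothetical helper, not part of client.py:

```python
import json

def summarize(result_json: str) -> str:
    """Collapse a client.py JSON result into a one-line status string."""
    result = json.loads(result_json)
    status = "OK" if result.get("success") else f"FAILED (rc={result.get('returncode')})"
    # Prefer stdout; fall back to stderr when the command produced no output.
    detail = result.get("stdout", "").strip() or result.get("stderr", "").strip()
    return f"{status}: {detail[:80]}"

sample = '{"success": true, "stdout": "all containers up\\n", "stderr": "", "returncode": 0}'
print(summarize(sample))  # OK: all containers up
```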
monitoring/recovered-lxc300/server-diagnostics/client.py (new file, 443 lines)
@@ -0,0 +1,443 @@
#!/usr/bin/env python3
"""
Server Diagnostics Client Library
Provides SSH-based diagnostics for homelab troubleshooting
"""

import json
import subprocess
from pathlib import Path
from typing import Any, Optional, List, Dict

import yaml


class ServerDiagnostics:
    """
    Main diagnostic client for server troubleshooting.

    Connects to servers via SSH and executes whitelisted diagnostic
    commands. Enforces security constraints from config.yaml.
    """

    def __init__(self, config_path: Optional[str] = None):
        """
        Initialize with configuration.

        Args:
            config_path: Path to config.yaml. Defaults to same directory.
        """
        if config_path is None:
            config_path = Path(__file__).parent / "config.yaml"
        self.config = self._load_config(config_path)
        self.servers = self.config.get("servers", {})
        self.containers = self.config.get("docker_containers", [])
        self.allowed_commands = self.config.get("diagnostic_commands", {})
        self.remediation_commands = self.config.get("remediation_commands", {})
        self.denied_patterns = self.config.get("denied_patterns", [])

    def _load_config(self, path) -> dict:
        """Load YAML configuration."""
        with open(path) as f:
            return yaml.safe_load(f)

    def _validate_command(self, command: str) -> bool:
        """Check command against deny list."""
        for pattern in self.denied_patterns:
            if pattern in command:
                raise SecurityError(f"Command contains denied pattern: {pattern}")
        return True

    def _ssh_exec(self, server: str, command: str) -> dict:
        """
        Execute command on remote server via SSH.

        Returns:
            dict with stdout, stderr, returncode
        """
        self._validate_command(command)

        server_config = self.servers.get(server)
        if not server_config:
            raise ValueError(f"Unknown server: {server}")

        ssh_key = Path(server_config["ssh_key"]).expanduser()
        ssh_user = server_config["ssh_user"]
        hostname = server_config["hostname"]

        ssh_cmd = [
            "ssh",
            "-i",
            str(ssh_key),
            "-o",
            "StrictHostKeyChecking=no",
            "-o",
            "ConnectTimeout=10",
            f"{ssh_user}@{hostname}",
            command,
        ]

        result = subprocess.run(ssh_cmd, capture_output=True, text=True, timeout=60)

        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "returncode": result.returncode,
            "success": result.returncode == 0,
        }

    # === Docker Operations ===

    def get_docker_status(self, server: str, container: Optional[str] = None) -> dict:
        """
        Get Docker container status.

        Args:
            server: Server identifier from config
            container: Specific container name (optional, all if not specified)

        Returns:
            dict with container statuses
        """
        if container:
            cmd = "docker inspect --format '{{json .State}}' " + container
            result = self._ssh_exec(server, cmd)
            if result["success"]:
                try:
                    result["data"] = json.loads(result["stdout"])
                except json.JSONDecodeError:
                    result["data"] = None
        else:
            # Use Go template format for Docker 20.10 compatibility
            # Format: Name|Status|State|Ports
            cmd = "docker ps -a --format '{{.Names}}|{{.Status}}|{{.State}}|{{.Ports}}'"
            result = self._ssh_exec(server, cmd)
            if result["success"]:
                containers = []
                for line in result["stdout"].strip().split("\n"):
                    if line:
                        parts = line.split("|")
                        if len(parts) >= 3:
                            containers.append(
                                {
                                    "Names": parts[0],
                                    "Status": parts[1],
                                    "State": parts[2],
                                    "Ports": parts[3] if len(parts) > 3 else "",
                                }
                            )
                result["data"] = containers

        return result

    def docker_logs(
        self,
        server: str,
        container: str,
        lines: int = 100,
        log_filter: Optional[str] = None,
    ) -> dict:
        """
        Get Docker container logs.

        Args:
            server: Server identifier
            container: Container name
            lines: Number of lines to retrieve
            log_filter: Optional grep filter pattern

        Returns:
            dict with log output
        """
        cmd = f"docker logs --tail {lines} {container} 2>&1"
        if log_filter:
            cmd += f" | grep -i '{log_filter}'"

        return self._ssh_exec(server, cmd)

    def docker_restart(self, server: str, container: str) -> dict:
        """
        Restart a Docker container (low-risk remediation).

        Args:
            server: Server identifier
            container: Container name

        Returns:
            dict with operation result
        """
        # Check if container is allowed to be restarted
        container_config = next(
            (c for c in self.containers if c["name"] == container), None
        )

        if not container_config:
            return {
                "success": False,
                "error": f"Container {container} not in monitored list",
            }

        if not container_config.get("restart_allowed", False):
            return {
                "success": False,
                "error": f"Container {container} restart not permitted",
            }

        cmd = f"docker restart {container}"
        result = self._ssh_exec(server, cmd)
        result["action"] = "docker_restart"
        result["container"] = container

        return result

    # === System Diagnostics ===

    def get_metrics(self, server: str, metric_type: str = "all") -> dict:
        """
        Get system metrics from server.

        Args:
            server: Server identifier
            metric_type: Type of metrics (cpu, memory, disk, network, all)

        Returns:
            dict with metric data
        """
        metrics = {}

        if metric_type in ("cpu", "all"):
            result = self._ssh_exec(server, self.allowed_commands["cpu_usage"])
            metrics["cpu"] = result

        if metric_type in ("memory", "all"):
            result = self._ssh_exec(server, self.allowed_commands["memory_usage"])
            metrics["memory"] = result

        if metric_type in ("disk", "all"):
            result = self._ssh_exec(server, self.allowed_commands["disk_usage"])
            metrics["disk"] = result

        if metric_type in ("network", "all"):
            result = self._ssh_exec(server, self.allowed_commands["network_status"])
            metrics["network"] = result

        return {"server": server, "metrics": metrics}

    def read_logs(
        self,
        server: str,
        log_type: str,
        lines: int = 100,
        log_filter: Optional[str] = None,
        custom_path: Optional[str] = None,
    ) -> dict:
        """
        Read logs from server.

        Args:
            server: Server identifier
            log_type: Type of log (system, docker, application, custom)
            lines: Number of lines
            log_filter: Optional grep pattern
            custom_path: Path for custom log type

        Returns:
            dict with log content
        """
        log_paths = {
            "system": "/var/log/syslog",
            "docker": "/var/log/docker.log",
            "application": "/var/log/application.log",
        }

        path = custom_path if log_type == "custom" else log_paths.get(log_type)

        if not path:
            return {"success": False, "error": f"Unknown log type: {log_type}"}

        cmd = f"tail -n {lines} {path}"
        if log_filter:
            cmd += f" | grep -i '{log_filter}'"

        return self._ssh_exec(server, cmd)

    def run_diagnostic(
        self, server: str, command: str, params: Optional[dict] = None
    ) -> dict:
        """
        Run a whitelisted diagnostic command.

        Args:
            server: Server identifier
            command: Command key from config whitelist
            params: Optional parameters to substitute

        Returns:
            dict with command output
        """
        if command not in self.allowed_commands:
            return {"success": False, "error": f"Command '{command}' not in whitelist"}

        cmd = self.allowed_commands[command]

        # Substitute parameters if provided
        if params:
            for key, value in params.items():
                cmd = cmd.replace(f"{{{key}}}", str(value))

        return self._ssh_exec(server, cmd)

    # === Convenience Methods ===

    def quick_health_check(self, server: str) -> dict:
        """
        Perform quick health check on server.

        Returns summary of Docker containers, disk, and memory.
        """
        health = {
            "server": server,
            "docker": self.get_docker_status(server),
            "metrics": self.get_metrics(server, "all"),
            "healthy": True,
            "issues": [],
        }

        # Check for stopped containers
        if health["docker"].get("data"):
            for container in health["docker"]["data"]:
                status = container.get("State", container.get("Status", ""))
                if "Up" not in str(status) and "running" not in str(status).lower():
                    health["healthy"] = False
                    health["issues"].append(
                        f"Container {container.get('Names', 'unknown')} is not running"
                    )

        return health

    def to_json(self, data: Any) -> str:
        """Convert result to JSON string."""
        return json.dumps(data, indent=2, default=str)


class SecurityError(Exception):
    """Raised when a command violates security constraints."""

    pass


def main():
    """CLI interface for server diagnostics."""
    import argparse

    parser = argparse.ArgumentParser(
        description="Server Diagnostics CLI",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s docker-status paper-dynasty
  %(prog)s docker-status paper-dynasty --container paper-dynasty_discord-app_1
  %(prog)s docker-logs paper-dynasty paper-dynasty_discord-app_1 --lines 200
  %(prog)s docker-restart paper-dynasty paper-dynasty_discord-app_1
  %(prog)s metrics paper-dynasty --type all
  %(prog)s health paper-dynasty
  %(prog)s diagnostic paper-dynasty disk_usage
""",
    )

    subparsers = parser.add_subparsers(dest="command", required=True)

    # docker-status
    p_docker = subparsers.add_parser(
        "docker-status", help="Get Docker container status"
    )
    p_docker.add_argument("server", help="Server identifier")
    p_docker.add_argument("--container", "-c", help="Specific container name")

    # docker-logs
    p_logs = subparsers.add_parser("docker-logs", help="Get Docker container logs")
    p_logs.add_argument("server", help="Server identifier")
    p_logs.add_argument("container", help="Container name")
    p_logs.add_argument("--lines", "-n", type=int, default=100, help="Number of lines")
    p_logs.add_argument("--filter", "-f", dest="log_filter", help="Grep filter pattern")

    # docker-restart
    p_restart = subparsers.add_parser("docker-restart", help="Restart Docker container")
    p_restart.add_argument("server", help="Server identifier")
    p_restart.add_argument("container", help="Container name")

    # metrics
    p_metrics = subparsers.add_parser("metrics", help="Get system metrics")
    p_metrics.add_argument("server", help="Server identifier")
    p_metrics.add_argument(
        "--type",
        "-t",
        default="all",
        choices=["cpu", "memory", "disk", "network", "all"],
        help="Metric type",
    )

    # logs
    p_syslogs = subparsers.add_parser("logs", help="Read system logs")
    p_syslogs.add_argument("server", help="Server identifier")
    p_syslogs.add_argument(
        "--type",
        "-t",
        default="system",
        choices=["system", "docker", "application", "custom"],
        help="Log type",
    )
    p_syslogs.add_argument(
        "--lines", "-n", type=int, default=100, help="Number of lines"
    )
    p_syslogs.add_argument(
        "--filter", "-f", dest="log_filter", help="Grep filter pattern"
    )
    p_syslogs.add_argument("--path", help="Custom log path (for type=custom)")

    # health
    p_health = subparsers.add_parser("health", help="Quick health check")
    p_health.add_argument("server", help="Server identifier")

    # diagnostic
    p_diag = subparsers.add_parser("diagnostic", help="Run whitelisted diagnostic")
    p_diag.add_argument("server", help="Server identifier")
    p_diag.add_argument("diagnostic_cmd", help="Command from whitelist")
    p_diag.add_argument(
        "--params", "-p", help="JSON parameters for command substitution"
    )

    args = parser.parse_args()

    client = ServerDiagnostics()

    if args.command == "docker-status":
        result = client.get_docker_status(args.server, args.container)

    elif args.command == "docker-logs":
        result = client.docker_logs(
            args.server, args.container, args.lines, args.log_filter
        )

    elif args.command == "docker-restart":
        result = client.docker_restart(args.server, args.container)

    elif args.command == "metrics":
        result = client.get_metrics(args.server, args.type)

    elif args.command == "logs":
        result = client.read_logs(
            args.server, args.type, args.lines, args.log_filter, args.path
        )

    elif args.command == "health":
        result = client.quick_health_check(args.server)

    elif args.command == "diagnostic":
        params = json.loads(args.params) if args.params else None
        result = client.run_diagnostic(args.server, args.diagnostic_cmd, params)

    print(client.to_json(result))


if __name__ == "__main__":
    main()
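The restart gate in `docker_restart` is the key safety check: a container must appear in the monitored list *and* carry `restart_allowed: true` before any restart command is issued. A standalone sketch of that lookup, using the same config shape as config.yaml (the `n8n_n8n_1` entry here is illustrative, not from the commit):

```python
# Illustrative container list mirroring the docker_containers config shape.
containers = [
    {"name": "paper-dynasty_discord-app_1", "restart_allowed": True},
    {"name": "n8n_n8n_1", "restart_allowed": False},  # hypothetical entry
]

def can_restart(name: str) -> bool:
    """Restart is permitted only for known containers flagged restart_allowed."""
    cfg = next((c for c in containers if c["name"] == name), None)
    return bool(cfg and cfg.get("restart_allowed", False))

print(can_restart("paper-dynasty_discord-app_1"))  # True
print(can_restart("n8n_n8n_1"))                    # False
print(can_restart("unknown_container"))            # False
```

Note that an unknown container fails closed: absence from the list denies the restart rather than defaulting to allowed.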
monitoring/recovered-lxc300/server-diagnostics/config.yaml (new file, 72 lines)
@@ -0,0 +1,72 @@
# Server Diagnostics Configuration
# Used by client.py for server inventory and security constraints

# Server inventory - SSH connection details
servers:
  paper-dynasty:
    hostname: 10.10.0.88
    ssh_user: cal
    ssh_key: ~/.ssh/claude_diagnostics_key
    description: "Paper Dynasty Discord bots and services"

# Docker containers to monitor
# restart_allowed: false prevents automatic remediation
docker_containers:
  - name: paper-dynasty_discord-app_1
    critical: true
    restart_allowed: true
    description: "Paper Dynasty Discord bot"

  - name: paper-dynasty_db_1
    critical: true
    restart_allowed: true
    description: "Paper Dynasty PostgreSQL database"

  - name: paper-dynasty_adminer_1
    critical: false
    restart_allowed: true
    description: "Database admin UI"

  - name: sba-website_sba-web_1
    critical: true
    restart_allowed: true
    description: "SBA website"

  - name: sba-ghost_sba-ghost_1
    critical: false
    restart_allowed: true
    description: "SBA Ghost CMS"

# Whitelisted diagnostic commands
diagnostic_commands:
  disk_usage: "df -h"
  memory_usage: "free -h"
  cpu_usage: "top -bn1 | head -20"
  cpu_load: "uptime"
  process_list: "ps aux --sort=-%mem | head -20"
  network_status: "ss -tuln"
  docker_ps: "docker ps -a --format 'table {{.Names}}\\t{{.Status}}\\t{{.Ports}}'"
  docker_stats: "docker stats --no-stream --format 'table {{.Name}}\\t{{.CPUPerc}}\\t{{.MemUsage}}'"
  journal_errors: "journalctl -p err -n 50 --no-pager"

# Remediation commands (low-risk only)
remediation_commands:
  docker_restart: "docker restart {container}"
  docker_logs: "docker logs --tail 500 {container}"

# DENIED patterns - commands containing these will be rejected
denied_patterns:
  - "rm -rf"
  - "rm -r /"
  - "dd if="
  - "mkfs"
  - ":(){:|:&};:"
  - "shutdown"
  - "reboot"
  - "init 0"
  - "init 6"
  - "systemctl stop"
  - "> /dev/sd"
  - "chmod 777"
  - "wget|sh"
  - "curl|sh"
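The `remediation_commands` entries use `{placeholder}` templates that client.py fills in at runtime via `run_diagnostic`'s parameter substitution. A minimal sketch of that substitution step, using the `docker_restart` template from the config above:

```python
# Template taken from the remediation_commands section of config.yaml.
template = "docker restart {container}"
params = {"container": "paper-dynasty_discord-app_1"}

# Same substitution loop client.py uses: replace each "{key}" with its value.
cmd = template
for key, value in params.items():
    cmd = cmd.replace(f"{{{key}}}", str(value))

print(cmd)  # docker restart paper-dynasty_discord-app_1
```

Plain string replacement (rather than `str.format`) means an unmatched placeholder is left intact instead of raising, which keeps a missing parameter visible in the final command.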
@@ -0,0 +1 @@
pyyaml>=6.0
monitoring/recovered-lxc300/settings.json (new file, 26 lines)
@@ -0,0 +1,26 @@
{
  "permissions": {
    "allow": [
      "Bash(python3 ~/.claude/skills/server-diagnostics/client.py:*)",
      "Bash(ssh -i ~/.ssh/claude_diagnostics_key:*)",
      "Read(~/.claude/skills/**)",
      "Read(~/.claude/logs/**)",
      "Glob(*)",
      "Grep(*)"
    ],
    "deny": [
      "Bash(rm -rf:*)",
      "Bash(rm -r /:*)",
      "Bash(dd:*)",
      "Bash(mkfs:*)",
      "Bash(shutdown:*)",
      "Bash(reboot:*)",
      "Bash(*> /dev/sd*)",
      "Bash(chmod 777:*)",
      "Bash(*|sh)",
      "Bash(*curl*|*bash*)",
      "Bash(*wget*|*bash*)"
    ]
  },
  "model": "sonnet"
}
tdarr/archive/README.md (new file, 19 lines)
@@ -0,0 +1,19 @@
# Legacy Tdarr Scripts

## tdarr_monitor_local_node.py

Full-featured Tdarr monitoring script (~1200 lines) built for when the local workstation (nobara-pc) ran as an unmapped remote Tdarr node with GPU transcoding.

**Features:** Stuck job detection via cross-run state comparison (pickle file), automatic worker killing, Discord alerts, configurable thresholds, rotating log files.

**Why it existed:** The unmapped remote node architecture was prone to stuck jobs caused by network issues during file transfers between the remote node and server. The monitor ran every minute via cron to detect and kill stuck workers.

**Why it's archived:** Transcoding moved to ubuntu-manticore (10.10.0.226) as a local mapped node with shared NFS storage. No remote transfers means no stuck jobs. Tdarr manages its own workers natively. Archived February 2026.

## tdarr_file_monitor_local_node.py + tdarr-file-monitor-cron_local_node.sh

File completion monitor that watched the local Tdarr cache directory for finished `.mkv` transcodes and copied the smallest version to a backup location. The cron wrapper ran it every minute.

**Why it existed:** When the local workstation ran as an unmapped Tdarr node, completed transcodes landed in the local NVMe cache. This monitor detected completion (by tracking size stability) and kept the best copy.

**Why it's archived:** Same reason as above - the mapped node on manticore writes directly to the shared NFS media mount. No local cache to monitor. Archived February 2026.
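The size-stability idea the README describes (a transcode is "finished" once its file size stops changing for a configured window) can be sketched as a pure function; `is_complete` is a simplified illustration, not the archived script's actual code:

```python
def is_complete(current_size: int, last_size: int,
                last_change_ts: float, now: float,
                wait_seconds: int = 60) -> bool:
    """Size-stability check: a file counts as complete once its size
    has not changed for at least wait_seconds."""
    if current_size != last_size:
        # Still growing; the caller would record a new last_change_ts.
        return False
    return (now - last_change_ts) >= wait_seconds

print(is_complete(1_000, 1_000, last_change_ts=0.0, now=61.0))  # True
print(is_complete(1_000, 900, last_change_ts=0.0, now=61.0))    # False
print(is_complete(1_000, 1_000, last_change_ts=30.0, now=61.0)) # False
```

The 60-second default mirrors the `completion_wait_seconds` parameter in the archived monitor below.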
tdarr/archive/tdarr-file-monitor-cron_local_node.sh (new executable file, 6 lines)
@@ -0,0 +1,6 @@
#!/bin/bash
# Cron job wrapper for Tdarr file monitor
# Add this to crontab with: * * * * * /mnt/NV2/Development/claude-home/monitoring/scripts/tdarr-file-monitor-cron.sh

cd /mnt/NV2/Development/claude-home/monitoring/scripts
/usr/bin/python3 /mnt/NV2/Development/claude-home/monitoring/scripts/tdarr_file_monitor.py
tdarr/archive/tdarr_file_monitor_local_node.py (new executable file, 286 lines)
@ -0,0 +1,286 @@
#!/usr/bin/env python3
"""
Tdarr File Monitor - Monitors Tdarr cache directory for completed .mkv files and copies them to backup location.
Detects file completion by monitoring size changes and always keeps the smallest version of duplicate files.
"""

import os
import shutil
import json
import time
import logging
from pathlib import Path
from dataclasses import dataclass, asdict
from typing import Dict, Optional
from datetime import datetime, timedelta


@dataclass
class FileState:
    """Tracks the state of a monitored file."""
    path: str
    size: int
    last_modified: float
    first_seen: float
    last_size_change: float
    check_count: int = 0


class TdarrFileMonitor:
    """Monitors Tdarr cache directory for completed .mkv files."""

    def __init__(
        self,
        source_dir: str = "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp",
        media_dir: str = "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media",
        dest_dir: str = "/mnt/NV2/tdarr-cache/manual-backup",
        state_file: str = "/mnt/NV2/Development/claude-home/logs/tdarr_file_monitor_state.json",
        completion_wait_seconds: int = 60,
        log_file: str = "/mnt/NV2/Development/claude-home/logs/tdarr_file_monitor.log"
    ):
        self.source_dir = Path(source_dir)
        self.media_dir = Path(media_dir)
        self.dest_dir = Path(dest_dir)
        self.state_file = Path(state_file)
        self.completion_wait_seconds = completion_wait_seconds
        self.monitored_files: Dict[str, FileState] = {}

        # Setup logging
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler(log_file),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(f'{__name__}.TdarrFileMonitor')

        # Ensure destination directory exists
        self.dest_dir.mkdir(parents=True, exist_ok=True)

        # Load previous state
        self._load_state()

    def _load_state(self) -> None:
        """Load monitored files state from disk."""
        if self.state_file.exists():
            try:
                with open(self.state_file, 'r') as f:
                    data = json.load(f)
                self.monitored_files = {
                    path: FileState(**file_data)
                    for path, file_data in data.items()
                }
                self.logger.info(f"Loaded state for {len(self.monitored_files)} monitored files")
            except Exception as e:
                self.logger.error(f"Failed to load state file: {e}")
                self.monitored_files = {}

    def _save_state(self) -> None:
        """Save monitored files state to disk."""
        try:
            with open(self.state_file, 'w') as f:
                data = {path: asdict(state) for path, state in self.monitored_files.items()}
                json.dump(data, f, indent=2)
        except Exception as e:
            self.logger.error(f"Failed to save state file: {e}")

    def _scan_for_mkv_files(self) -> Dict[str, Path]:
        """Scan source directory for .mkv files in all subdirectories."""
        mkv_files = {}
        try:
            for mkv_file in self.source_dir.rglob("*.mkv"):
                if mkv_file.is_file():
                    mkv_files[str(mkv_file)] = mkv_file
        except Exception as e:
            self.logger.error(f"Error scanning source directory: {e}")

        return mkv_files

    def _get_file_info(self, file_path: Path) -> Optional[tuple]:
        """Get file size and modification time, return None if file doesn't exist or can't be accessed."""
        try:
            stat = file_path.stat()
            return stat.st_size, stat.st_mtime
        except (OSError, FileNotFoundError) as e:
            self.logger.warning(f"Cannot access file {file_path}: {e}")
            return None

    def _validate_file_pair(self, temp_file_path: Path, temp_file_size: int) -> bool:
        """Validate that a matching file exists in media directory with exact same name and size."""
        try:
            # Search for matching file in media directory tree
            for media_file in self.media_dir.rglob(temp_file_path.name):
                if media_file.is_file():
                    media_file_info = self._get_file_info(media_file)
                    if media_file_info:
                        media_size, _ = media_file_info
                        if media_size == temp_file_size:
                            self.logger.debug(f"Found matching file: {temp_file_path.name} ({temp_file_size:,} bytes) in temp and media directories")
                            return True
                        else:
                            self.logger.debug(f"Size mismatch for {temp_file_path.name}: temp={temp_file_size:,}, media={media_size:,}")

            # No matching file found
            self.logger.info(f"No matching file found in media directory for {temp_file_path.name} ({temp_file_size:,} bytes)")
            return False

        except Exception as e:
            self.logger.error(f"Error validating file pair for {temp_file_path.name}: {e}")
            return False

    def _is_file_complete(self, file_state: FileState, current_time: float) -> bool:
        """Check if file is complete based on size stability."""
        stale_time = current_time - file_state.last_size_change
        return stale_time >= self.completion_wait_seconds

    def _should_copy_file(self, source_path: Path, dest_path: Path) -> bool:
        """Determine if we should copy the file (always keep smaller version)."""
        if not dest_path.exists():
            return True

        source_size = source_path.stat().st_size
        dest_size = dest_path.stat().st_size

        if source_size < dest_size:
            self.logger.info(f"Source file {source_path.name} ({source_size:,} bytes) is smaller than existing destination ({dest_size:,} bytes), will replace")
            return True
        else:
            self.logger.info(f"Source file {source_path.name} ({source_size:,} bytes) is not smaller than existing destination ({dest_size:,} bytes), skipping")
            return False

    def _copy_file_with_retry(self, source_path: Path, dest_path: Path) -> bool:
        """Copy file with retry logic and cleanup on failure."""
        temp_dest = dest_path.with_suffix(dest_path.suffix + '.tmp')

        for attempt in range(2):  # Try twice
            try:
                start_time = time.time()
                self.logger.info(f"Attempt {attempt + 1}: Copying {source_path.name} ({source_path.stat().st_size:,} bytes)")

                # Copy to temporary file first
                shutil.copy2(source_path, temp_dest)

                # Verify copy completed successfully
                if temp_dest.stat().st_size != source_path.stat().st_size:
                    raise Exception(f"Copy verification failed: size mismatch")

                # Move temp file to final destination
                if dest_path.exists():
                    dest_path.unlink()  # Remove existing file
                temp_dest.rename(dest_path)

                copy_time = time.time() - start_time
                final_size = dest_path.stat().st_size

                self.logger.info(f"Successfully copied {source_path.name} ({final_size:,} bytes) in {copy_time:.2f}s")
                return True

            except Exception as e:
                self.logger.error(f"Copy attempt {attempt + 1} failed for {source_path.name}: {e}")

                # Cleanup temporary file if it exists
                if temp_dest.exists():
                    try:
                        temp_dest.unlink()
                    except Exception as cleanup_error:
                        self.logger.error(f"Failed to cleanup temp file {temp_dest}: {cleanup_error}")

                if attempt == 1:  # Last attempt failed
                    self.logger.error(f"All copy attempts failed for {source_path.name}, giving up")
                    return False
                else:
                    time.sleep(5)  # Wait before retry

        return False

    def run_check(self) -> None:
        """Run a single monitoring check cycle."""
        current_time = time.time()
        self.logger.info("Starting monitoring check cycle")

        # Scan for current .mkv files
        current_files = self._scan_for_mkv_files()
        self.logger.info(f"Found {len(current_files)} .mkv files in source directory")

        # Remove files from monitoring that no longer exist
        missing_files = set(self.monitored_files.keys()) - set(current_files.keys())
        for missing_file in missing_files:
            self.logger.info(f"File no longer exists, removing from monitoring: {Path(missing_file).name}")
            del self.monitored_files[missing_file]

        # Process each current file
        files_to_copy = []
        for file_path_str, file_path in current_files.items():
            file_info = self._get_file_info(file_path)
            if not file_info:
                continue

            current_size, current_mtime = file_info

            # Update or create file state
            if file_path_str in self.monitored_files:
                file_state = self.monitored_files[file_path_str]
                file_state.check_count += 1

                # Check if size changed
                if current_size != file_state.size:
                    file_state.size = current_size
                    file_state.last_size_change = current_time
                    self.logger.debug(f"Size changed for {file_path.name}: {current_size:,} bytes")

                file_state.last_modified = current_mtime

            else:
                # New file discovered - validate before tracking
                if not self._validate_file_pair(file_path, current_size):
                    # File doesn't have a matching pair in media directory, skip tracking
                    continue

                file_state = FileState(
                    path=file_path_str,
                    size=current_size,
                    last_modified=current_mtime,
                    first_seen=current_time,
                    last_size_change=current_time,
                    check_count=1
                )
                self.monitored_files[file_path_str] = file_state
                self.logger.info(f"Started monitoring validated file: {file_path.name} ({current_size:,} bytes)")

            # Log current state
            stale_time = current_time - file_state.last_size_change
            self.logger.info(f"Checking {file_path.name}: {current_size:,} bytes, stale for {stale_time:.1f}s (checks: {file_state.check_count})")

            # Check if file is complete
            if self._is_file_complete(file_state, current_time):
                dest_path = self.dest_dir / file_path.name
                if self._should_copy_file(file_path, dest_path):
                    files_to_copy.append((file_path, dest_path, file_state))

        # Copy completed files
        for source_path, dest_path, file_state in files_to_copy:
            self.logger.info(f"File appears complete: {source_path.name} (stable for {current_time - file_state.last_size_change:.1f}s)")

            if self._copy_file_with_retry(source_path, dest_path):
                # Remove from monitoring after successful copy
                del self.monitored_files[str(source_path)]
                self.logger.info(f"Successfully processed and removed from monitoring: {source_path.name}")
            else:
                self.logger.error(f"Failed to copy {source_path.name}, will continue monitoring")

        # Save state
        self._save_state()

        self.logger.info(f"Check cycle completed, monitoring {len(self.monitored_files)} files")


def main():
    """Main entry point for the script."""
    monitor = TdarrFileMonitor()
    monitor.run_check()


if __name__ == "__main__":
    main()
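The completion heuristic this archived monitor relies on (a file counts as "complete" once its size has not changed for `completion_wait_seconds`) can be exercised in isolation. A minimal sketch, using the same `FileState` shape as above; the helper name `is_file_complete` and the sample timestamps are illustrative, not from the repo:

```python
import time
from dataclasses import dataclass


@dataclass
class FileState:
    path: str
    size: int
    last_modified: float
    first_seen: float
    last_size_change: float
    check_count: int = 0


def is_file_complete(state: FileState, now: float, wait_seconds: int = 60) -> bool:
    # A file is considered complete once its size has been stable
    # (no observed change) for at least wait_seconds.
    return (now - state.last_size_change) >= wait_seconds


now = time.time()
growing = FileState("a.mkv", 100, now, now - 30, now - 5)    # size changed 5s ago
stable = FileState("b.mkv", 100, now, now - 300, now - 90)   # stable for 90s

print(is_file_complete(growing, now))  # False: still within the wait window
print(is_file_complete(stable, now))   # True: past the 60s threshold
```

Because each `run_check()` pass re-scans and compares sizes, the heuristic only works if the checker is invoked more often than the stability window; a false "complete" is still guarded by the size-verification step in `_copy_file_with_retry`.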
tdarr/archive/tdarr_monitor_local_node.py
Executable file
File diff suppressed because it is too large (1227 lines)