chore: add recovered CT 302 configs, archive tdarr scripts, clean up repo

- Add recovered LXC 300/302 server-diagnostics configs as reference
  (headless Claude permission patterns, health check client)
- Archive decommissioned tdarr monitoring scripts
- Gitignore rpg-art/ directory
- Delete stray temp files and swarm-test/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cal Corum 2026-03-01 00:41:41 -06:00
parent 64f9662f25
commit 28abde7c9f
10 changed files with 2260 additions and 0 deletions

.gitignore

@@ -17,3 +17,6 @@ __pycache__
# Large binary files
*.zip
# Art assets (managed separately)
rpg-art/

View File

@@ -0,0 +1,177 @@
---
name: server-diagnostics
description: |
Automated server troubleshooting for Docker containers and system health.
Provides SSH-based diagnostics, log reading, metrics collection, and low-risk
remediation. USE WHEN N8N triggers troubleshooting, container issues detected,
or system health checks needed.
---
# Server Diagnostics - Automated Troubleshooting
## When to Activate This Skill
- N8N triggers with error context
- "diagnose container X", "check docker status"
- "read logs from server", "check disk usage"
- "troubleshoot server issue"
- Any automated health check response
## Quick Start
### Check All Containers
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-status paper-dynasty
```
### Quick Health Check (Docker + System Metrics)
```bash
python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
```
### Get Container Logs
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty paper-dynasty_discord-app_1 --lines 200
```
### Restart a Container
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty paper-dynasty_discord-app_1
```
### System Metrics
```bash
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type all
python ~/.claude/skills/server-diagnostics/client.py metrics paper-dynasty --type disk
```
### Run Diagnostic Command
```bash
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty disk_usage
python ~/.claude/skills/server-diagnostics/client.py diagnostic paper-dynasty memory_usage
```
## Troubleshooting Workflow
When an issue is reported:
1. **Quick Health Check** - Get overview of containers and system state
```bash
python ~/.claude/skills/server-diagnostics/client.py health paper-dynasty
```
2. **Check MemoryGraph** - Recall similar issues
```bash
python ~/.claude/skills/memorygraph/client.py recall "docker container error"
```
3. **Get Container Logs** - Look for errors
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-logs paper-dynasty <container> --lines 500 --filter error
```
4. **Remediate if Safe** - Restart if allowed
```bash
python ~/.claude/skills/server-diagnostics/client.py docker-restart paper-dynasty <container>
```
5. **Store Solution** - Save to MemoryGraph if resolved
```bash
python ~/.claude/skills/memorygraph/client.py store \
--type solution \
--title "Fixed <container> issue" \
--content "Description of problem and solution" \
--tags "docker,paper-dynasty,troubleshooting" \
--importance 0.7
```
## Server Inventory
| Server | IP | SSH User | Description |
|--------|-----|----------|-------------|
| paper-dynasty | 10.10.0.88 | cal | Paper Dynasty Discord bots and services |
## Monitored Containers
| Container | Critical | Restart Allowed | Description |
|-----------|----------|-----------------|-------------|
| paper-dynasty_discord-app_1 | Yes | Yes | Paper Dynasty Discord bot |
| paper-dynasty_db_1 | Yes | Yes | PostgreSQL database |
| paper-dynasty_adminer_1 | No | Yes | Database admin UI |
| sba-website_sba-web_1 | Yes | Yes | SBA website |
| sba-ghost_sba-ghost_1 | No | Yes | Ghost CMS |
## Available Diagnostic Commands
- `disk_usage` - df -h
- `memory_usage` - free -h
- `cpu_usage` - top -bn1 | head -20
- `cpu_load` - uptime
- `process_list` - ps aux --sort=-%mem | head -20
- `network_status` - ss -tuln
- `docker_ps` - docker ps -a (formatted)
- `docker_stats` - docker stats --no-stream
- `journal_errors` - journalctl -p err -n 50
## Security Constraints
### DENIED Patterns (Will Be Rejected)
- rm -rf, rm -r /
- dd if=, mkfs
- shutdown, reboot
- systemctl stop
- chmod 777
- wget|sh, curl|sh
### Container Restart Rules
- Only containers in config.yaml with restart_allowed: true
- N8N container restart is NEVER allowed (it triggers us)
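The deny check is plain substring matching against `denied_patterns` (see `_validate_command` in client.py). A minimal sketch of the idea, with a subset of the pattern list:

```python
# Sketch of the deny-list check: plain substring matching.
# Pattern list here is a subset of denied_patterns in config.yaml.
DENIED_PATTERNS = ["rm -rf", "rm -r /", "dd if=", "mkfs",
                   "shutdown", "reboot", "systemctl stop",
                   "chmod 777", "wget|sh", "curl|sh"]

def is_command_allowed(command: str) -> bool:
    """Return False if the command contains any denied substring."""
    return not any(pattern in command for pattern in DENIED_PATTERNS)
```

Substring matching is deliberately blunt: a harmless command that merely mentions `reboot` (e.g. grepping logs for it) is also rejected, trading false positives for safety.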
## MemoryGraph Integration
Before troubleshooting, check for known solutions:
```bash
python ~/.claude/skills/memorygraph/client.py recall "docker paper-dynasty"
```
After resolving, store the pattern:
```bash
python ~/.claude/skills/memorygraph/client.py store \
--type solution \
--title "Brief description" \
--content "Full explanation..." \
--tags "docker,paper-dynasty,fix" \
--importance 0.7
```
## Common Issues and Solutions
### Container Not Running
1. Check logs for crash reason
2. Check disk space and memory
3. Attempt restart if allowed
4. Escalate if restart fails
### High Memory Usage
1. Check which container is consuming
2. Review docker stats
3. Check for memory leaks in logs
4. Consider container restart
### Disk Space Low
1. Run disk_usage diagnostic
2. Check docker system df
3. Consider log rotation
4. Alert user for cleanup
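The threshold logic behind the disk check can be sketched with Python's stdlib (illustrative only; the client itself shells out to `df -h` over SSH rather than running locally):

```python
import shutil

def disk_low(path: str = "/", min_free_pct: float = 10.0) -> bool:
    """Flag a filesystem as low when free space falls below min_free_pct."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100 < min_free_pct
```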
## Output Format
All commands return JSON:
```json
{
"success": true,
"stdout": "...",
"stderr": "...",
"returncode": 0,
"data": {...} // Parsed data if applicable
}
```
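Consumers (e.g. an N8N workflow step) can parse this envelope directly. A sketch using a hypothetical sample payload; real `stdout` content varies by command:

```python
import json

# Hypothetical sample of the JSON envelope above.
raw = '{"success": true, "stdout": "Filesystem  Size  Used", "stderr": "", "returncode": 0}'
result = json.loads(raw)

if result["success"] and result["returncode"] == 0:
    # Act on the command output, e.g. inspect the first line.
    first_line = result["stdout"].splitlines()[0]
```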


@@ -0,0 +1,443 @@
#!/usr/bin/env python3
"""
Server Diagnostics Client Library
Provides SSH-based diagnostics for homelab troubleshooting
"""
import json
import subprocess
from pathlib import Path
from typing import Any, Optional
import yaml
class ServerDiagnostics:
"""
Main diagnostic client for server troubleshooting.
Connects to servers via SSH and executes whitelisted diagnostic
commands. Enforces security constraints from config.yaml.
"""
def __init__(self, config_path: Optional[str] = None):
"""
Initialize with configuration.
Args:
config_path: Path to config.yaml. Defaults to same directory.
"""
if config_path is None:
config_path = Path(__file__).parent / "config.yaml"
self.config = self._load_config(config_path)
self.servers = self.config.get("servers", {})
self.containers = self.config.get("docker_containers", [])
self.allowed_commands = self.config.get("diagnostic_commands", {})
self.remediation_commands = self.config.get("remediation_commands", {})
self.denied_patterns = self.config.get("denied_patterns", [])
def _load_config(self, path) -> dict:
"""Load YAML configuration."""
with open(path) as f:
return yaml.safe_load(f)
def _validate_command(self, command: str) -> bool:
"""Check command against deny list."""
for pattern in self.denied_patterns:
if pattern in command:
raise SecurityError(f"Command contains denied pattern: {pattern}")
return True
def _ssh_exec(self, server: str, command: str) -> dict:
"""
Execute command on remote server via SSH.
Returns:
dict with stdout, stderr, returncode
"""
self._validate_command(command)
server_config = self.servers.get(server)
if not server_config:
raise ValueError(f"Unknown server: {server}")
ssh_key = Path(server_config["ssh_key"]).expanduser()
ssh_user = server_config["ssh_user"]
hostname = server_config["hostname"]
ssh_cmd = [
"ssh",
"-i",
str(ssh_key),
"-o",
"StrictHostKeyChecking=no",
"-o",
"ConnectTimeout=10",
f"{ssh_user}@{hostname}",
command,
]
result = subprocess.run(ssh_cmd, capture_output=True, text=True, timeout=60)
return {
"stdout": result.stdout,
"stderr": result.stderr,
"returncode": result.returncode,
"success": result.returncode == 0,
}
# === Docker Operations ===
def get_docker_status(self, server: str, container: Optional[str] = None) -> dict:
"""
Get Docker container status.
Args:
server: Server identifier from config
container: Specific container name (optional, all if not specified)
Returns:
dict with container statuses
"""
if container:
cmd = "docker inspect --format '{{json .State}}' " + container
result = self._ssh_exec(server, cmd)
if result["success"]:
try:
result["data"] = json.loads(result["stdout"])
except json.JSONDecodeError:
result["data"] = None
else:
# Use Go template format for Docker 20.10 compatibility
# Format: Name|Status|State|Ports
cmd = "docker ps -a --format '{{.Names}}|{{.Status}}|{{.State}}|{{.Ports}}'"
result = self._ssh_exec(server, cmd)
if result["success"]:
containers = []
for line in result["stdout"].strip().split("\n"):
if line:
parts = line.split("|")
if len(parts) >= 3:
containers.append(
{
"Names": parts[0],
"Status": parts[1],
"State": parts[2],
"Ports": parts[3] if len(parts) > 3 else "",
}
)
result["data"] = containers
return result
def docker_logs(
self,
server: str,
container: str,
lines: int = 100,
log_filter: Optional[str] = None,
) -> dict:
"""
Get Docker container logs.
Args:
server: Server identifier
container: Container name
lines: Number of lines to retrieve
log_filter: Optional grep filter pattern
Returns:
dict with log output
"""
cmd = f"docker logs --tail {lines} {container} 2>&1"
if log_filter:
cmd += f" | grep -i '{log_filter}'"
return self._ssh_exec(server, cmd)
def docker_restart(self, server: str, container: str) -> dict:
"""
Restart a Docker container (low-risk remediation).
Args:
server: Server identifier
container: Container name
Returns:
dict with operation result
"""
# Check if container is allowed to be restarted
container_config = next(
(c for c in self.containers if c["name"] == container), None
)
if not container_config:
return {
"success": False,
"error": f"Container {container} not in monitored list",
}
if not container_config.get("restart_allowed", False):
return {
"success": False,
"error": f"Container {container} restart not permitted",
}
cmd = f"docker restart {container}"
result = self._ssh_exec(server, cmd)
result["action"] = "docker_restart"
result["container"] = container
return result
# === System Diagnostics ===
def get_metrics(self, server: str, metric_type: str = "all") -> dict:
"""
Get system metrics from server.
Args:
server: Server identifier
metric_type: Type of metrics (cpu, memory, disk, network, all)
Returns:
dict with metric data
"""
metrics = {}
if metric_type in ("cpu", "all"):
result = self._ssh_exec(server, self.allowed_commands["cpu_usage"])
metrics["cpu"] = result
if metric_type in ("memory", "all"):
result = self._ssh_exec(server, self.allowed_commands["memory_usage"])
metrics["memory"] = result
if metric_type in ("disk", "all"):
result = self._ssh_exec(server, self.allowed_commands["disk_usage"])
metrics["disk"] = result
if metric_type in ("network", "all"):
result = self._ssh_exec(server, self.allowed_commands["network_status"])
metrics["network"] = result
return {"server": server, "metrics": metrics}
def read_logs(
self,
server: str,
log_type: str,
lines: int = 100,
log_filter: Optional[str] = None,
custom_path: Optional[str] = None,
) -> dict:
"""
Read logs from server.
Args:
server: Server identifier
log_type: Type of log (system, docker, application, custom)
lines: Number of lines
log_filter: Optional grep pattern
custom_path: Path for custom log type
Returns:
dict with log content
"""
log_paths = {
"system": "/var/log/syslog",
"docker": "/var/log/docker.log",
"application": "/var/log/application.log",
}
path = custom_path if log_type == "custom" else log_paths.get(log_type)
if not path:
return {"success": False, "error": f"Unknown log type: {log_type}"}
cmd = f"tail -n {lines} {path}"
if log_filter:
cmd += f" | grep -i '{log_filter}'"
return self._ssh_exec(server, cmd)
def run_diagnostic(
self, server: str, command: str, params: Optional[dict] = None
) -> dict:
"""
Run a whitelisted diagnostic command.
Args:
server: Server identifier
command: Command key from config whitelist
params: Optional parameters to substitute
Returns:
dict with command output
"""
if command not in self.allowed_commands:
return {"success": False, "error": f"Command '{command}' not in whitelist"}
cmd = self.allowed_commands[command]
# Substitute parameters if provided
if params:
for key, value in params.items():
cmd = cmd.replace(f"{{{key}}}", str(value))
return self._ssh_exec(server, cmd)
# === Convenience Methods ===
def quick_health_check(self, server: str) -> dict:
"""
Perform quick health check on server.
Returns summary of Docker containers, disk, and memory.
"""
health = {
"server": server,
"docker": self.get_docker_status(server),
"metrics": self.get_metrics(server, "all"),
"healthy": True,
"issues": [],
}
# Check for stopped containers
if health["docker"].get("data"):
for container in health["docker"]["data"]:
status = container.get("State", container.get("Status", ""))
if "Up" not in str(status) and "running" not in str(status).lower():
health["healthy"] = False
health["issues"].append(
f"Container {container.get('Names', 'unknown')} is not running"
)
return health
def to_json(self, data: Any) -> str:
"""Convert result to JSON string."""
return json.dumps(data, indent=2, default=str)
class SecurityError(Exception):
"""Raised when a command violates security constraints."""
pass
def main():
"""CLI interface for server diagnostics."""
import argparse
parser = argparse.ArgumentParser(
description="Server Diagnostics CLI",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
%(prog)s docker-status paper-dynasty
%(prog)s docker-status paper-dynasty --container paper-dynasty_discord-app_1
%(prog)s docker-logs paper-dynasty paper-dynasty_discord-app_1 --lines 200
%(prog)s docker-restart paper-dynasty paper-dynasty_discord-app_1
%(prog)s metrics paper-dynasty --type all
%(prog)s health paper-dynasty
%(prog)s diagnostic paper-dynasty disk_usage
""",
)
subparsers = parser.add_subparsers(dest="command", required=True)
# docker-status
p_docker = subparsers.add_parser(
"docker-status", help="Get Docker container status"
)
p_docker.add_argument("server", help="Server identifier")
p_docker.add_argument("--container", "-c", help="Specific container name")
# docker-logs
p_logs = subparsers.add_parser("docker-logs", help="Get Docker container logs")
p_logs.add_argument("server", help="Server identifier")
p_logs.add_argument("container", help="Container name")
p_logs.add_argument("--lines", "-n", type=int, default=100, help="Number of lines")
p_logs.add_argument("--filter", "-f", dest="log_filter", help="Grep filter pattern")
# docker-restart
p_restart = subparsers.add_parser("docker-restart", help="Restart Docker container")
p_restart.add_argument("server", help="Server identifier")
p_restart.add_argument("container", help="Container name")
# metrics
p_metrics = subparsers.add_parser("metrics", help="Get system metrics")
p_metrics.add_argument("server", help="Server identifier")
p_metrics.add_argument(
"--type",
"-t",
default="all",
choices=["cpu", "memory", "disk", "network", "all"],
help="Metric type",
)
# logs
p_syslogs = subparsers.add_parser("logs", help="Read system logs")
p_syslogs.add_argument("server", help="Server identifier")
p_syslogs.add_argument(
"--type",
"-t",
default="system",
choices=["system", "docker", "application", "custom"],
help="Log type",
)
p_syslogs.add_argument(
"--lines", "-n", type=int, default=100, help="Number of lines"
)
p_syslogs.add_argument(
"--filter", "-f", dest="log_filter", help="Grep filter pattern"
)
p_syslogs.add_argument("--path", help="Custom log path (for type=custom)")
# health
p_health = subparsers.add_parser("health", help="Quick health check")
p_health.add_argument("server", help="Server identifier")
# diagnostic
p_diag = subparsers.add_parser("diagnostic", help="Run whitelisted diagnostic")
p_diag.add_argument("server", help="Server identifier")
p_diag.add_argument("diagnostic_cmd", help="Command from whitelist")
p_diag.add_argument(
"--params", "-p", help="JSON parameters for command substitution"
)
args = parser.parse_args()
client = ServerDiagnostics()
if args.command == "docker-status":
result = client.get_docker_status(args.server, args.container)
elif args.command == "docker-logs":
result = client.docker_logs(
args.server, args.container, args.lines, args.log_filter
)
elif args.command == "docker-restart":
result = client.docker_restart(args.server, args.container)
elif args.command == "metrics":
result = client.get_metrics(args.server, args.type)
elif args.command == "logs":
result = client.read_logs(
args.server, args.type, args.lines, args.log_filter, args.path
)
elif args.command == "health":
result = client.quick_health_check(args.server)
elif args.command == "diagnostic":
params = json.loads(args.params) if args.params else None
result = client.run_diagnostic(args.server, args.diagnostic_cmd, params)
print(client.to_json(result))
if __name__ == "__main__":
main()


@@ -0,0 +1,72 @@
# Server Diagnostics Configuration
# Used by client.py for server inventory and security constraints
# Server inventory - SSH connection details
servers:
paper-dynasty:
hostname: 10.10.0.88
ssh_user: cal
ssh_key: ~/.ssh/claude_diagnostics_key
description: "Paper Dynasty Discord bots and services"
# Docker containers to monitor
# restart_allowed: false prevents automatic remediation
docker_containers:
- name: paper-dynasty_discord-app_1
critical: true
restart_allowed: true
description: "Paper Dynasty Discord bot"
- name: paper-dynasty_db_1
critical: true
restart_allowed: true
description: "Paper Dynasty PostgreSQL database"
- name: paper-dynasty_adminer_1
critical: false
restart_allowed: true
description: "Database admin UI"
- name: sba-website_sba-web_1
critical: true
restart_allowed: true
description: "SBA website"
- name: sba-ghost_sba-ghost_1
critical: false
restart_allowed: true
description: "SBA Ghost CMS"
# Whitelisted diagnostic commands
diagnostic_commands:
disk_usage: "df -h"
memory_usage: "free -h"
cpu_usage: "top -bn1 | head -20"
cpu_load: "uptime"
process_list: "ps aux --sort=-%mem | head -20"
network_status: "ss -tuln"
docker_ps: "docker ps -a --format 'table {{.Names}}\\t{{.Status}}\\t{{.Ports}}'"
docker_stats: "docker stats --no-stream --format 'table {{.Name}}\\t{{.CPUPerc}}\\t{{.MemUsage}}'"
journal_errors: "journalctl -p err -n 50 --no-pager"
# Remediation commands (low-risk only)
remediation_commands:
docker_restart: "docker restart {container}"
docker_logs: "docker logs --tail 500 {container}"
# DENIED patterns - commands containing these will be rejected
denied_patterns:
- "rm -rf"
- "rm -r /"
- "dd if="
- "mkfs"
- ":(){:|:&};:"
- "shutdown"
- "reboot"
- "init 0"
- "init 6"
- "systemctl stop"
- "> /dev/sd"
- "chmod 777"
- "wget|sh"
- "curl|sh"


@@ -0,0 +1 @@
pyyaml>=6.0


@@ -0,0 +1,26 @@
{
"permissions": {
"allow": [
"Bash(python3 ~/.claude/skills/server-diagnostics/client.py:*)",
"Bash(ssh -i ~/.ssh/claude_diagnostics_key:*)",
"Read(~/.claude/skills/**)",
"Read(~/.claude/logs/**)",
"Glob(*)",
"Grep(*)"
],
"deny": [
"Bash(rm -rf:*)",
"Bash(rm -r /:*)",
"Bash(dd:*)",
"Bash(mkfs:*)",
"Bash(shutdown:*)",
"Bash(reboot:*)",
"Bash(*> /dev/sd*)",
"Bash(chmod 777:*)",
"Bash(*|sh)",
"Bash(*curl*|*bash*)",
"Bash(*wget*|*bash*)"
]
},
"model": "sonnet"
}

tdarr/archive/README.md

@@ -0,0 +1,19 @@
# Legacy Tdarr Scripts
## tdarr_monitor_local_node.py
Full-featured Tdarr monitoring script (~1200 lines) built for when the local workstation (nobara-pc) ran as an unmapped remote Tdarr node with GPU transcoding.
**Features:** Stuck job detection via cross-run state comparison (pickle file), automatic worker killing, Discord alerts, configurable thresholds, rotating log files.
**Why it existed:** The unmapped remote node architecture was prone to stuck jobs caused by network issues during file transfers between the remote node and server. The monitor ran every minute via cron to detect and kill stuck workers.
**Why it's archived:** Transcoding moved to ubuntu-manticore (10.10.0.226) as a local mapped node with shared NFS storage. No remote transfers means no stuck jobs. Tdarr manages its own workers natively. Archived February 2026.
## tdarr_file_monitor_local_node.py + tdarr-file-monitor-cron_local_node.sh
File completion monitor that watched the local Tdarr cache directory for finished `.mkv` transcodes and copied the smallest version to a backup location. The cron wrapper ran it every minute.
**Why it existed:** When the local workstation ran as an unmapped Tdarr node, completed transcodes landed in the local NVMe cache. This monitor detected completion (by tracking size stability) and kept the best copy.
**Why it's archived:** Same reason as above - mapped node on manticore writes directly to the shared NFS media mount. No local cache to monitor. Archived February 2026.
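The size-stability heuristic both scripts relied on is simple; a minimal sketch (illustrative, mirroring the `_is_file_complete` check in the archived monitor):

```python
import time
from typing import Optional

COMPLETION_WAIT_SECONDS = 60  # size must hold steady this long

def is_complete(last_size_change: float, now: Optional[float] = None) -> bool:
    """Treat a transcode as finished once its size has been stable
    for COMPLETION_WAIT_SECONDS."""
    if now is None:
        now = time.time()
    return now - last_size_change >= COMPLETION_WAIT_SECONDS
```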


@@ -0,0 +1,6 @@
#!/bin/bash
# Cron job wrapper for Tdarr file monitor
# Add this to crontab with: * * * * * /mnt/NV2/Development/claude-home/monitoring/scripts/tdarr-file-monitor-cron.sh
cd /mnt/NV2/Development/claude-home/monitoring/scripts
/usr/bin/python3 /mnt/NV2/Development/claude-home/monitoring/scripts/tdarr_file_monitor.py


@@ -0,0 +1,286 @@
#!/usr/bin/env python3
"""
Tdarr File Monitor - Monitors Tdarr cache directory for completed .mkv files and copies them to backup location.
Detects file completion by monitoring size changes and always keeps the smallest version of duplicate files.
"""
import shutil
import json
import time
import logging
from pathlib import Path
from dataclasses import dataclass, asdict
from typing import Dict, Optional
@dataclass
class FileState:
"""Tracks the state of a monitored file."""
path: str
size: int
last_modified: float
first_seen: float
last_size_change: float
check_count: int = 0
class TdarrFileMonitor:
"""Monitors Tdarr cache directory for completed .mkv files."""
def __init__(
self,
source_dir: str = "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp",
media_dir: str = "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media",
dest_dir: str = "/mnt/NV2/tdarr-cache/manual-backup",
state_file: str = "/mnt/NV2/Development/claude-home/logs/tdarr_file_monitor_state.json",
completion_wait_seconds: int = 60,
log_file: str = "/mnt/NV2/Development/claude-home/logs/tdarr_file_monitor.log"
):
self.source_dir = Path(source_dir)
self.media_dir = Path(media_dir)
self.dest_dir = Path(dest_dir)
self.state_file = Path(state_file)
self.completion_wait_seconds = completion_wait_seconds
self.monitored_files: Dict[str, FileState] = {}
# Setup logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler(log_file),
logging.StreamHandler()
]
)
self.logger = logging.getLogger(f'{__name__}.TdarrFileMonitor')
# Ensure destination directory exists
self.dest_dir.mkdir(parents=True, exist_ok=True)
# Load previous state
self._load_state()
def _load_state(self) -> None:
"""Load monitored files state from disk."""
if self.state_file.exists():
try:
with open(self.state_file, 'r') as f:
data = json.load(f)
self.monitored_files = {
path: FileState(**file_data)
for path, file_data in data.items()
}
self.logger.info(f"Loaded state for {len(self.monitored_files)} monitored files")
except Exception as e:
self.logger.error(f"Failed to load state file: {e}")
self.monitored_files = {}
def _save_state(self) -> None:
"""Save monitored files state to disk."""
try:
with open(self.state_file, 'w') as f:
data = {path: asdict(state) for path, state in self.monitored_files.items()}
json.dump(data, f, indent=2)
except Exception as e:
self.logger.error(f"Failed to save state file: {e}")
def _scan_for_mkv_files(self) -> Dict[str, Path]:
"""Scan source directory for .mkv files in all subdirectories."""
mkv_files = {}
try:
for mkv_file in self.source_dir.rglob("*.mkv"):
if mkv_file.is_file():
mkv_files[str(mkv_file)] = mkv_file
except Exception as e:
self.logger.error(f"Error scanning source directory: {e}")
return mkv_files
def _get_file_info(self, file_path: Path) -> Optional[tuple]:
"""Get file size and modification time, return None if file doesn't exist or can't be accessed."""
try:
stat = file_path.stat()
return stat.st_size, stat.st_mtime
except (OSError, FileNotFoundError) as e:
self.logger.warning(f"Cannot access file {file_path}: {e}")
return None
def _validate_file_pair(self, temp_file_path: Path, temp_file_size: int) -> bool:
"""Validate that a matching file exists in media directory with exact same name and size."""
try:
# Search for matching file in media directory tree
for media_file in self.media_dir.rglob(temp_file_path.name):
if media_file.is_file():
media_file_info = self._get_file_info(media_file)
if media_file_info:
media_size, _ = media_file_info
if media_size == temp_file_size:
self.logger.debug(f"Found matching file: {temp_file_path.name} ({temp_file_size:,} bytes) in temp and media directories")
return True
else:
self.logger.debug(f"Size mismatch for {temp_file_path.name}: temp={temp_file_size:,}, media={media_size:,}")
# No matching file found
self.logger.info(f"No matching file found in media directory for {temp_file_path.name} ({temp_file_size:,} bytes)")
return False
except Exception as e:
self.logger.error(f"Error validating file pair for {temp_file_path.name}: {e}")
return False
def _is_file_complete(self, file_state: FileState, current_time: float) -> bool:
"""Check if file is complete based on size stability."""
stale_time = current_time - file_state.last_size_change
return stale_time >= self.completion_wait_seconds
def _should_copy_file(self, source_path: Path, dest_path: Path) -> bool:
"""Determine if we should copy the file (always keep smaller version)."""
if not dest_path.exists():
return True
source_size = source_path.stat().st_size
dest_size = dest_path.stat().st_size
if source_size < dest_size:
self.logger.info(f"Source file {source_path.name} ({source_size:,} bytes) is smaller than existing destination ({dest_size:,} bytes), will replace")
return True
else:
self.logger.info(f"Source file {source_path.name} ({source_size:,} bytes) is not smaller than existing destination ({dest_size:,} bytes), skipping")
return False
def _copy_file_with_retry(self, source_path: Path, dest_path: Path) -> bool:
"""Copy file with retry logic and cleanup on failure."""
temp_dest = dest_path.with_suffix(dest_path.suffix + '.tmp')
for attempt in range(2): # Try twice
try:
start_time = time.time()
self.logger.info(f"Attempt {attempt + 1}: Copying {source_path.name} ({source_path.stat().st_size:,} bytes)")
# Copy to temporary file first
shutil.copy2(source_path, temp_dest)
# Verify copy completed successfully
if temp_dest.stat().st_size != source_path.stat().st_size:
raise Exception("Copy verification failed: size mismatch")
# Move temp file to final destination
if dest_path.exists():
dest_path.unlink() # Remove existing file
temp_dest.rename(dest_path)
copy_time = time.time() - start_time
final_size = dest_path.stat().st_size
self.logger.info(f"Successfully copied {source_path.name} ({final_size:,} bytes) in {copy_time:.2f}s")
return True
except Exception as e:
self.logger.error(f"Copy attempt {attempt + 1} failed for {source_path.name}: {e}")
# Cleanup temporary file if it exists
if temp_dest.exists():
try:
temp_dest.unlink()
except Exception as cleanup_error:
self.logger.error(f"Failed to cleanup temp file {temp_dest}: {cleanup_error}")
if attempt == 1: # Last attempt failed
self.logger.error(f"All copy attempts failed for {source_path.name}, giving up")
return False
else:
time.sleep(5) # Wait before retry
return False
def run_check(self) -> None:
"""Run a single monitoring check cycle."""
current_time = time.time()
self.logger.info("Starting monitoring check cycle")
# Scan for current .mkv files
current_files = self._scan_for_mkv_files()
self.logger.info(f"Found {len(current_files)} .mkv files in source directory")
# Remove files from monitoring that no longer exist
missing_files = set(self.monitored_files.keys()) - set(current_files.keys())
for missing_file in missing_files:
self.logger.info(f"File no longer exists, removing from monitoring: {Path(missing_file).name}")
del self.monitored_files[missing_file]
# Process each current file
files_to_copy = []
for file_path_str, file_path in current_files.items():
file_info = self._get_file_info(file_path)
if not file_info:
continue
current_size, current_mtime = file_info
# Update or create file state
if file_path_str in self.monitored_files:
file_state = self.monitored_files[file_path_str]
file_state.check_count += 1
# Check if size changed
if current_size != file_state.size:
file_state.size = current_size
file_state.last_size_change = current_time
self.logger.debug(f"Size changed for {file_path.name}: {current_size:,} bytes")
file_state.last_modified = current_mtime
else:
# New file discovered - validate before tracking
if not self._validate_file_pair(file_path, current_size):
# File doesn't have a matching pair in media directory, skip tracking
continue
file_state = FileState(
path=file_path_str,
size=current_size,
last_modified=current_mtime,
first_seen=current_time,
last_size_change=current_time,
check_count=1
)
self.monitored_files[file_path_str] = file_state
self.logger.info(f"Started monitoring validated file: {file_path.name} ({current_size:,} bytes)")
# Log current state
stale_time = current_time - file_state.last_size_change
self.logger.info(f"Checking {file_path.name}: {current_size:,} bytes, stale for {stale_time:.1f}s (checks: {file_state.check_count})")
# Check if file is complete
if self._is_file_complete(file_state, current_time):
dest_path = self.dest_dir / file_path.name
if self._should_copy_file(file_path, dest_path):
files_to_copy.append((file_path, dest_path, file_state))
# Copy completed files
for source_path, dest_path, file_state in files_to_copy:
self.logger.info(f"File appears complete: {source_path.name} (stable for {current_time - file_state.last_size_change:.1f}s)")
if self._copy_file_with_retry(source_path, dest_path):
# Remove from monitoring after successful copy
del self.monitored_files[str(source_path)]
self.logger.info(f"Successfully processed and removed from monitoring: {source_path.name}")
else:
self.logger.error(f"Failed to copy {source_path.name}, will continue monitoring")
# Save state
self._save_state()
self.logger.info(f"Check cycle completed, monitoring {len(self.monitored_files)} files")
def main():
"""Main entry point for the script."""
monitor = TdarrFileMonitor()
monitor.run_check()
if __name__ == "__main__":
main()

File diff suppressed because it is too large