# Tdarr Distributed Transcoding Pattern

## Overview

Tdarr distributed transcoding with unmapped nodes provides high throughput for large-scale video processing across multiple machines: each node copies source files to a fast local cache, transcodes locally, and uploads only the finished result back to shared storage.

## Architecture Pattern

### Unmapped Node Deployment (Recommended)

```
┌─────────────────┐      ┌──────────────────────────────────┐
│  Tdarr Server   │      │          Unmapped Nodes          │
│                 │      │  ┌──────────┐  ┌──────────┐      │
│ - Web Interface │◄────►│  │  Node 1  │  │  Node 2  │ ...  │
│ - Job Queue     │      │  │ GPU+CPU  │  │ GPU+CPU  │      │
│ - File Mgmt     │      │  │NVMe Cache│  │NVMe Cache│      │
│                 │      │  └──────────┘  └──────────┘      │
└─────────────────┘      └──────────────────────────────────┘
         │                                 │
         └──────── Shared Storage ─────────┘
              (NAS/SAN for media files)
```

### Key Components

- **Server**: Centralizes job management and hosts the web interface
- **Unmapped Nodes**: Independent transcoding with local cache
- **Shared Storage**: Source and final file repository (see the mount sketch below)

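The server (and any mapped node) must see the shared store at a stable path. A minimal NFS mount sketch, assuming a hypothetical TrueNAS host `truenas.local` exporting `/mnt/tank/media`, mounted at the `/mnt/truenas-share` path used throughout this document:

```bash
# Mount the shared media store (hostname and export path are placeholders)
sudo mkdir -p /mnt/truenas-share
sudo mount -t nfs truenas.local:/mnt/tank/media /mnt/truenas-share

# Persist across reboots via /etc/fstab:
# truenas.local:/mnt/tank/media  /mnt/truenas-share  nfs  defaults,_netdev  0  0
```
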
## Configuration Templates

### Server Configuration (Optimized)

```yaml
# docker-compose.yml - Hybrid Storage Strategy
version: "3.4"
services:
  tdarr:
    container_name: tdarr
    image: ghcr.io/haveagitgat/tdarr:latest
    restart: unless-stopped
    network_mode: bridge
    ports:
      - 8265:8265 # webUI port
      - 8266:8266 # server port
    environment:
      - TZ=America/Chicago
      - PUID=0
      - PGID=0
      - UMASK_SET=002
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
      - internalNode=false # Disable for distributed setup
      - inContainer=true
      - ffmpegVersion=6
      - nodeName=docker-server
    volumes:
      # Hybrid storage strategy - Local for performance, Network for persistence
      - ./tdarr/server:/app/server   # Local: Database, configs, logs
      - ./tdarr/configs:/app/configs # Local: Fast config access
      - ./tdarr/logs:/app/logs       # Local: Logging performance
      - /mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Network: Backups only
      # Media and cache (when using mapped nodes)
      - /mnt/truenas-share:/media # Network: Source media
      - /mnt/truenas-share/tdarr/tdarr-cache:/temp # Network: Shared cache (mapped nodes only)
```


### Unmapped Node Configuration (Production)

```bash
#!/bin/bash
# Tdarr Unmapped Node with GPU Support - NVMe Cache Optimization
# Production script: scripts/tdarr/start-tdarr-gpu-podman-clean.sh

CONTAINER_NAME="tdarr-node-gpu-unmapped"
SERVER_IP="10.10.0.43"
SERVER_PORT="8266"
NODE_NAME="nobara-pc-gpu-unmapped"

# Clean container management: remove any previous instance
if podman ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
    podman stop "${CONTAINER_NAME}" 2>/dev/null || true
    podman rm "${CONTAINER_NAME}" 2>/dev/null || true
fi

# Production unmapped node with optimized cache
# (/mnt/NV2/tdarr-cache is the local NVMe cache, 3-7GB/s)
podman run -d --name "${CONTAINER_NAME}" \
    --gpus all \
    --restart unless-stopped \
    -e TZ=America/Chicago \
    -e UMASK_SET=002 \
    -e nodeName="${NODE_NAME}" \
    -e serverIP="${SERVER_IP}" \
    -e serverPort="${SERVER_PORT}" \
    -e nodeType=unmapped \
    -e inContainer=true \
    -e ffmpegVersion=6 \
    -e logLevel=DEBUG \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v "/mnt/NV2/tdarr-cache:/cache" \
    -v "/mnt/media:/app/unmappedNodeCache/nobara-pc-gpu-unmapped/media" \
    ghcr.io/haveagitgat/tdarr_node:latest
```

## File Transfer Optimizations

### Hybrid Storage Strategy (Server)

The server uses a hybrid approach balancing performance and reliability:

```bash
# Local storage (SSD/NVMe) - High-Performance Operations
./tdarr/server:/app/server    # Database - frequent read/write
./tdarr/configs:/app/configs  # Config files - startup performance
./tdarr/logs:/app/logs        # Log files - continuous writing

# Network storage (NAS) - Persistence & Backup
/mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Infrequent access
```

**Benefits:**

- **Database performance**: Local SQLite operations (100x faster than over the network)
- **Log performance**: Eliminates network I/O bottleneck for continuous logging
- **Reliability**: Critical backups stored on redundant NAS storage
- **Config speed**: Fast server startup with local configuration files

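First-run setup is worth a small guard script so the bind mounts land where the compose file expects them. A minimal sketch, assuming the NAS is mounted at `/mnt/truenas-share` as above:

```bash
# Create the local (SSD/NVMe) bind-mount targets from the compose file
mkdir -p ./tdarr/server ./tdarr/configs ./tdarr/logs

# Refuse to start if the NAS share is missing, so the backup volume
# does not silently bind to an empty local directory
mountpoint -q /mnt/truenas-share || { echo "NAS share not mounted" >&2; exit 1; }

docker compose up -d
```
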
### Container Platform Migration: Docker → Podman

**Advantages of Podman for Tdarr:**

```bash
# Enhanced GPU support
--gpus all                         # Improved NVIDIA integration
-e NVIDIA_DRIVER_CAPABILITIES=all  # Full GPU access
-e NVIDIA_VISIBLE_DEVICES=all      # All GPU visibility

# Better resource management
--restart unless-stopped           # Smarter restart policies
# Rootless containers (when needed) provide enhanced security
```

**Migration Benefits:**

- **GPU reliability**: Better NVIDIA container integration
- **Resource isolation**: Improved container resource management
- **System integration**: Better integration with systemd and cgroups (see the unit-generation sketch below)
- **Performance**: Reduced overhead compared to the Docker daemon

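One concrete way to use that systemd integration is `podman generate systemd`, which emits a unit file for an existing container (newer Podman releases favor Quadlet files, but the generator still works). A sketch for the node container defined earlier, assuming it is run rootless as a user service:

```bash
# Generate a systemd unit that recreates the container on boot
mkdir -p ~/.config/systemd/user
podman generate systemd --new --name tdarr-node-gpu-unmapped \
  > ~/.config/systemd/user/tdarr-node.service

systemctl --user daemon-reload
systemctl --user enable --now tdarr-node.service

# Keep user services running after logout
loginctl enable-linger "$USER"
```
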
## Performance Optimization

### Cache Storage Strategy (Updated)

```bash
# Production cache storage hierarchy (NVMe optimized)
/mnt/NV2/tdarr-cache/          # NVMe SSD (3-7GB/s) - PRODUCTION
├── tdarr-workDir-{jobId}/     # Active transcoding
├── download/                  # Source file staging (API downloads)
└── upload/                    # Result file staging (API uploads)

# Alternative configurations:
/dev/shm/tdarr-cache/                  # RAM disk (fastest, volatile, limited size)
/mnt/truenas-share/tdarr/tdarr-cache/  # Network cache (mapped nodes only)

# Performance comparison:
# NVMe cache:    3-7GB/s  (unmapped nodes - RECOMMENDED)
# Network cache: 100MB/s  (mapped nodes - legacy)
# RAM cache:     10GB/s+  (limited by available RAM)
```

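If you experiment with the RAM-disk alternative, a dedicated tmpfs mount gives an explicit size cap instead of competing for the default `/dev/shm`. A sketch; the 32G size is an assumption to tune against available RAM and concurrent jobs:

```bash
# Volatile tmpfs cache with a hard size cap (contents lost on reboot)
sudo mkdir -p /mnt/ram-cache
sudo mount -t tmpfs -o size=32G tmpfs /mnt/ram-cache

# Then point the node's cache volume at it instead of NVMe:
#   -v "/mnt/ram-cache:/cache"
```
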

### Network I/O Pattern

```
Optimized Workflow:
1. 📥 Download source (once): NAS → Local NVMe
2. ⚡ Transcode: Local NVMe → Local NVMe
3. 📤 Upload result (once): Local NVMe → NAS

vs Legacy Mapped Workflow:
1. 🐌 Read source: NAS → Node (streaming)
2. 🐌 Write temp: Node → NAS (streaming)
3. 🐌 Read temp: NAS → Node (streaming)
4. 🐌 Write final: Node → NAS (streaming)
```

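For intuition, the optimized workflow is roughly equivalent to the hand-rolled sequence below. Tdarr's node API performs these transfers itself; the paths and ffmpeg NVENC flags are illustrative, not the plugin's exact arguments:

```bash
CACHE=/mnt/NV2/tdarr-cache
SRC=/mnt/truenas-share/movies/example.mkv   # hypothetical source file

# 1. Download the source once: NAS -> local NVMe
cp "$SRC" "$CACHE/download/"

# 2. Transcode entirely on local NVMe
ffmpeg -hwaccel cuda -i "$CACHE/download/example.mkv" \
  -c:v hevc_nvenc -preset p5 -c:a copy \
  "$CACHE/upload/example.mkv"

# 3. Upload the result once: local NVMe -> NAS
mv "$CACHE/upload/example.mkv" "${SRC%.mkv}-hevc.mkv"
```
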
## Scaling Patterns

### Horizontal Scaling

```yaml
# Multiple nodes with load balancing
nodes:
  - name: "gpu-node-1"   # RTX 4090 + NVMe
    role: "heavy-transcode"
  - name: "gpu-node-2"   # RTX 3080 + NVMe
    role: "standard-transcode"
  - name: "cpu-node-1"   # Multi-core + SSD
    role: "audio-processing"
```


### Resource Specialization

```bash
# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2

# CPU-optimized node
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16
```


## Monitoring and Operations

### Health Checks

```bash
# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1

# Cache usage monitoring
df -h /mnt/nvme/tdarr-cache
du -sh /mnt/nvme/tdarr-cache/*

# Performance metrics
podman stats tdarr-node-1
```


### Log Analysis

```bash
# Node registration
podman logs tdarr-node-1 | grep "Node connected"

# Transfer speeds
podman logs tdarr-node-1 | grep -E "(Download|Upload).*MB/s"

# Transcode performance
podman logs tdarr-node-1 | grep -E "fps=.*"
```


## Security Considerations

### Network Access

- Server requires incoming connections on ports 8265/8266 (see the firewall sketch below)
- Nodes require outbound access to the server
- Consider a VPN for cross-site deployments

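On the server, that typically means opening only those two ports. A firewalld sketch (ufw or raw nftables work equally well):

```bash
# Allow the web UI (8265) and node/server (8266) ports
sudo firewall-cmd --permanent --add-port=8265/tcp
sudo firewall-cmd --permanent --add-port=8266/tcp
sudo firewall-cmd --reload
```
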
### File Permissions
```bash
# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000

# Cache directory permissions
chown -R 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache
```


## Production Enhancements

### Gaming-Aware Scheduler

For GPU nodes that serve dual purposes (gaming + transcoding):

```bash
# Automated scheduler with gaming detection
scripts/tdarr/tdarr-schedule-manager.sh install

# Configure time windows (example: night-only transcoding)
scripts/tdarr/tdarr-schedule-manager.sh preset night-only  # 10PM-7AM only
```

**Features:**

- **Automatic GPU conflict prevention**: Detects Steam, other gaming processes, and GPU usage above 15% (see the sketch after this list)
- **Configurable time windows**: `"22-07:daily"` (10PM-7AM), `"09-17:1-5"` (work hours)
- **Real-time monitoring**: 1-minute cron checks with instant response to gaming
- **Automated cleanup**: Removes abandoned temp directories every 6 hours
- **Zero-intervention operation**: Stops/starts Tdarr automatically based on these rules

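A minimal sketch of the detection loop; the production logic lives in `scripts/tdarr/tdarr-schedule-manager.sh`, and the process name, threshold handling, and container name here are simplifying assumptions (the real script also excludes the node's own GPU load and applies the configured time windows):

```bash
#!/bin/bash
# Pause the node whenever gaming activity is detected; resume otherwise.
CONTAINER="tdarr-node-gpu-unmapped"
GPU_BUSY_THRESHOLD=15

gpu_util=$(nvidia-smi --query-gpu=utilization.gpu \
           --format=csv,noheader,nounits | head -n1)

if pgrep -x steam >/dev/null || [ "${gpu_util:-0}" -gt "$GPU_BUSY_THRESHOLD" ]; then
    podman stop "$CONTAINER" 2>/dev/null || true    # gaming wins
else
    podman start "$CONTAINER" 2>/dev/null || true   # resume transcoding
fi
```
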
**Benefits:**

- **Gaming priority**: Never interferes with gaming sessions
- **Resource optimization**: Maximizes transcoding during off-hours
- **System stability**: Prevents GPU contention and system slowdowns
- **Maintenance-free**: Handles cleanup and scheduling without user intervention

### Enhanced Monitoring System

**Script**: `scripts/monitoring/tdarr-timeout-monitor.sh`

- **Staging timeout detection**: Monitors for download failures and cleanup issues
- **Discord notifications**: Professional alerts with user pings for critical issues (see the webhook sketch below)
- **Automatic recovery**: Cleans up stuck work directories and partial downloads
- **Log management**: Timestamped logs with automatic rotation

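The Discord side reduces to a webhook POST. A minimal sketch, where the webhook URL and the user ID to ping are placeholders you supply:

```bash
WEBHOOK_URL="https://discord.com/api/webhooks/<id>/<token>"  # placeholder
USER_ID="123456789012345678"                                 # placeholder: user to ping

curl -sS -H "Content-Type: application/json" \
  -d "{\"content\": \"<@${USER_ID}> Tdarr staging timeout on $(hostname): stuck work directory cleaned\"}" \
  "$WEBHOOK_URL"
```
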
## Related References

- **Troubleshooting**: `reference/docker/tdarr-troubleshooting.md`
- **Gaming Scheduler**: `scripts/tdarr/README.md`
- **Automation Scripts**: `scripts/tdarr/` (production-ready node management)
- **Performance**: `reference/docker/nvidia-troubleshooting.md`