# Tdarr Distributed Transcoding Pattern

## Overview

Tdarr distributed transcoding with unmapped nodes provides optimal performance for enterprise-scale video processing across multiple machines.
## Architecture Pattern

### Unmapped Node Deployment (Recommended)

```
┌─────────────────┐      ┌──────────────────────────────────┐
│  Tdarr Server   │      │         Unmapped Nodes           │
│                 │      │  ┌──────────┐  ┌──────────┐      │
│ - Web Interface │◄────►│  │  Node 1  │  │  Node 2  │ ...  │
│ - Job Queue     │      │  │ GPU+CPU  │  │ GPU+CPU  │      │
│ - File Mgmt     │      │  │NVMe Cache│  │NVMe Cache│      │
│                 │      │  └──────────┘  └──────────┘      │
└─────────────────┘      └──────────────────────────────────┘
         │                                │
         └──────── Shared Storage ────────┘
              (NAS/SAN for media files)
```
### Key Components

- **Server**: centralizes job management and the web interface
- **Unmapped Nodes**: transcode independently using local cache
- **Shared Storage**: source and final file repository (mount sketch below)
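
Every node needs the same view of the shared store. Below is a minimal sketch of mounting the NAS export on a node, assuming an NFS export; the export path `10.10.0.43:/mnt/pool/media` is a placeholder, and only the `/mnt/truenas-share` mount point matches the configs later in this document:

```bash
# Placeholder NFS export - substitute your NAS address and dataset path
sudo mkdir -p /mnt/truenas-share
sudo mount -t nfs -o rw,hard,vers=4.2 10.10.0.43:/mnt/pool/media /mnt/truenas-share

# Persist across reboots by appending to /etc/fstab:
# 10.10.0.43:/mnt/pool/media  /mnt/truenas-share  nfs  rw,hard,vers=4.2  0  0
```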
## Configuration Templates

### Server Configuration (Optimized)

```yaml
# docker-compose.yml - Hybrid Storage Strategy
version: "3.4"
services:
  tdarr:
    container_name: tdarr
    image: ghcr.io/haveagitgat/tdarr:latest
    restart: unless-stopped
    network_mode: bridge
    ports:
      - 8265:8265 # webUI port
      - 8266:8266 # server port
    environment:
      - TZ=America/Chicago
      - PUID=0
      - PGID=0
      - UMASK_SET=002
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
      - internalNode=false # Disable for distributed setup
      - inContainer=true
      - ffmpegVersion=6
      - nodeName=docker-server
    volumes:
      # Hybrid storage strategy - local for performance, network for persistence
      - ./tdarr/server:/app/server     # Local: database, configs, logs
      - ./tdarr/configs:/app/configs   # Local: fast config access
      - ./tdarr/logs:/app/logs         # Local: logging performance
      - /mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Network: backups only
      # Media and cache (when using mapped nodes)
      - /mnt/truenas-share:/media      # Network: source media
      - /mnt/truenas-share/tdarr/tdarr-cache:/temp # Network: shared cache (mapped nodes only)
```
### Unmapped Node Configuration (Production)

```bash
#!/bin/bash
# Tdarr Unmapped Node with GPU Support - NVMe Cache Optimization
# Production script: scripts/tdarr/start-tdarr-gpu-podman-clean.sh

CONTAINER_NAME="tdarr-node-gpu-unmapped"
SERVER_IP="10.10.0.43"
SERVER_PORT="8266"
NODE_NAME="nobara-pc-gpu-unmapped"

# Clean container management
if podman ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
    podman stop "${CONTAINER_NAME}" 2>/dev/null || true
    podman rm "${CONTAINER_NAME}" 2>/dev/null || true
fi

# Production unmapped node with optimized cache
# /mnt/NV2/tdarr-cache is the NVMe cache mount (3-7GB/s); a trailing
# comment on a continued line would break the command, so it lives here.
podman run -d --name "${CONTAINER_NAME}" \
    --gpus all \
    --restart unless-stopped \
    -e TZ=America/Chicago \
    -e UMASK_SET=002 \
    -e nodeName="${NODE_NAME}" \
    -e serverIP="${SERVER_IP}" \
    -e serverPort="${SERVER_PORT}" \
    -e nodeType=unmapped \
    -e inContainer=true \
    -e ffmpegVersion=6 \
    -e logLevel=DEBUG \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v "/mnt/NV2/tdarr-cache:/cache" \
    -v "/mnt/media:/app/unmappedNodeCache/${NODE_NAME}/media" \
    ghcr.io/haveagitgat/tdarr_node:latest
```
## File Transfer Optimizations

### Hybrid Storage Strategy (Server)

The server uses a hybrid approach that balances performance and reliability:

```yaml
# Local storage (SSD/NVMe) - high-performance operations
./tdarr/server:/app/server     # Database - frequent read/write
./tdarr/configs:/app/configs   # Config files - startup performance
./tdarr/logs:/app/logs         # Log files - continuous writing

# Network storage (NAS) - persistence and backup
/mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Infrequent access
```
**Benefits:**

- **Database performance**: local SQLite operations, roughly 100x faster than network storage (see the comparison sketch below)
- **Log performance**: eliminates the network I/O bottleneck for continuous logging
- **Reliability**: critical backups stored on redundant NAS storage
- **Config speed**: fast server startup with local configuration files
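
A rough way to see the gap behind these numbers: time synchronous 4K writes (approximating a database commit pattern) on the local path versus the NAS path. This is an illustrative spot check, not a rigorous benchmark:

```bash
# Local SSD/NVMe-backed path (server's working directory)
dd if=/dev/zero of=./tdarr/server/ddtest bs=4k count=1000 oflag=dsync

# Network path - expect dramatically lower throughput per synced write
dd if=/dev/zero of=/mnt/truenas-share/ddtest bs=4k count=1000 oflag=dsync

rm -f ./tdarr/server/ddtest /mnt/truenas-share/ddtest
```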
### Container Platform Migration: Docker → Podman

**Advantages of Podman for Tdarr:**

```bash
# Enhanced GPU support
--gpus all                          # Improved NVIDIA integration
-e NVIDIA_DRIVER_CAPABILITIES=all   # Full GPU access
-e NVIDIA_VISIBLE_DEVICES=all       # All GPU visibility

# Better resource management
--restart unless-stopped            # Smarter restart policies
# Rootless containers (when needed) # Enhanced security
```
**Migration Benefits** (migration sketch below):

- **GPU reliability**: better NVIDIA container integration
- **Resource isolation**: improved container resource management
- **System integration**: better integration with systemd and cgroups
- **Performance**: reduced overhead compared to the Docker daemon
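
A sketch of the migration itself, assuming an existing Docker container named `tdarr-node` and a recent Podman with NVIDIA's container toolkit installed; `nvidia-ctk` generates the CDI spec that Podman's `--gpus` flag relies on:

```bash
# Remove the old Docker-managed node (container name is an assumption)
docker stop tdarr-node && docker rm tdarr-node

# Generate the CDI spec so Podman can see the NVIDIA GPUs
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# Recreate the node with the production script
./scripts/tdarr/start-tdarr-gpu-podman-clean.sh

# Optional: generate a systemd unit so systemd owns the lifecycle
podman generate systemd --new --files --name tdarr-node-gpu-unmapped
```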
## Performance Optimization

### Cache Storage Strategy (Updated)

```text
# Production cache storage hierarchy (NVMe optimized)
/mnt/NV2/tdarr-cache/                  # NVMe SSD (3-7GB/s) - PRODUCTION
├── tdarr-workDir-{jobId}/             # Active transcoding
├── download/                          # Source file staging (API downloads)
└── upload/                            # Result file staging (API uploads)

# Alternative configurations:
/dev/shm/tdarr-cache/                  # RAM disk (fastest, volatile, limited size)
/mnt/truenas-share/tdarr/tdarr-cache/  # Network cache (mapped nodes only)

# Performance comparison:
# NVMe cache:    3-7GB/s   (unmapped nodes - RECOMMENDED)
# Network cache: ~100MB/s  (mapped nodes - legacy)
# RAM cache:     10GB/s+   (limited by available RAM)
```
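
Provisioning the cache tiers is a one-time step. A minimal sketch, assuming the NVMe filesystem is already mounted at `/mnt/NV2` and using an arbitrary 32G cap for the tmpfs alternative:

```bash
# Production NVMe cache directory
sudo mkdir -p /mnt/NV2/tdarr-cache
sudo chown 1000:1000 /mnt/NV2/tdarr-cache

# RAM-disk alternative: /dev/shm is tmpfs on most distros...
sudo mkdir -p /dev/shm/tdarr-cache

# ...or mount a dedicated tmpfs with an explicit size cap
sudo mkdir -p /mnt/ram-cache
sudo mount -t tmpfs -o size=32G,mode=0775 tmpfs /mnt/ram-cache
```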
### Network I/O Pattern

**Optimized unmapped workflow** (sketched in commands below):

1. 📥 Download source (once): NAS → local NVMe
2. ⚡ Transcode: local NVMe → local NVMe
3. 📤 Upload result (once): local NVMe → NAS

**vs. legacy mapped workflow:**

1. 🐌 Read source: NAS → node (streaming)
2. 🐌 Write temp: node → NAS (streaming)
3. 🐌 Read temp: NAS → node (streaming)
4. 🐌 Write final: node → NAS (streaming)
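
The node drives these transfers itself through the server API; the optimized flow is roughly equivalent to the following hand-run sketch, where rsync and ffmpeg stand in for Tdarr's internal transfer and plugin steps and the file names are placeholders:

```bash
# 1. Download the source once: NAS -> local NVMe staging
rsync -a /mnt/truenas-share/movies/input.mkv /mnt/NV2/tdarr-cache/download/

# 2. Transcode entirely on local NVMe (NVENC HEVC as an example target)
ffmpeg -i /mnt/NV2/tdarr-cache/download/input.mkv \
  -c:v hevc_nvenc -preset p5 -c:a copy \
  /mnt/NV2/tdarr-cache/upload/output.mkv

# 3. Upload the result once: local NVMe -> NAS
rsync -a /mnt/NV2/tdarr-cache/upload/output.mkv /mnt/truenas-share/movies/
```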
## Scaling Patterns

### Horizontal Scaling

```yaml
# Multiple nodes with load balancing
nodes:
  - name: "gpu-node-1"   # RTX 4090 + NVMe
    role: "heavy-transcode"
  - name: "gpu-node-2"   # RTX 3080 + NVMe
    role: "standard-transcode"
  - name: "cpu-node-1"   # Multi-core + SSD
    role: "audio-processing"
```
### Resource Specialization

```bash
# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2

# CPU-optimized node
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16
```
## Monitoring and Operations

### Health Checks

```bash
# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1

# Cache usage monitoring (production NVMe cache path)
df -h /mnt/NV2/tdarr-cache
du -sh /mnt/NV2/tdarr-cache/*

# Performance metrics
podman stats tdarr-node-gpu-unmapped
```
### Log Analysis

```bash
# Node registration
podman logs tdarr-node-gpu-unmapped | grep "Node connected"

# Transfer speeds
podman logs tdarr-node-gpu-unmapped | grep -E "(Download|Upload).*MB/s"

# Transcode performance
podman logs tdarr-node-gpu-unmapped | grep -E "fps="
```
## Security Considerations

### Network Access

- Server requires incoming connections on ports 8265/8266 (firewall example below)
- Nodes require outbound access to the server
- Consider a VPN for cross-site deployments
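
On a Fedora-based host such as Nobara, firewalld is the default; a minimal sketch for opening the server ports (adjust for ufw or raw nftables as needed):

```bash
# On the server host (firewalld assumed)
sudo firewall-cmd --permanent --add-port=8265/tcp   # web UI
sudo firewall-cmd --permanent --add-port=8266/tcp   # node <-> server API
sudo firewall-cmd --reload

# From a node, confirm the server is reachable
curl -sf http://10.10.0.43:8266/api/v2/status && echo "server reachable"
```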
### File Permissions

```bash
# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000

# Cache directory permissions (production NVMe path)
chown -R 1000:1000 /mnt/NV2/tdarr-cache
chmod 755 /mnt/NV2/tdarr-cache
```
## Production Enhancements

### Gaming-Aware Scheduler

For GPU nodes that serve dual purposes (gaming + transcoding):
```bash
# Automated scheduler with gaming detection
scripts/tdarr/tdarr-schedule-manager.sh install

# Configure time windows (example: night-only transcoding)
scripts/tdarr/tdarr-schedule-manager.sh preset night-only   # 10PM-7AM only
```
**Features:**

- **Automatic GPU conflict prevention**: detects Steam, gaming processes, and GPU usage above 15% (see the sketch below)
- **Configurable time windows**: `"22-07:daily"` (10PM-7AM), `"09-17:1-5"` (work hours)
- **Real-time monitoring**: 1-minute cron checks with instant response to gaming
- **Automated cleanup**: removes abandoned temp directories every 6 hours
- **Zero-intervention operation**: stops/starts Tdarr automatically based on the rules
**Benefits:**

- **Gaming priority**: never interferes with gaming sessions
- **Resource optimization**: maximizes transcoding during off-hours
- **System stability**: prevents GPU contention and system slowdowns
- **Maintenance-free**: handles cleanup and scheduling without user intervention
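
The actual logic lives in `scripts/tdarr/tdarr-schedule-manager.sh`; the sketch below is a hypothetical reconstruction of its core check, using the process names and 15% GPU threshold described above:

```bash
#!/bin/bash
# Hypothetical sketch - the real logic is in tdarr-schedule-manager.sh
GPU_BUSY_THRESHOLD=15

gaming_active() {
  # Known gaming processes (this list is an assumption)
  pgrep -x steam >/dev/null && return 0
  pgrep -f 'gamescope|lutris|heroic' >/dev/null && return 0
  # Treat GPU utilization above the threshold as "in use"
  local util
  util=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -n1)
  [ "${util:-0}" -gt "$GPU_BUSY_THRESHOLD" ]
}

# Run from cron every minute: pause the node while gaming, resume after
if gaming_active; then
  podman stop tdarr-node-gpu-unmapped 2>/dev/null || true
else
  podman start tdarr-node-gpu-unmapped 2>/dev/null || true
fi
```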
### Enhanced Monitoring System

**Script**: `scripts/monitoring/tdarr-timeout-monitor.sh`

- **Staging timeout detection**: monitors for download failures and cleanup issues
- **Discord notifications**: professional alerts with user pings for critical issues
- **Automatic recovery**: cleans up stuck work directories and partial downloads (sketched below)
- **Log management**: timestamped logs with automatic rotation
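
A hypothetical sketch of the recovery half of this monitor: sweep for stale work directories and post a Discord webhook alert. The age threshold and the webhook URL are placeholders:

```bash
#!/bin/bash
# Hypothetical sketch - the real script is scripts/monitoring/tdarr-timeout-monitor.sh
CACHE="/mnt/NV2/tdarr-cache"
MAX_AGE_MIN=240                                      # assumed staging timeout
WEBHOOK="https://discord.com/api/webhooks/XXXX/YYYY" # placeholder URL

# Work directories untouched for longer than the threshold count as stuck
stale=$(find "$CACHE" -maxdepth 1 -type d -name 'tdarr-workDir-*' -mmin +"$MAX_AGE_MIN")

if [ -n "$stale" ]; then
  echo "$stale" | xargs -r rm -rf
  curl -s -H "Content-Type: application/json" \
    -d "{\"content\": \"Tdarr monitor: removed stale work dirs:\\n${stale//$'\n'/\\n}\"}" \
    "$WEBHOOK" >/dev/null
fi
```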
## Related References

- **Troubleshooting**: `reference/docker/tdarr-troubleshooting.md`
- **Gaming Scheduler**: `scripts/tdarr/README.md`
- **Automation Scripts**: `scripts/tdarr/` (production-ready node management)
- **Performance**: `reference/docker/nvidia-troubleshooting.md`