---
title: Tdarr Distributed Transcoding
description: Architecture and configuration for Tdarr distributed transcoding with unmapped nodes, NVMe cache optimization, hybrid storage strategy, gaming-aware scheduling, and horizontal scaling patterns.
type: guide
domain: docker
tags:
  - tdarr
  - distributed
  - transcoding
  - podman
  - gpu
  - nvme
  - cache
  - gaming-scheduler
  - monitoring
---

# Tdarr Distributed Transcoding Pattern

## Overview

Tdarr distributed transcoding with unmapped nodes spreads video processing across multiple machines: a central server queues jobs, each node transcodes on its own fast local cache, and finished files return to shared storage. This avoids the streaming network I/O that bottlenecks mapped (shared-cache) nodes.

## Architecture Pattern

```
┌─────────────────┐    ┌──────────────────────────────────┐
│   Tdarr Server  │    │          Unmapped Nodes          │
│                 │    │  ┌──────────┐ ┌──────────┐       │
│  - Web Interface│◄──►│  │  Node 1  │ │  Node 2  │ ...   │
│  - Job Queue    │    │  │ GPU+CPU  │ │ GPU+CPU  │       │
│  - File Mgmt    │    │  │NVMe Cache│ │NVMe Cache│       │
│                 │    │  └──────────┘ └──────────┘       │
└─────────────────┘    └──────────────────────────────────┘
         │                              │
         └──────── Shared Storage ──────┘
              (NAS/SAN for media files)
```

## Key Components

- **Server**: Centralizes job management and web interface
- **Unmapped Nodes**: Independent transcoding with local cache
- **Shared Storage**: Source and final file repository

## Configuration Templates

### Server Configuration (Optimized)

```yaml
# docker-compose.yml - Hybrid Storage Strategy
version: "3.4"
services:
  tdarr:
    container_name: tdarr
    image: ghcr.io/haveagitgat/tdarr:latest
    restart: unless-stopped
    network_mode: bridge
    ports:
      - 8265:8265 # webUI port
      - 8266:8266 # server port
    environment:
      - TZ=America/Chicago
      - PUID=0
      - PGID=0
      - UMASK_SET=002
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
      - internalNode=false  # Disable for distributed setup
      - inContainer=true
      - ffmpegVersion=6
      - nodeName=docker-server
    volumes:
      # Hybrid storage strategy - Local for performance, Network for persistence
      - ./tdarr/server:/app/server           # Local: Database, configs, logs
      - ./tdarr/configs:/app/configs         # Local: Fast config access
      - ./tdarr/logs:/app/logs               # Local: Logging performance
      - /mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups  # Network: Backups only

      # Media and cache (when using mapped nodes)
      - /mnt/truenas-share:/media            # Network: Source media
      - /mnt/truenas-share/tdarr/tdarr-cache:/temp  # Network: Shared cache (mapped nodes only)
```
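
A quick smoke test after bringing the server up, reusing the `/api/v2/status` endpoint from the health-check section below (ports from the compose file above):

```bash
# Start the server and confirm both ports respond
docker compose up -d tdarr
curl -fsS http://localhost:8265 >/dev/null && echo "webUI reachable"
curl -fsS http://localhost:8266/api/v2/status >/dev/null && echo "server API reachable"
```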

### Unmapped Node Configuration (Production)

```bash
#!/bin/bash
# Tdarr Unmapped Node with GPU Support - NVMe Cache Optimization
# Production script: scripts/tdarr/start-tdarr-gpu-podman-clean.sh

CONTAINER_NAME="tdarr-node-gpu-unmapped"
SERVER_IP="10.10.0.43"
SERVER_PORT="8266"
NODE_NAME="nobara-pc-gpu-unmapped"

# Clean container management
if podman ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
    podman stop "${CONTAINER_NAME}" 2>/dev/null || true
    podman rm "${CONTAINER_NAME}" 2>/dev/null || true
fi

# Production unmapped node with optimized cache
# /mnt/NV2/tdarr-cache is the local NVMe cache (3-7GB/s); a trailing comment
# after a line-continuation backslash would break the command, so it lives here
podman run -d --name "${CONTAINER_NAME}" \
    --gpus all \
    --restart unless-stopped \
    -e TZ=America/Chicago \
    -e UMASK_SET=002 \
    -e nodeName="${NODE_NAME}" \
    -e serverIP="${SERVER_IP}" \
    -e serverPort="${SERVER_PORT}" \
    -e nodeType=unmapped \
    -e inContainer=true \
    -e ffmpegVersion=6 \
    -e logLevel=DEBUG \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v "/mnt/NV2/tdarr-cache:/cache" \
    -v "/mnt/media:/app/unmappedNodeCache/nobara-pc-gpu-unmapped/media" \
    ghcr.io/haveagitgat/tdarr_node:latest
```
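
After launching, it is worth confirming the node registered with the server and can actually see the GPU. A minimal check, assuming the container name from the script above:

```bash
# Confirm the node registered with the server (see Log Analysis below)
podman logs tdarr-node-gpu-unmapped | grep -i "connected"
# Confirm GPU passthrough (works if the node image ships nvidia-smi)
podman exec tdarr-node-gpu-unmapped nvidia-smi
```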

## File Transfer Optimizations

### Hybrid Storage Strategy (Server)

The server uses a hybrid approach balancing performance and reliability:

```
# Local storage (SSD/NVMe) - High Performance Operations
./tdarr/server:/app/server           # Database - frequent read/write
./tdarr/configs:/app/configs         # Config files - startup performance
./tdarr/logs:/app/logs               # Log files - continuous writing

# Network storage (NAS) - Persistence & Backup
/mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups  # Infrequent access
```

**Benefits:**

- **Database performance**: Local SQLite operations (100x faster than over the network)
- **Log performance**: Eliminates the network I/O bottleneck for continuous logging
- **Reliability**: Critical backups stored on redundant NAS storage
- **Config speed**: Fast server startup with local configuration files
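
The local directories need to exist before first start. A one-time setup sketch using the exact paths from the compose file (the TrueNAS share is assumed to already be mounted):

```bash
# Create the local (fast) directories and the NAS backup target
mkdir -p ./tdarr/{server,configs,logs}
mkdir -p /mnt/truenas-share/tdarr/tdarr-server/Backups
```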

## Container Platform Migration: Docker → Podman

**Advantages of Podman for Tdarr:**

```bash
# Enhanced GPU support
--gpus all                           # Improved NVIDIA integration
-e NVIDIA_DRIVER_CAPABILITIES=all    # Full GPU access
-e NVIDIA_VISIBLE_DEVICES=all        # All GPU visibility

# Better resource management
--restart unless-stopped             # Smarter restart policies
# Rootless containers (when needed)  # Enhanced security
```

**Migration Benefits:**

- **GPU reliability**: Better NVIDIA container integration
- **Resource isolation**: Improved container resource management
- **System integration**: Better integration with systemd and cgroups
- **Performance**: Reduced overhead compared to the Docker daemon
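
A rough migration sequence for a node host, assuming the old container ran under Docker with the same name and the launch script above is already deployed:

```bash
# Retire the Docker-managed node, then relaunch under Podman
docker stop tdarr-node-gpu-unmapped && docker rm tdarr-node-gpu-unmapped
./scripts/tdarr/start-tdarr-gpu-podman-clean.sh   # recreates the node via podman run
```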

## Performance Optimization

### Cache Storage Strategy (Updated)

```
# Production cache storage hierarchy (NVMe optimized)
/mnt/NV2/tdarr-cache/               # NVMe SSD (3-7GB/s) - PRODUCTION
├── tdarr-workDir-{jobId}/          # Active transcoding
├── download/                       # Source file staging (API downloads)
└── upload/                         # Result file staging (API uploads)

# Alternative configurations:
/dev/shm/tdarr-cache/               # RAM disk (fastest, volatile, limited size)
/mnt/truenas-share/tdarr/tdarr-cache/  # Network cache (mapped nodes only)

# Performance comparison:
# NVMe cache:     3-7GB/s   (unmapped nodes - RECOMMENDED)
# Network cache:  100MB/s   (mapped nodes - legacy)
# RAM cache:      10GB/s+   (limited by available RAM)
```
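
If the NVMe cache mount does not exist yet, a preparation sketch follows. The device name and filesystem are assumptions and must match your hardware:

```bash
# One-time NVMe cache setup (ASSUMPTION: /dev/nvme1n1 is the spare NVMe drive)
# WARNING: mkfs erases the device
mkfs.ext4 /dev/nvme1n1
mkdir -p /mnt/NV2
echo '/dev/nvme1n1 /mnt/NV2 ext4 noatime 0 2' >> /etc/fstab
mount /mnt/NV2
mkdir -p /mnt/NV2/tdarr-cache
```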

### Network I/O Pattern

**Optimized workflow (unmapped nodes):**

1. 📥 Download source (once): NAS → local NVMe
2. ⚡ Transcode: local NVMe → local NVMe
3. 📤 Upload result (once): local NVMe → NAS

**Legacy mapped workflow:**

1. 🐌 Read source: NAS → node (streaming)
2. 🐌 Write temp: node → NAS (streaming)
3. 🐌 Read temp: NAS → node (streaming)
4. 🐌 Write final: node → NAS (streaming)
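
The difference is easy to quantify with a back-of-envelope sketch. The sizes below are illustrative assumptions (a 10 GB source producing a 5 GB output); real numbers vary by codec and bitrate:

```bash
# Network bytes moved per job: 2 bulk transfers vs 4 streaming passes
src=10; out=5   # GB, illustrative assumptions
echo "unmapped: $((src + out)) GB over the wire; transcode runs at NVMe speed"
echo "mapped:   $((src + out + out + out)) GB over the wire; transcode throttled to NAS speed"
```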

## Scaling Patterns

### Horizontal Scaling

```yaml
# Multiple nodes with load balancing
nodes:
  - name: "gpu-node-1"      # RTX 4090 + NVMe
    role: "heavy-transcode"
  - name: "gpu-node-2"      # RTX 3080 + NVMe
    role: "standard-transcode"
  - name: "cpu-node-1"      # Multi-core + SSD
    role: "audio-processing"

### Resource Specialization

```bash
# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2

# CPU-optimized node
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16
```

## Monitoring and Operations

### Health Checks

```bash
# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1

# Cache usage monitoring
df -h /mnt/nvme/tdarr-cache
du -sh /mnt/nvme/tdarr-cache/*

# Performance metrics
podman stats tdarr-node-1
```
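
These checks are easy to fold into a small cron-driven script. A sketch that warns via syslog when the cache filesystem crosses a threshold (the 90% figure is an assumption):

```bash
#!/bin/bash
# Warn when the Tdarr cache filesystem exceeds 90% usage
usage=$(df --output=pcent /mnt/nvme/tdarr-cache | tail -1 | tr -dc '0-9')
if [ "${usage}" -gt 90 ]; then
    echo "tdarr cache at ${usage}% - consider clearing stale workDirs" | logger -t tdarr-health
fi
```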

### Log Analysis

```bash
# Node registration
podman logs tdarr-node-1 | grep "Node connected"

# Transfer speeds
podman logs tdarr-node-1 | grep -E "(Download|Upload).*MB/s"

# Transcode performance
podman logs tdarr-node-1 | grep -E "fps=.*"
```

## Security Considerations

### Network Access

- Server requires incoming connections on ports 8265/8266
- Nodes require outbound access to the server
- Consider a VPN for cross-site deployments
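
On the server host that usually means two firewall openings. A firewalld sketch (substitute ufw or iptables as appropriate for your distribution):

```bash
# Allow the webUI and node-to-server ports through firewalld
firewall-cmd --permanent --add-port=8265/tcp   # webUI
firewall-cmd --permanent --add-port=8266/tcp   # node/server communication
firewall-cmd --reload
```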

### File Permissions

```bash
# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000

# Cache directory permissions
chown -R 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache
```

## Production Enhancements

### Gaming-Aware Scheduler

For GPU nodes that serve dual purposes (gaming + transcoding):

```bash
# Automated scheduler with gaming detection
scripts/tdarr/tdarr-schedule-manager.sh install

# Configure time windows (example: night-only transcoding)
scripts/tdarr/tdarr-schedule-manager.sh preset night-only  # 10PM-7AM only
```
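
A quick way to confirm the install step took effect is to look for the scheduler's cron entry (the exact entry text depends on the script version):

```bash
# Verify the scheduler's cron job was installed
crontab -l | grep tdarr-schedule-manager
```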

**Features:**

- **Automatic GPU conflict prevention**: Detects Steam, gaming processes, and GPU usage above 15%
- **Configurable time windows**: `"22-07:daily"` (10PM-7AM), `"09-17:1-5"` (work hours, Mon-Fri)
- **Real-time monitoring**: 1-minute cron checks with instant gaming response
- **Automated cleanup**: Removes abandoned temp directories every 6 hours
- **Zero-intervention operation**: Stops/starts Tdarr automatically based on rules

**Benefits:**

- **Gaming priority**: Never interferes with gaming sessions
- **Resource optimization**: Maximizes transcoding during off-hours
- **System stability**: Prevents GPU contention and system slowdowns
- **Maintenance-free**: Handles cleanup and scheduling without user intervention

### Enhanced Monitoring System

**Script**: `scripts/monitoring/tdarr-timeout-monitor.sh`

- **Staging timeout detection**: Monitors for download failures and cleanup issues
- **Discord notifications**: Professional alerts with user pings for critical issues
- **Automatic recovery**: Cleans up stuck work directories and partial downloads
- **Log management**: Timestamped logs with automatic rotation (see the cron sketch below)
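
The monitor is designed to run unattended. A cron sketch, where the 15-minute interval and the `/path/to` prefix are placeholders to adjust for your install:

```bash
# Schedule the monitor; it handles its own logging and rotation
( crontab -l 2>/dev/null; echo '*/15 * * * * /path/to/scripts/monitoring/tdarr-timeout-monitor.sh' ) | crontab -
```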

### Related Documentation

- **Troubleshooting**: `reference/docker/tdarr-troubleshooting.md`
- **Gaming Scheduler**: `scripts/tdarr/README.md`
- **Automation Scripts**: `scripts/tdarr/` (production-ready node management)
- **Performance**: `reference/docker/nvidia-troubleshooting.md`