Tdarr Distributed Transcoding Pattern

Overview

Tdarr distributed transcoding with unmapped nodes scales video processing across multiple machines while keeping active transcode I/O on each node's local cache: a node pulls the source file from shared storage once, transcodes locally, and writes the result back once.

Architecture Pattern

┌─────────────────┐    ┌──────────────────────────────────┐
│   Tdarr Server  │    │        Unmapped Nodes            │
│                 │    │  ┌──────────┐ ┌──────────┐       │
│  - Web Interface│◄──►│  │  Node 1  │ │  Node 2  │ ...   │
│  - Job Queue    │    │  │ GPU+CPU  │ │ GPU+CPU  │       │
│  - File Mgmt    │    │  │NVMe Cache│ │NVMe Cache│       │
│                 │    │  └──────────┘ └──────────┘       │
└─────────────────┘    └──────────────────────────────────┘
         │                              │
         └──────── Shared Storage ──────┘
              (NAS/SAN for media files)

Key Components

  • Server: Centralizes job management and web interface
  • Unmapped Nodes: Independent transcoding with local cache
  • Shared Storage: Source and final file repository

Configuration Templates

Server Configuration

# docker-compose.yml
version: "3.4"
services:
  tdarr-server:
    image: ghcr.io/haveagitgat/tdarr:latest
    ports:
      - "8265:8265"  # Web UI
      - "8266:8266"  # Server API
    environment:
      - TZ=America/Chicago
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
    volumes:
      - ./server:/app/server
      - ./configs:/app/configs  
      - ./logs:/app/logs
      - /path/to/media:/media
    # Note: No temp/cache volume needed for server with unmapped nodes
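
To bring the server up with this file, a minimal sketch assuming Docker Compose v2 on the server host (substitute podman-compose if preferred):

# Start the server and follow its logs
docker compose up -d tdarr-server
docker compose logs -f tdarr-server

# Web UI should then be reachable at http://<server-host>:8265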

Unmapped Node Configuration

#!/bin/bash
# Optimal unmapped node with local NVMe cache

podman run -d --name "tdarr-node-1" \
    --gpus all \
    -e TZ=America/Chicago \
    -e nodeName="transcoding-node-1" \
    -e serverIP="10.10.0.43" \
    -e serverPort="8266" \
    -e nodeType=unmapped \
    -e inContainer=true \
    -e ffmpegVersion=6 \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v "/mnt/nvme/tdarr-cache:/cache" \
    ghcr.io/haveagitgat/tdarr_node:latest
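
To keep a node running across host reboots, one option is to wrap the container in a systemd unit; a sketch using podman's unit generator (root-mode placement and the resulting unit name are assumptions):

# Generate a unit file from the running container and enable it
podman generate systemd --new --files --name tdarr-node-1
mv container-tdarr-node-1.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable --now container-tdarr-node-1.service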

Performance Optimization

Cache Storage Strategy

# Optimal cache storage hierarchy
/mnt/nvme/tdarr-cache/              # NVMe SSD (fast, recommended)
├── tdarr-workDir-{jobId}/          # Active transcoding  
├── download/                       # Source file staging
└── upload/                         # Result file staging

# Alternative: RAM disk for ultra-performance (limited size)
/dev/shm/tdarr-cache/               # RAM disk (fastest, volatile)

# Avoid: Network mounted cache (slowest)
/mnt/nas/tdarr-cache/               # Network storage (not recommended)
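
A small preparation and cleanup sketch for the NVMe cache (the path and 1000:1000 ownership follow the examples in this document; the one-day retention window is an assumption):

#!/bin/bash
# Create the cache directory with the UID/GID the node container runs as
mkdir -p /mnt/nvme/tdarr-cache
chown 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache

# Remove stale work directories left behind by interrupted jobs
find /mnt/nvme/tdarr-cache -maxdepth 1 -name 'tdarr-workDir-*' -mtime +1 -exec rm -rf {} +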

Network I/O Pattern

Optimized Workflow:
1. 📥 Download source (once): NAS → Local NVMe
2. ⚡ Transcode: Local NVMe → Local NVMe  
3. 📤 Upload result (once): Local NVMe → NAS

vs Legacy Mapped Workflow:
1. 🐌 Read source: NAS → Node (streaming)
2. 🐌 Write temp: Node → NAS (streaming) 
3. 🐌 Read temp: NAS → Node (streaming)
4. 🐌 Write final: Node → NAS (streaming)
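
As a rough worked example (file sizes assumed): for a 10 GB source that transcodes to a 4 GB result, the unmapped workflow moves about 10 + 4 = 14 GB over the network, while the mapped workflow streams roughly 10 + 4 + 4 + 4 = 22 GB and pays network latency on every read and write of the active job.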

Scaling Patterns

Horizontal Scaling

# Example node inventory across machines (illustrative; not a Tdarr config file)
nodes:
  - name: "gpu-node-1"      # RTX 4090 + NVMe
    role: "heavy-transcode"
  - name: "gpu-node-2"      # RTX 3080 + NVMe  
    role: "standard-transcode"
  - name: "cpu-node-1"      # Multi-core + SSD
    role: "audio-processing"

Resource Specialization

# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2

# CPU-optimized node  
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16

Monitoring and Operations

Health Checks

# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1

# Cache usage monitoring  
df -h /mnt/nvme/tdarr-cache
du -sh /mnt/nvme/tdarr-cache/*

# Performance metrics
podman stats tdarr-node-1
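
These checks can be combined into a small wrapper suited to cron or a monitoring agent (a sketch; the server address, cache path, and 90% threshold are assumptions):

#!/bin/bash
# Fail if the server API is unreachable or the cache disk is nearly full
set -u

SERVER="http://10.10.0.43:8266"
CACHE="/mnt/nvme/tdarr-cache"

curl -fsS "${SERVER}/api/v2/status" > /dev/null || { echo "Tdarr server unreachable"; exit 1; }

USED=$(df --output=pcent "${CACHE}" | tail -n 1 | tr -dc '0-9')
if [ "${USED}" -ge 90 ]; then
    echo "Cache at ${USED}% capacity"
    exit 1
fi

echo "OK: server reachable, cache at ${USED}%"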

Log Analysis

# Node registration
podman logs tdarr-node-1 | grep "Node connected"

# Transfer speeds
podman logs tdarr-node-1 | grep -E "(Download|Upload).*MB/s"

# Transcode performance
podman logs tdarr-node-1 | grep -E "fps=.*"

Security Considerations

Network Access

  • Server requires incoming connections on ports 8265/8266 (firewall example below)
  • Nodes require outbound access to server
  • Consider VPN for cross-site deployments
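
For the server's inbound ports, a firewalld sketch (substitute your firewall tooling; for cross-site nodes, also restrict source addresses):

# Allow the Tdarr web UI and server API through the host firewall
firewall-cmd --permanent --add-port=8265/tcp
firewall-cmd --permanent --add-port=8266/tcp
firewall-cmd --reload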

File Permissions

# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000

# Cache directory permissions
chown -R 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache

Related Documentation

  • Troubleshooting: reference/docker/tdarr-troubleshooting.md
  • Examples: examples/docker/tdarr-node-local/
  • Performance: reference/docker/nvidia-troubleshooting.md