claude-home/docker/examples/distributed-transcoding.md

# Tdarr Distributed Transcoding Pattern

## Overview
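Tdarr distributed transcoding with unmapped nodes spreads video processing across multiple machines. Each node stages the source file in a fast local cache, transcodes it locally, and returns only the finished file, which keeps network I/O to one download and one upload per job.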

## Architecture Pattern

### Unmapped Node Deployment (Recommended)
```
┌─────────────────┐    ┌──────────────────────────────────┐
│   Tdarr Server  │    │        Unmapped Nodes            │
│                 │    │  ┌─────────┐ ┌─────────┐         │
│  - Web Interface│◄──►│  │ Node 1  │ │ Node 2  │ ...     │
│  - Job Queue    │    │  │ GPU+CPU │ │ GPU+CPU │         │
│  - File Mgmt    │    │  │NVMe Cache│ │NVMe Cache│         │
│                 │    │  └─────────┘ └─────────┘         │
└─────────────────┘    └──────────────────────────────────┘
         │                              │
         └──────── Shared Storage ──────┘
              (NAS/SAN for media files)
```

### Key Components
- **Server**: Centralizes job management and web interface
- **Unmapped Nodes**: Independent transcoding with local cache
- **Shared Storage**: Source and final file repository

## Configuration Templates

### Server Configuration (Optimized)
```yaml
# docker-compose.yml - Hybrid Storage Strategy
version: "3.4"
services:
  tdarr:
    container_name: tdarr
    image: ghcr.io/haveagitgat/tdarr:latest
    restart: unless-stopped
    network_mode: bridge
    ports:
      - 8265:8265 # webUI port
      - 8266:8266 # server port
    environment:
      - TZ=America/Chicago
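      - PUID=0   # Runs as root; a dedicated non-root UID is preferable (see File Permissions)
      - PGID=0   # Keep UID/GID consistent with the nodes and the NAS share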
      - UMASK_SET=002
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
      - internalNode=false  # Disable for distributed setup
      - inContainer=true
      - ffmpegVersion=6
      - nodeName=docker-server
    volumes:
      # Hybrid storage strategy - Local for performance, Network for persistence
      - ./tdarr/server:/app/server           # Local: Database, configs, logs
      - ./tdarr/configs:/app/configs         # Local: Fast config access
      - ./tdarr/logs:/app/logs               # Local: Logging performance
      - /mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups  # Network: Backups only

      # Media and cache (when using mapped nodes)
      - /mnt/truenas-share:/media            # Network: Source media
      - /mnt/truenas-share/tdarr/tdarr-cache:/temp  # Network: Shared cache (mapped nodes only)
```

### Unmapped Node Configuration (Production)
```bash
#!/bin/bash
# Tdarr Unmapped Node with GPU Support - NVMe Cache Optimization
# Production script: scripts/tdarr/start-tdarr-gpu-podman-clean.sh

CONTAINER_NAME="tdarr-node-gpu-unmapped"
SERVER_IP="10.10.0.43"
SERVER_PORT="8266"
NODE_NAME="nobara-pc-gpu-unmapped"

# Clean container management
if podman ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
    podman stop "${CONTAINER_NAME}" 2>/dev/null || true
    podman rm "${CONTAINER_NAME}" 2>/dev/null || true
fi

# Production unmapped node with optimized cache.
# /cache is backed by local NVMe (3-7 GB/s). The note lives here because a
# comment placed after a trailing backslash breaks the line continuation.
podman run -d --name "${CONTAINER_NAME}" \
    --gpus all \
    --restart unless-stopped \
    -e TZ=America/Chicago \
    -e UMASK_SET=002 \
    -e nodeName="${NODE_NAME}" \
    -e serverIP="${SERVER_IP}" \
    -e serverPort="${SERVER_PORT}" \
    -e nodeType=unmapped \
    -e inContainer=true \
    -e ffmpegVersion=6 \
    -e logLevel=DEBUG \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -v "/mnt/NV2/tdarr-cache:/cache" \
    -v "/mnt/media:/app/unmappedNodeCache/${NODE_NAME}/media" \
    ghcr.io/haveagitgat/tdarr_node:latest
```
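
Because the node depends on a usable local cache, a pre-flight check before `podman run` can catch a missing or full NVMe mount early. The following is a minimal sketch and not part of the production script; the function name `check_cache_dir` and the 100 GB threshold in the usage comment are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical pre-flight helper (not part of the production script).
# Succeeds only if DIR exists, is writable, and has at least MIN_FREE_GB free.
check_cache_dir() {
    local dir="$1"
    local min_free_gb="${2:-0}"

    if [ ! -d "$dir" ]; then
        echo "cache directory missing: $dir" >&2
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "cache directory not writable: $dir" >&2
        return 1
    fi

    # df -Pk prints POSIX-format output in 1 KiB blocks; column 4 is free space.
    local free_kb
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    if [ "$free_kb" -lt $((min_free_gb * 1024 * 1024)) ]; then
        echo "less than ${min_free_gb} GB free in $dir" >&2
        return 1
    fi
}

# Example (assumed threshold): require 100 GB free before starting the node.
# check_cache_dir /mnt/NV2/tdarr-cache 100 || exit 1
```

Calling the helper before the `podman run` line lets the script fail fast instead of starting a node that cannot stage files.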

## File Transfer Optimizations

### Hybrid Storage Strategy (Server)
The server uses a hybrid approach balancing performance and reliability:

```bash
# Local storage (SSD/NVMe) - High Performance Operations
./tdarr/server:/app/server           # Database - frequent read/write
./tdarr/configs:/app/configs         # Config files - startup performance
./tdarr/logs:/app/logs               # Log files - continuous writing

# Network storage (NAS) - Persistence & Backup
/mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups  # Infrequent access
```

**Benefits:**
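- **Database performance**: The SQLite database stays on local disk, so reads and writes avoid network round-trips (SQLite is also prone to locking problems on network filesystems)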
- **Log performance**: Eliminates network I/O bottleneck for continuous logging
- **Reliability**: Critical backups stored on redundant NAS storage
- **Config speed**: Fast server startup with local configuration files
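
To confirm that the paths intended to be local really are local, the filesystem type behind each path can be inspected. This is a rough sketch assuming a Linux host with `findmnt` (util-linux); the helper names and the list of network filesystem types are assumptions and are not exhaustive.

```bash
#!/bin/bash
# Hypothetical helper: classify a filesystem type name as network-backed.
# The list of types is an assumption and is not exhaustive.
is_network_fstype() {
    case "$1" in
        nfs|nfs4|cifs|smb3|fuse.sshfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type of the mount that contains PATH.
fstype_of() {
    findmnt -n -o FSTYPE --target "$1"
}

# Return 1 (with a warning) if PATH sits on network storage, 2 on lookup failure.
warn_if_network() {
    local path="$1" fstype
    fstype=$(fstype_of "$path") || return 2
    if is_network_fstype "$fstype"; then
        echo "WARNING: $path is on network storage ($fstype)" >&2
        return 1
    fi
}

# Example: the database directory is expected to be local.
# warn_if_network ./tdarr/server
```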

### Container Platform Migration: Docker → Podman

**Advantages of Podman for Tdarr:**
```bash
# GPU access
--gpus all                           # Expose all NVIDIA GPUs to the container
-e NVIDIA_DRIVER_CAPABILITIES=all    # Enable all driver capabilities (compute, video, utility)
-e NVIDIA_VISIBLE_DEVICES=all        # Make every GPU visible inside the container

# Resource management
--restart unless-stopped             # Restart automatically unless stopped manually
# Rootless containers (when needed)  # Containers can run without root privileges
```

**Migration Benefits:**
- **GPU reliability**: Better NVIDIA container integration
- **Resource isolation**: Improved container resource management
- **System integration**: Better integration with systemd and cgroups
- **Performance**: Daemonless architecture, so no long-running central daemon is needed

## Performance Optimization

### Cache Storage Strategy (Updated)
```bash
# Production cache storage hierarchy (NVMe optimized)
/mnt/NV2/tdarr-cache/               # NVMe SSD (3-7 GB/s) - PRODUCTION
├── tdarr-workDir-{jobId}/          # Active transcoding
├── download/                       # Source file staging (API downloads)
└── upload/                         # Result file staging (API uploads)

# Alternative configurations:
/dev/shm/tdarr-cache/               # RAM disk (fastest, volatile, limited size)
/mnt/truenas-share/tdarr/tdarr-cache/  # Network cache (mapped nodes only)

# Approximate throughput comparison:
# NVMe cache:     3-7 GB/s    (unmapped nodes - RECOMMENDED)
# Network cache:  ~100 MB/s   (mapped nodes - legacy; roughly gigabit Ethernet)
# RAM cache:      10+ GB/s    (capacity limited by available RAM)
```
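
To put those throughput figures in perspective, the time to move one file can be estimated with integer arithmetic. A small sketch follows; the function name `transfer_seconds` and the 20 GB example file are assumptions chosen for illustration, and sizes use decimal megabytes.

```bash
#!/bin/bash
# Estimate the seconds needed to move SIZE_MB at RATE_MB_PER_S, rounded up.
transfer_seconds() {
    local size_mb="$1" rate_mb_per_s="$2"
    echo $(( (size_mb + rate_mb_per_s - 1) / rate_mb_per_s ))
}

# Example: a 20 GB (20000 MB) file.
# transfer_seconds 20000 100     # network cache at ~100 MB/s       -> 200
# transfer_seconds 20000 3000    # NVMe at the low end (3 GB/s)     -> 7
```

These are best-case numbers; real transfers also depend on file layout, protocol overhead, and concurrent jobs.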

### Network I/O Pattern
```
Optimized Workflow:
1. 📥 Download source (once): NAS → Local NVMe
2. ⚡ Transcode: Local NVMe → Local NVMe
3. 📤 Upload result (once): Local NVMe → NAS

vs Legacy Mapped Workflow:
1. 🐌 Read source: NAS → Node (streaming)
2. 🐌 Write temp: Node → NAS (streaming)
3. 🐌 Read temp: NAS → Node (streaming)
4. 🐌 Write final: Node → NAS (streaming)
```
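
The difference between the two workflows can be expressed as total bytes crossing the network per job. The sketch below is a back-of-the-envelope model, assuming the temporary file in the mapped workflow is about the same size as the final output; the function names are hypothetical.

```bash
#!/bin/bash
# Network traffic per job, in MB, for the two workflows.
# Arguments: SOURCE_MB OUTPUT_MB

# Unmapped: source downloaded once, result uploaded once.
network_mb_unmapped() {
    echo $(( $1 + $2 ))
}

# Mapped: source read, temp written, temp read, final written.
# Assumption: the temp file is the same size as the final output.
network_mb_mapped() {
    echo $(( $1 + 3 * $2 ))
}

# Example: 10 GB source transcoded to a 4 GB result.
# network_mb_unmapped 10000 4000   # -> 14000
# network_mb_mapped   10000 4000   # -> 22000
```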

## Scaling Patterns

### Horizontal Scaling
```yaml
# Illustrative node inventory; the server distributes queued jobs across connected nodes
nodes:
  - name: "gpu-node-1"      # RTX 4090 + NVMe
    role: "heavy-transcode"
  - name: "gpu-node-2"      # RTX 3080 + NVMe
    role: "standard-transcode"
  - name: "cpu-node-1"      # Multi-core + SSD
    role: "audio-processing"
```

### Resource Specialization
```bash
# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2

# CPU-optimized node
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16
```

## Monitoring and Operations

### Health Checks
```bash
# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1

# Cache usage monitoring (substitute your cache path, e.g. /mnt/NV2/tdarr-cache)
df -h /mnt/nvme/tdarr-cache
du -sh /mnt/nvme/tdarr-cache/*

# Performance metrics
podman stats tdarr-node-1
```
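
The cache check above can be turned into a threshold alert suitable for cron. This is a minimal sketch; the helper names and the default 85% limit are assumptions, not values taken from the production scripts.

```bash
#!/bin/bash
# Print the used-space percentage (integer, no % sign) of the filesystem
# that contains PATH. df -P guarantees one line per filesystem.
cache_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if usage is at or above LIMIT percent (default 85, an assumed value).
cache_over_threshold() {
    local path="$1" limit="${2:-85}" used
    used=$(cache_usage_pct "$path")
    [ "$used" -ge "$limit" ]
}

# Example:
# if cache_over_threshold /mnt/NV2/tdarr-cache 85; then
#     echo "Tdarr cache is nearly full" >&2
# fi
```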

### Log Analysis
```bash
# Node registration
podman logs tdarr-node-1 | grep "Node connected"

# Transfer speeds
podman logs tdarr-node-1 | grep -E "(Download|Upload).*MB/s"

# Transcode performance
podman logs tdarr-node-1 | grep -E "fps=.*"
```
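
Raw `fps=` lines are easier to compare across nodes once they are averaged. The sketch below assumes the node log contains ffmpeg-style progress output (for example `fps= 50` or `fps=70.0`); the function name `avg_fps` is hypothetical.

```bash
#!/bin/bash
# Read log text on stdin and print the mean of all ffmpeg-style fps values,
# formatted to one decimal place. Prints nothing if no fps values are found.
avg_fps() {
    grep -oE 'fps= *[0-9.]+' \
        | awk -F= '{ sum += $2; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Example (2>&1 because ffmpeg progress is usually written to stderr):
# podman logs tdarr-node-1 2>&1 | avg_fps
```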

## Security Considerations

### Network Access
- Server requires inbound connections on ports 8265 (web UI) and 8266 (server port used by nodes)
- Nodes require outbound access to server
- Consider VPN for cross-site deployments
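
Firewall rules can be verified from a node before it is started. The following sketch uses bash's built-in `/dev/tcp` redirection together with `timeout` from coreutils; the function name `port_open` and the 3-second timeout are assumptions.

```bash
#!/bin/bash
# Succeed if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it must run under bash.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the server address from the node script:
# port_open 10.10.0.43 8266 || echo "Tdarr server port unreachable" >&2
```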

### File Permissions
```bash
# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000

# Cache directory permissions
chown -R 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache
```
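
Ownership drift on the cache directory is a common cause of permission errors, so it can be worth checking before a node starts. This is a small sketch assuming GNU `stat`; the function name `owner_matches` is hypothetical.

```bash
#!/bin/bash
# Succeed if PATH is owned by the expected numeric UID and GID.
# Uses GNU stat's -c format option (%u = owner UID, %g = group GID).
owner_matches() {
    local path="$1" want_uid="$2" want_gid="$3"
    [ "$(stat -c '%u:%g' "$path")" = "${want_uid}:${want_gid}" ]
}

# Example, matching the PUID/PGID values above:
# owner_matches /mnt/nvme/tdarr-cache 1000 1000 || echo "fix ownership" >&2
```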

## Production Enhancements

### Gaming-Aware Scheduler
For GPU nodes that serve dual purposes (gaming + transcoding):

```bash
# Automated scheduler with gaming detection
scripts/tdarr/tdarr-schedule-manager.sh install

# Configure time windows (example: night-only transcoding)
scripts/tdarr/tdarr-schedule-manager.sh preset night-only  # 10PM-7AM only
```

**Features:**
- **Automatic GPU conflict prevention**: Detects Steam, gaming processes, GPU >15% usage
- **Configurable time windows**: `"22-07:daily"` (10PM-7AM), `"09-17:1-5"` (work hours)
- **Near-real-time monitoring**: Cron checks every minute, so a gaming session is detected within about one minute
- **Automated cleanup**: Removes abandoned temp directories every 6 hours
- **Zero-intervention operation**: Stops/starts Tdarr automatically based on rules

**Benefits:**
- **Gaming priority**: Never interferes with gaming sessions
- **Resource optimization**: Maximizes transcoding during off-hours
- **System stability**: Prevents GPU contention and system slowdowns
- **Maintenance-free**: Handles cleanup and scheduling without user intervention
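
The hour part of a time window such as `"22-07:daily"` needs to handle ranges that wrap past midnight. The sketch below illustrates one way to evaluate it; it is not the scheduler's actual implementation, the function name `hour_in_window` is hypothetical, and it assumes the end hour is exclusive.

```bash
#!/bin/bash
# Succeed if HOUR falls inside the window START-END (24-hour clock).
# Assumption: START is inclusive and END is exclusive.
hour_in_window() {
    # 10# forces base 10 so values such as "08" and "09" are not read as octal.
    local hour=$((10#$1)) start=$((10#$2)) end=$((10#$3))

    if [ "$start" -le "$end" ]; then
        # Same-day window, e.g. 09-17.
        [ "$hour" -ge "$start" ] && [ "$hour" -lt "$end" ]
    else
        # Window wraps past midnight, e.g. 22-07.
        [ "$hour" -ge "$start" ] || [ "$hour" -lt "$end" ]
    fi
}

# Example: allow transcoding only during the night window.
# hour_in_window "$(date +%H)" 22 07 && echo "transcoding allowed"
```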

### Enhanced Monitoring System
**Script**: `scripts/monitoring/tdarr-timeout-monitor.sh`

- **Staging timeout detection**: Monitors for download failures and cleanup issues
- **Discord notifications**: Formatted alerts that ping users for critical issues
- **Automatic recovery**: Cleans up stuck work directories and partial downloads
- **Log management**: Timestamped logs with automatic rotation
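
Detecting stuck work directories can be approximated by looking for `tdarr-workDir*` directories that have not been modified for a while. The sketch below only lists candidates and deletes nothing; it is not the monitor script itself, the function name and the 360-minute default are assumptions, and it relies on GNU `find`.

```bash
#!/bin/bash
# List work directories directly under CACHE_DIR whose modification time is
# older than MAX_AGE_MIN minutes (default 360, an assumed value).
list_stale_workdirs() {
    local cache_dir="$1" max_age_min="${2:-360}"
    find "$cache_dir" -maxdepth 1 -type d -name 'tdarr-workDir*' \
        -mmin "+${max_age_min}"
}

# Example: review candidates before removing anything by hand.
# list_stale_workdirs /mnt/NV2/tdarr-cache 360
```

Listing first and deleting in a separate, reviewed step avoids removing the work directory of a long-running transcode.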

## Related References
- **Troubleshooting**: `reference/docker/tdarr-troubleshooting.md`
- **Gaming Scheduler**: `scripts/tdarr/README.md`
- **Automation Scripts**: `scripts/tdarr/` (production-ready node management)
- **Performance**: `reference/docker/nvidia-troubleshooting.md`