claude-home/docker/examples/distributed-transcoding.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
Complete restructure from patterns/examples/reference to technology-focused directories:

• Created technology-specific directories with comprehensive documentation:
  - /tdarr/ - Transcoding automation with gaming-aware scheduling
  - /docker/ - Container management with GPU acceleration patterns
  - /vm-management/ - Virtual machine automation and cloud-init
  - /networking/ - SSH infrastructure, reverse proxy, and security
  - /monitoring/ - System health checks and Discord notifications
  - /databases/ - Database patterns and troubleshooting
  - /development/ - Programming language patterns (bash, nodejs, python, vuejs)

• Enhanced CLAUDE.md with intelligent context loading:
  - Technology-first loading rules for automatic context provision
  - Troubleshooting keyword triggers for emergency scenarios
  - Documentation maintenance protocols with automated reminders
  - Context window management for optimal documentation updates

• Preserved valuable content from .claude/tmp/:
  - SSH security improvements and server inventory
  - Tdarr CIFS troubleshooting and Docker iptables solutions
  - Operational scripts with proper technology classification

• Benefits achieved:
  - Self-contained technology directories with complete context
  - Automatic loading of relevant documentation based on keywords
  - Emergency-ready troubleshooting with comprehensive guides
  - Scalable structure for future technology additions
  - Eliminated context bloat through targeted loading

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-12 23:20:15 -05:00

289 lines
9.8 KiB
Markdown

# Tdarr Distributed Transcoding Pattern
## Overview
Tdarr distributed transcoding with unmapped nodes provides optimal performance for enterprise-scale video processing across multiple machines.
## Architecture Pattern
### Unmapped Node Deployment (Recommended)
```
┌─────────────────┐ ┌──────────────────────────────────┐
│ Tdarr Server │ │ Unmapped Nodes │
│ │ │ ┌─────────┐ ┌─────────┐ │
│ - Web Interface│◄──►│ │ Node 1 │ │ Node 2 │ ... │
│ - Job Queue │ │ │ GPU+CPU │ │ GPU+CPU │ │
│ - File Mgmt │ │ │NVMe Cache│ │NVMe Cache│ │
│ │ │ └─────────┘ └─────────┘ │
└─────────────────┘ └──────────────────────────────────┘
│ │
└──────── Shared Storage ──────┘
(NAS/SAN for media files)
```
### Key Components
- **Server**: Centralizes job management and web interface
- **Unmapped Nodes**: Independent transcoding with local cache
- **Shared Storage**: Source and final file repository
## Configuration Templates
### Server Configuration (Optimized)
```yaml
# docker-compose.yml - Hybrid Storage Strategy
version: "3.4"
services:
tdarr:
container_name: tdarr
image: ghcr.io/haveagitgat/tdarr:latest
restart: unless-stopped
network_mode: bridge
ports:
- 8265:8265 # webUI port
- 8266:8266 # server port
environment:
- TZ=America/Chicago
- PUID=0
- PGID=0
- UMASK_SET=002
- serverIP=0.0.0.0
- serverPort=8266
- webUIPort=8265
- internalNode=false # Disable for distributed setup
- inContainer=true
- ffmpegVersion=6
- nodeName=docker-server
volumes:
# Hybrid storage strategy - Local for performance, Network for persistence
- ./tdarr/server:/app/server # Local: Database, configs, logs
- ./tdarr/configs:/app/configs # Local: Fast config access
- ./tdarr/logs:/app/logs # Local: Logging performance
- /mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Network: Backups only
# Media and cache (when using mapped nodes)
- /mnt/truenas-share:/media # Network: Source media
- /mnt/truenas-share/tdarr/tdarr-cache:/temp # Network: Shared cache (mapped nodes only)
```
### Unmapped Node Configuration (Production)
```bash
#!/bin/bash
# Tdarr Unmapped Node with GPU Support - NVMe Cache Optimization
# Production script: scripts/tdarr/start-tdarr-gpu-podman-clean.sh
CONTAINER_NAME="tdarr-node-gpu-unmapped"
SERVER_IP="10.10.0.43"
SERVER_PORT="8266"
NODE_NAME="nobara-pc-gpu-unmapped"
# Clean container management
if podman ps -a --format "{{.Names}}" | grep -q "^${CONTAINER_NAME}$"; then
podman stop "${CONTAINER_NAME}" 2>/dev/null || true
podman rm "${CONTAINER_NAME}" 2>/dev/null || true
fi
# Production unmapped node with optimized cache
podman run -d --name "${CONTAINER_NAME}" \
--gpus all \
--restart unless-stopped \
-e TZ=America/Chicago \
-e UMASK_SET=002 \
-e nodeName="${NODE_NAME}" \
-e serverIP="${SERVER_IP}" \
-e serverPort="${SERVER_PORT}" \
-e nodeType=unmapped \
-e inContainer=true \
-e ffmpegVersion=6 \
-e logLevel=DEBUG \
-e NVIDIA_DRIVER_CAPABILITIES=all \
-e NVIDIA_VISIBLE_DEVICES=all \
-v "/mnt/NV2/tdarr-cache:/cache" \ # NVMe cache (3-7GB/s)
-v "/mnt/media:/app/unmappedNodeCache/nobara-pc-gpu-unmapped/media" \
ghcr.io/haveagitgat/tdarr_node:latest
```
## File Transfer Optimizations
### Hybrid Storage Strategy (Server)
The server uses a hybrid approach balancing performance and reliability:
```bash
# Local storage (SSD/NVMe) - High Performance Operations
./tdarr/server:/app/server # Database - frequent read/write
./tdarr/configs:/app/configs # Config files - startup performance
./tdarr/logs:/app/logs # Log files - continuous writing
# Network storage (NAS) - Persistence & Backup
/mnt/truenas-share/tdarr/tdarr-server/Backups:/app/server/Tdarr/Backups # Infrequent access
```
**Benefits:**
- **Database performance**: Local SQLite operations (100x faster than network)
- **Log performance**: Eliminates network I/O bottleneck for continuous logging
- **Reliability**: Critical backups stored on redundant NAS storage
- **Config speed**: Fast server startup with local configuration files
### Container Platform Migration: Docker → Podman
**Advantages of Podman for Tdarr:**
```bash
# Enhanced GPU support
--gpus all # Improved NVIDIA integration
-e NVIDIA_DRIVER_CAPABILITIES=all # Full GPU access
-e NVIDIA_VISIBLE_DEVICES=all # All GPU visibility
# Better resource management
--restart unless-stopped # Smarter restart policies
# Rootless containers (when needed) # Enhanced security
```
**Migration Benefits:**
- **GPU reliability**: Better NVIDIA container integration
- **Resource isolation**: Improved container resource management
- **System integration**: Better integration with systemd and cgroups
- **Performance**: Reduced overhead compared to Docker daemon
## Performance Optimization
### Cache Storage Strategy (Updated)
```bash
# Production cache storage hierarchy (NVMe optimized)
/mnt/NV2/tdarr-cache/ # NVMe SSD (3-7GB/s) - PRODUCTION
├── tdarr-workDir-{jobId}/ # Active transcoding
├── download/ # Source file staging (API downloads)
└── upload/ # Result file staging (API uploads)
# Alternative configurations:
/dev/shm/tdarr-cache/ # RAM disk (fastest, volatile, limited size)
/mnt/truenas-share/tdarr/tdarr-cache/ # Network cache (mapped nodes only)
# Performance comparison:
# NVMe cache: 3-7GB/s (unmapped nodes - RECOMMENDED)
# Network cache: 100MB/s (mapped nodes - legacy)
# RAM cache: 10GB/s+ (limited by available RAM)
```
### Network I/O Pattern
```
Optimized Workflow:
1. 📥 Download source (once): NAS → Local NVMe
2. ⚡ Transcode: Local NVMe → Local NVMe
3. 📤 Upload result (once): Local NVMe → NAS
vs Legacy Mapped Workflow:
1. 🐌 Read source: NAS → Node (streaming)
2. 🐌 Write temp: Node → NAS (streaming)
3. 🐌 Read temp: NAS → Node (streaming)
4. 🐌 Write final: Node → NAS (streaming)
```
## Scaling Patterns
### Horizontal Scaling
```yaml
# Multiple nodes with load balancing
nodes:
- name: "gpu-node-1" # RTX 4090 + NVMe
role: "heavy-transcode"
- name: "gpu-node-2" # RTX 3080 + NVMe
role: "standard-transcode"
- name: "cpu-node-1" # Multi-core + SSD
role: "audio-processing"
```
### Resource Specialization
```bash
# GPU-optimized node
-e hardwareEncoding=true
-e nvencTemporalAQ=1
-e processes_GPU=2
# CPU-optimized node
-e hardwareEncoding=false
-e processes_CPU=8
-e ffmpegThreads=16
```
## Monitoring and Operations
### Health Checks
```bash
# Node connectivity
curl -f http://server:8266/api/v2/status || exit 1
# Cache usage monitoring
df -h /mnt/nvme/tdarr-cache
du -sh /mnt/nvme/tdarr-cache/*
# Performance metrics
podman stats tdarr-node-1
```
### Log Analysis
```bash
# Node registration
podman logs tdarr-node-1 | grep "Node connected"
# Transfer speeds
podman logs tdarr-node-1 | grep -E "(Download|Upload).*MB/s"
# Transcode performance
podman logs tdarr-node-1 | grep -E "fps=.*"
```
## Security Considerations
### Network Access
- Server requires incoming connections on ports 8265/8266
- Nodes require outbound access to server
- Consider VPN for cross-site deployments
### File Permissions
```bash
# Ensure consistent UID/GID across nodes
-e PUID=1000
-e PGID=1000
# Cache directory permissions
chown -R 1000:1000 /mnt/nvme/tdarr-cache
chmod 755 /mnt/nvme/tdarr-cache
```
## Production Enhancements
### Gaming-Aware Scheduler
For GPU nodes that serve dual purposes (gaming + transcoding):
```bash
# Automated scheduler with gaming detection
scripts/tdarr/tdarr-schedule-manager.sh install
# Configure time windows (example: night-only transcoding)
scripts/tdarr/tdarr-schedule-manager.sh preset night-only # 10PM-7AM only
```
**Features:**
- **Automatic GPU conflict prevention**: Detects Steam, gaming processes, GPU >15% usage
- **Configurable time windows**: `"22-07:daily"` (10PM-7AM), `"09-17:1-5"` (work hours)
- **Real-time monitoring**: 1-minute cron checks with instant gaming response
- **Automated cleanup**: Removes abandoned temp directories every 6 hours
- **Zero-intervention operation**: Stops/starts Tdarr automatically based on rules
**Benefits:**
- **Gaming priority**: Never interferes with gaming sessions
- **Resource optimization**: Maximizes transcoding during off-hours
- **System stability**: Prevents GPU contention and system slowdowns
- **Maintenance-free**: Handles cleanup and scheduling without user intervention
### Enhanced Monitoring System
**Script**: `scripts/monitoring/tdarr-timeout-monitor.sh`
- **Staging timeout detection**: Monitors for download failures and cleanup issues
- **Discord notifications**: Professional alerts with user pings for critical issues
- **Automatic recovery**: Cleans up stuck work directories and partial downloads
- **Log management**: Timestamped logs with automatic rotation
## Related References
- **Troubleshooting**: `reference/docker/tdarr-troubleshooting.md`
- **Gaming Scheduler**: `scripts/tdarr/README.md`
- **Automation Scripts**: `scripts/tdarr/` (production-ready node management)
- **Performance**: `reference/docker/nvidia-troubleshooting.md`