
Tdarr Transcoding System - Technology Context

Overview

Tdarr is a distributed transcoding system that converts media files to optimized formats. The current deployment runs on a dedicated Ubuntu server with GPU transcoding and NFS-based media storage.

Current Deployment

Server: ubuntu-manticore (10.10.0.226)

  • OS: Ubuntu 24.04.3 LTS (Noble Numbat)
  • GPU: NVIDIA GeForce GTX 1070 (8GB VRAM)
  • Driver: 570.195.03
  • Container Runtime: Docker with Compose
  • Web UI: http://10.10.0.226:8265

Storage Architecture

Mount                   Source                 Purpose
/mnt/truenas/media      NFS from 10.10.0.35    Media library (48TB total, ~29TB used)
/mnt/NV2/tdarr-cache    Local NVMe             Transcode work directory (1.9TB, ~40% used)
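
A quick way to confirm both mounts are healthy is to check them over SSH. This is a minimal sketch using standard tooling (findmnt, df); it assumes /mnt/NV2 is the NVMe mountpoint backing the cache directory:

# Verify the NFS media mount and the local cache volume, then show free space
ssh 10.10.0.226 "findmnt /mnt/truenas/media && findmnt /mnt/NV2 && df -h /mnt/truenas/media /mnt/NV2"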

Container Configuration

Location: /home/cal/docker/tdarr/docker-compose.yml

version: "3.8"
services:
  tdarr:
    image: ghcr.io/haveagitgat/tdarr:latest
    container_name: tdarr-server
    restart: unless-stopped
    ports:
      - "8265:8265"  # Web UI
      - "8266:8266"  # Server port (for nodes)
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
    volumes:
      - ./server-data:/app/server
      - ./configs:/app/configs
      - ./logs:/app/logs
      - /mnt/truenas/media:/media

  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    container_name: tdarr-node
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=tdarr
      - serverPort=8266
      - nodeName=manticore-gpu
    volumes:
      - ./node-data:/app/configs
      - /mnt/truenas/media:/media
      - /mnt/NV2/tdarr-cache:/temp
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - tdarr
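
After editing the compose file, the standard Docker Compose workflow applies: validate the file, recreate the containers, and confirm they are running. Nothing here is Tdarr-specific:

ssh 10.10.0.226 "cd /home/cal/docker/tdarr && docker compose config --quiet && docker compose up -d && docker compose ps"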

Node Configuration

  • Node Name: manticore-gpu
  • Node Type: Mapped (both server and node access the same NFS mount)
  • Workers: 1 GPU transcode worker, 4 GPU healthcheck workers
  • Schedule: Disabled (runs 24/7)

Current Queue Status (Dec 2025)

Metric                  Value
Transcode Queue         ~7,675 files
Success/Not Required    8,378 files
Healthy Files           16,628 files
Job History             37,406 total jobs

Performance Metrics

  • Throughput: ~13 files/hour (varies by file size)
  • Average Compression: ~64% of original size (~36% space savings)
  • Codec: HEVC (h265) output at 1080p
  • Typical File Sizes: 3-7 GB input → 2-4.5 GB output
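
The compression figure is simply output size divided by input size. If a copy of an original is kept for comparison, it can be measured directly on the server; the file paths below are placeholders, not real library entries:

# Run on ubuntu-manticore: compare an original file against its transcoded version
ORIG=/media/tv/Example/episode-original.mkv
NEW=/media/tv/Example/episode.mkv
awk -v i="$(stat -c%s "$ORIG")" -v o="$(stat -c%s "$NEW")" \
    'BEGIN { printf "output is %.1f%% of input (%.1f%% saved)\n", 100*o/i, 100*(1-o/i) }'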

Architecture Patterns

Mapped Node with Shared Storage

Pattern: Server and node share the same media mount via NFS

  • Advantage: Simpler configuration, no file transfer overhead
  • Trade-off: Depends on stable NFS connection during transcoding

When to Use:

  • Dedicated transcoding server (not a gaming/desktop system)
  • Reliable network storage infrastructure
  • Single-node deployments

Local NVMe Cache

The work directory on local NVMe (/mnt/NV2/tdarr-cache, mounted as /temp in the node container) provides:

  • Fast read/write for transcode operations
  • Isolation from network latency during processing
  • Sufficient space for large remux files (1TB+ available)

Operational Notes

Recent Activity

System is actively processing with strong throughput. Recent successful transcodes include:

  • Dead Like Me (2003) - multiple episodes
  • Supernatural (2005) - S03 episodes
  • I Dream of Jeannie (1965) - S01 episodes
  • Da Vinci's Demons (2013) - S01 episodes

Minor Issues

  • Occasional File Not Found (400): Files deleted/moved while queued fail after 5 retries
    • Impact: Minimal - system continues processing remaining queue
    • Resolution: Automatic - failed files are skipped

Monitoring

  • Server Logs: /home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt
  • Docker Logs: docker logs tdarr-server / docker logs tdarr-node
  • Library Scans: Automatic hourly scans (2 libraries: ZWgKkmzJp, EjfWXCdU8)
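
For a quick error sweep without opening the UI, grep the server log over SSH. The exact log line format varies between Tdarr versions, so treat the filter below as a starting point:

ssh 10.10.0.226 "tail -n 500 /home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt | grep -iE 'error|fail' | tail -n 20"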

Common Operations

Check Status:

ssh 10.10.0.226 "docker ps --format 'table {{.Names}}\t{{.Status}}' | grep tdarr"

View Recent Logs:

ssh 10.10.0.226 "docker logs tdarr-node --since 1h 2>&1 | tail -50"

Restart Services:

ssh 10.10.0.226 "cd /home/cal/docker/tdarr && docker compose restart"

Check GPU Usage:

ssh 10.10.0.226 "nvidia-smi"

API Access

Base URL: http://10.10.0.226:8265/api/v2/

Get Node Status:

curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq '.'
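
The same endpoint can be summarized with jq. The response schema isn't documented here, so the filters below are illustrative; adjust them to whatever the endpoint actually returns:

curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq 'length'   # count of top-level entries (registered nodes)
curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq 'keys'     # node IDs / top-level keys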

GPU Resource Sharing

This server also runs Jellyfin with GPU transcoding. Coordinate usage:

  • Tdarr uses NVENC for encoding
  • Jellyfin uses NVDEC for decoding
  • Both can run simultaneously for different workloads
  • Monitor GPU memory if running concurrent heavy transcodes
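
nvidia-smi can show encoder/decoder load and VRAM in one place, which is the quickest way to see whether Tdarr and Jellyfin are competing for the GPU:

ssh 10.10.0.226 "nvidia-smi dmon -s um -c 10"   # enc/dec columns reflect NVENC/NVDEC load
ssh 10.10.0.226 "nvidia-smi --query-gpu=memory.used,memory.total,utilization.gpu --format=csv"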

Legacy: Gaming-Aware Architecture

The previous deployment on the local desktop used an unmapped node architecture with gaming detection. This is preserved for reference but not currently in use:

Unmapped Node Pattern (Historical)

For gaming desktops requiring GPU priority management:

  • Node downloads files to local cache before processing
  • Gaming detection pauses transcoding automatically (sketched below)
  • Scheduler script manages time windows
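
The original scheduler script isn't reproduced here, but the core idea is small enough to sketch. This is an illustrative loop, not the historical implementation: the game process names, polling interval, and use of docker pause/unpause are all assumptions.

#!/usr/bin/env bash
# Illustrative gaming-aware pause loop (assumed process names, not the original scheduler)
GAMES_REGEX='steam|heroic|lutris'
NODE_CONTAINER='tdarr-node'

while true; do
  if pgrep -f "$GAMES_REGEX" > /dev/null; then
    docker pause "$NODE_CONTAINER" 2>/dev/null || true    # a game/launcher is running: pause transcoding
  else
    docker unpause "$NODE_CONTAINER" 2>/dev/null || true  # no game detected: resume
  fi
  sleep 60
done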

When to Consider:

  • Transcoding on a gaming/desktop system
  • Need GPU priority for interactive applications
  • Multiple nodes across network

Best Practices

For Current Deployment

  1. Monitor NFS stability - Tdarr depends on reliable media access
  2. Check cache disk space periodically (df -h /mnt/NV2)
  3. Review queue for stale files after media library changes
  4. GPU memory: Leave headroom for Jellyfin concurrent usage

Error Prevention

  1. Plugin Updates: Automatic hourly plugin sync from server
  2. Retry Logic: 5 attempts with exponential backoff for file operations
  3. Container Health: restart: unless-stopped ensures recovery

Troubleshooting Patterns

  1. File Not Found: Source was deleted - clear from queue via UI
  2. Slow Transcodes: Check NFS latency, GPU utilization
  3. Node Disconnected: Restart node container, check server connectivity

Space Savings Estimate

With ~7,675 files in the queue and an average ~35% size reduction:

  • If average input is 5 GB → saves ~1.75 GB per file
  • Potential savings: ~13 TB when queue completes
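
The same back-of-envelope arithmetic as a one-liner, using the queue size and averages above:

awk 'BEGIN { files=7675; avg_gb=5; saved=avg_gb*0.35; printf "%.2f GB per file, ~%.1f TB total\n", saved, files*saved/1024 }'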

This technology context reflects the ubuntu-manticore deployment as of December 2025.