
# Tdarr Transcoding System - Technology Context

## Overview
Tdarr is a distributed transcoding system that converts media files to optimized formats. The current deployment runs on a dedicated Ubuntu server with GPU transcoding and NFS-based media storage.

## Current Deployment

### Server: ubuntu-manticore (10.10.0.226)
- **OS**: Ubuntu 24.04.3 LTS (Noble Numbat)
- **GPU**: NVIDIA GeForce GTX 1070 (8GB VRAM)
- **Driver**: 570.195.03
- **Container Runtime**: Docker with Compose
- **Web UI**: http://10.10.0.226:8265

### Storage Architecture
| Mount | Source | Purpose |
|-------|--------|---------|
| `/mnt/truenas/media` | NFS from 10.10.0.35 | Media library (48TB total, ~29TB used) |
| `/mnt/NV2/tdarr-cache` | Local NVMe | Transcode work directory (1.9TB, ~40% used) |

### Container Configuration
**Location**: `/home/cal/docker/tdarr/docker-compose.yml`

```yaml
version: "3.8"
services:
  tdarr:
    image: ghcr.io/haveagitgat/tdarr:latest
    container_name: tdarr-server
    restart: unless-stopped
    ports:
      - "8265:8265"  # Web UI
      - "8266:8266"  # Server port (for nodes)
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
    volumes:
      - ./server-data:/app/server
      - ./configs:/app/configs
      - ./logs:/app/logs
      - /mnt/truenas/media:/media

  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    container_name: tdarr-node
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=tdarr
      - serverPort=8266
      - nodeName=manticore-gpu
    volumes:
      - ./node-data:/app/configs
      - /mnt/truenas/media:/media
      - /mnt/NV2/tdarr-cache:/temp
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - tdarr
```

### Node Configuration
- **Node Name**: manticore-gpu
- **Node Type**: Mapped (server and node both mount the same NFS share at `/media`)
- **Workers**: 1 GPU transcode worker, 4 GPU healthcheck workers
- **Schedule**: Disabled (runs 24/7)

### Current Queue Status (Dec 2025)
| Metric | Value |
|--------|-------|
| Transcode Queue | ~7,675 files |
| Success/Not Required | 8,378 files |
| Healthy Files | 16,628 files |
| Job History | 37,406 total jobs |

### Performance Metrics
- **Throughput**: ~13 files/hour (varies by file size)
- **Average Compression**: output is ~64% of original size (roughly 35% space savings)
- **Codec**: HEVC (H.265) output at 1080p
- **Typical File Sizes**: 3-7 GB input → 2-4.5 GB output
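
The throughput and queue figures above imply a rough completion time. A minimal sketch, assuming the ~13 files/hour rate holds and no new files join the queue; `queue_eta_days` is a hypothetical helper name, not part of Tdarr:

```bash
#!/usr/bin/env bash
# Rough queue ETA in days: files / (files per hour) / 24.
# Hypothetical helper; the inputs below are the figures quoted in this document.
queue_eta_days() {
  local files="$1" per_hour="$2"
  awk -v f="$files" -v r="$per_hour" 'BEGIN { printf "%.1f\n", f / r / 24 }'
}

queue_eta_days 7675 13   # prints 24.6
```

Since throughput varies by file size, treat the result as an order-of-magnitude estimate (about three and a half weeks), not a deadline.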

## Architecture Patterns

### Mapped Node with Shared Storage
**Pattern**: Server and node share the same media mount via NFS
- **Advantage**: Simpler configuration, no file transfer overhead
- **Trade-off**: Depends on stable NFS connection during transcoding

**When to Use**:
- Dedicated transcoding server (not a gaming/desktop system)
- Reliable network storage infrastructure
- Single-node deployments
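
Because this pattern depends on the NFS mount staying responsive, a quick probe can help before blaming Tdarr for stalled jobs. A minimal sketch, assuming GNU coreutils `timeout` and `stat` are available; `path_responsive` is a hypothetical helper. It only confirms that the path answers within the timeout, not that it is actually the NFS share:

```bash
#!/usr/bin/env bash
# Returns 0 if the path answers a stat call within the timeout, non-zero otherwise.
# A stalled NFS mount usually blocks stat, so the timeout turns a hang into a failure.
path_responsive() {
  local path="$1" seconds="${2:-5}"
  timeout "$seconds" stat "$path" >/dev/null 2>&1
}

if path_responsive /mnt/truenas/media; then
  echo "media mount OK"
else
  echo "media mount missing or unresponsive"
fi
```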

### Local NVMe Cache
Work directory on local NVMe (`/mnt/NV2/tdarr-cache:/temp`) provides:
- Fast read/write for transcode operations
- Isolation from network latency during processing
- Sufficient space for large remux files (1TB+ available)

## Operational Notes

### Recent Activity
The system is actively processing the queue at roughly 13 files/hour. Recent successful transcodes include:
- Dead Like Me (2003) - multiple episodes
- Supernatural (2005) - S03 episodes
- I Dream of Jeannie (1965) - S01 episodes
- Da Vinci's Demons (2013) - S01 episodes

### Minor Issues
- **Occasional File Not Found (400)**: Files deleted/moved while queued fail after 5 retries
  - Impact: Minimal - system continues processing remaining queue
  - Resolution: Automatic - failed files are skipped
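
To find stale entries before they burn through retries, a list of paths can be checked against the filesystem. A minimal sketch, assuming you already have the queued paths one per line (how that list is exported is not covered here) and that the paths are valid where the script runs; `list_missing` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Reads file paths on stdin (one per line) and prints those that no longer exist.
list_missing() {
  local path
  while IFS= read -r path; do
    [ -e "$path" ] || printf '%s\n' "$path"
  done
}
```

For example, `list_missing < queued-paths.txt` prints only the entries whose source file has been deleted or moved (`queued-paths.txt` is a placeholder file name).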

### Monitoring
- **Server Logs**: `/home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt`
- **Docker Logs**: `docker logs tdarr-server` / `docker logs tdarr-node`
- **Library Scans**: Automatic hourly scans (2 libraries: ZWgKkmzJp, EjfWXCdU8)

### Common Operations

**Check Status**:
```bash
ssh 10.10.0.226 "docker ps --format 'table {{.Names}}\t{{.Status}}' | grep tdarr"
```

**View Recent Logs**:
```bash
ssh 10.10.0.226 "docker logs tdarr-node --since 1h 2>&1 | tail -50"
```

**Restart Services**:
```bash
ssh 10.10.0.226 "cd /home/cal/docker/tdarr && docker compose restart"
```

**Check GPU Usage**:
```bash
ssh 10.10.0.226 "nvidia-smi"
```

### API Access
Base URL: `http://10.10.0.226:8265/api/v2/`

**Get Node Status**:
```bash
curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq '.'
```

## GPU Resource Sharing
This server also runs Jellyfin with GPU transcoding. Coordinate usage:
- Tdarr uses NVENC for encoding
- Jellyfin uses NVDEC for decoding (and NVENC when it re-encodes a stream)
- Both can run simultaneously for different workloads
- Monitor GPU memory if running concurrent heavy transcodes
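
GPU memory can be read in machine-friendly form with `nvidia-smi --query-gpu`. A minimal sketch that turns a "used, total" line into a percentage; `gpu_mem_pct` is a hypothetical helper, and the sample values are illustrative, not measured on this server:

```bash
#!/usr/bin/env bash
# Reads one "used, total" line (MiB) on stdin and prints used memory as a whole percentage.
gpu_mem_pct() {
  awk -F', *' '{ printf "%d\n", ($1 * 100) / $2 }'
}

# On the server, feed it from nvidia-smi (commented out so the sketch runs anywhere):
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | gpu_mem_pct

echo "2048, 8192" | gpu_mem_pct   # sample input; prints 25
```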

## Legacy: Gaming-Aware Architecture
The previous deployment on the local desktop used an unmapped node architecture with gaming detection. It is documented here for reference and is not currently in use.

### Unmapped Node Pattern (Historical)
For gaming desktops requiring GPU priority management:
- Node downloads files to local cache before processing
- Gaming detection pauses transcoding automatically
- Scheduler script manages time windows

**When to Consider**:
- Transcoding on a gaming/desktop system
- Need GPU priority for interactive applications
- Multiple nodes across network

## Best Practices

### For Current Deployment
1. Monitor NFS stability - Tdarr depends on reliable media access
2. Check cache disk space periodically (`df -h /mnt/NV2`)
3. Review queue for stale files after media library changes
4. Leave GPU memory headroom for concurrent Jellyfin usage
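
The periodic cache check can be scripted. A minimal sketch using `df -P` for stable column layout; the helper names and the 80% default threshold are assumptions, not values taken from this deployment:

```bash
#!/usr/bin/env bash
# Prints the used percentage (integer, no % sign) of the filesystem holding a path.
disk_used_pct() {
  df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Returns 0 when usage is below the limit (default 80), non-zero otherwise,
# including when the path cannot be read.
cache_has_headroom() {
  local pct
  pct=$(disk_used_pct "$1")
  [ -n "$pct" ] && [ "$pct" -lt "${2:-80}" ]
}
```

Usage on the server might look like `cache_has_headroom /mnt/NV2 80 || echo "tdarr cache nearly full"`, for example from a cron job.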

### Error Prevention
1. **Plugin Updates**: Automatic hourly plugin sync from server
2. **Retry Logic**: 5 attempts with exponential backoff for file operations
3. **Container Health**: `restart: unless-stopped` restarts a container after a crash or host reboot (it does not detect a hung process)
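
The same retry-with-backoff idea is useful in helper scripts around Tdarr. A minimal sketch of the pattern in shell; this illustrates the technique only and is not Tdarr's internal retry code. `retry_with_backoff` and its arguments are hypothetical:

```bash
#!/usr/bin/env bash
# Runs a command up to max_attempts times, doubling the wait after each failure.
# Usage: retry_with_backoff <max_attempts> <initial_delay_seconds> <command> [args...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For example, `retry_with_backoff 5 1 test -e /media/some/file.mkv` (placeholder path) tries five times, waiting 1, 2, 4, and 8 seconds between attempts.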

### Troubleshooting Patterns
1. **File Not Found**: Source was deleted or moved - clear from queue via UI
2. **Slow Transcodes**: Check NFS latency, GPU utilization
3. **Node Disconnected**: Restart node container, check server connectivity

## Space Savings Estimate
With ~7,675 files in queue averaging 35% reduction:
- If average input is 5 GB → saves ~1.75 GB per file
- Potential savings: ~13 TB when queue completes
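
The arithmetic behind that estimate, as a sketch. It uses decimal units (1 TB = 1000 GB) and the 5 GB average input assumed above; `savings_tb` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Estimated savings in TB: files * average input GB * reduction fraction / 1000.
savings_tb() {
  awk -v n="$1" -v gb="$2" -v frac="$3" 'BEGIN { printf "%.1f\n", n * gb * frac / 1000 }'
}

savings_tb 7675 5 0.35   # prints 13.4
```

The result scales linearly with the assumed average input size, so the 3-7 GB range of typical inputs translates to a wide band around this figure.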

This technology context reflects the ubuntu-manticore deployment as of December 2025.