CLAUDE: Update Tdarr context for ubuntu-manticore deployment

Rewrote documentation to reflect current deployment on ubuntu-manticore
(10.10.0.226) with actual performance metrics and queue status:
- Server specs: Ubuntu 24.04, GTX 1070, Docker Compose
- Storage: NFS media (48TB) + local NVMe cache (1.9TB)
- Performance: ~13 files/hour, 64% compression, HEVC output
- Queue: 7,675 pending, 37,406 total jobs processed
- Added operational commands, API access, GPU sharing notes
- Moved gaming-aware scheduler to legacy section (not needed on dedicated server)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Cal Corum 2025-12-07 01:17:27 -06:00
parent 117788f216
commit b8b4b13130

# Tdarr Transcoding System - Technology Context
## Overview
Tdarr is a distributed transcoding system that converts media files to optimized formats. The current deployment runs on a dedicated Ubuntu server with GPU transcoding and NFS-based media storage.
## Current Deployment
### Server: ubuntu-manticore (10.10.0.226)
- **OS**: Ubuntu 24.04.3 LTS (Noble Numbat)
- **GPU**: NVIDIA GeForce GTX 1070 (8GB VRAM)
- **Driver**: 570.195.03
- **Container Runtime**: Docker with Compose
- **Web UI**: http://10.10.0.226:8265
### Storage Architecture
| Mount | Source | Purpose |
|-------|--------|---------|
| `/mnt/truenas/media` | NFS from 10.10.0.35 | Media library (48TB total, ~29TB used) |
| `/mnt/NV2/tdarr-cache` | Local NVMe | Transcode work directory (1.9TB, ~40% used) |
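A quick way to confirm both mounts are healthy before work starts (a sketch using the paths from the table above):
```bash
# Verify the NFS media mount and the local cache are mounted and have headroom
ssh 10.10.0.226 "df -h /mnt/truenas/media /mnt/NV2/tdarr-cache && mount | grep -E 'truenas|NV2'"
```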
### Container Configuration
**Location**: `/home/cal/docker/tdarr/docker-compose.yml`
```yaml
version: "3.8"
services:
  tdarr:
    image: ghcr.io/haveagitgat/tdarr:latest
    container_name: tdarr-server
    restart: unless-stopped
    ports:
      - "8265:8265" # Web UI
      - "8266:8266" # Server port (for nodes)
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
    volumes:
      - ./server-data:/app/server
      - ./configs:/app/configs
      - ./logs:/app/logs
      - /mnt/truenas/media:/media

  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    container_name: tdarr-node
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=tdarr
      - serverPort=8266
      - nodeName=manticore-gpu
    volumes:
      - ./node-data:/app/configs
      - /mnt/truenas/media:/media
      - /mnt/NV2/tdarr-cache:/temp
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - tdarr
```
### Node Configuration
- **Node Name**: manticore-gpu
- **Node Type**: Mapped (both server and node access same NFS mount)
- **Workers**: 1 GPU transcode worker, 4 GPU healthcheck workers
- **Schedule**: Disabled (runs 24/7)
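A quick sanity check that the node container can actually see the GTX 1070 (assumes the NVIDIA container toolkit is set up, as implied by the GPU reservation in the compose file above):
```bash
# nvidia-smi run inside the node container should list the GTX 1070
ssh 10.10.0.226 "docker exec tdarr-node nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv"
```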
### Current Queue Status (Dec 2025)
| Metric | Value |
|--------|-------|
| Transcode Queue | ~7,675 files |
| Success/Not Required | 8,378 files |
| Healthy Files | 16,628 files |
| Job History | 37,406 total jobs |
### Performance Metrics
- **Throughput**: ~13 files/hour (varies by file size)
- **Average Compression**: ~64% of original size (35% space savings)
- **Codec**: HEVC (h265) output at 1080p
- **Typical File Sizes**: 3-7 GB input → 2-4.5 GB output
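At this throughput the current queue implies roughly 25 days of continuous processing; a back-of-the-envelope sketch using the figures above:
```bash
# Rough ETA for the current queue at ~13 files/hour
queue=7675; rate=13
hours=$(( queue / rate ))
echo "~${hours} hours (~$(( hours / 24 )) days) at current throughput"
```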
## Architecture Patterns
### Mapped Node with Shared Storage
**Pattern**: Server and node share the same media mount via NFS
- **Advantage**: Simpler configuration, no file transfer overhead
- **Trade-off**: Depends on stable NFS connection during transcoding
**When to Use**:
- Dedicated transcoding server (not a gaming/desktop system)
- Reliable network storage infrastructure
- Single-node deployments
### Local NVMe Cache
Work directory on local NVMe (`/mnt/NV2/tdarr-cache:/temp`) provides:
- Fast read/write for transcode operations
- Isolation from network latency during processing
- Sufficient space for large remux files (1TB+ available)
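A sketch for keeping an eye on cache headroom and stale work directories (the assumption that leftover job folders sit directly under `/mnt/NV2/tdarr-cache` may need adjusting to Tdarr's actual layout):
```bash
# Cache usage plus any work directories untouched for more than a day
ssh 10.10.0.226 "df -h /mnt/NV2 && find /mnt/NV2/tdarr-cache -maxdepth 1 -type d -mtime +1 -exec du -sh {} +"
```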
## Operational Notes
### Recent Activity
System is actively processing with strong throughput. Recent successful transcodes include:
- Dead Like Me (2003) - multiple episodes
- Supernatural (2005) - S03 episodes
- I Dream of Jeannie (1965) - S01 episodes
- Da Vinci's Demons (2013) - S01 episodes
### Minor Issues
- **Occasional File Not Found (400)**: Files deleted/moved while queued fail after 5 retries
- Impact: Minimal - system continues processing remaining queue
- Resolution: Automatic - failed files are skipped
### Monitoring
- **Server Logs**: `/home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt`
- **Docker Logs**: `docker logs tdarr-server` / `docker logs tdarr-node`
- **Library Scans**: Automatic hourly scans (2 libraries: ZWgKkmzJp, EjfWXCdU8)
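To surface recent problems from the server log, a minimal sketch (the grep patterns are generic guesses, not a documented Tdarr log format):
```bash
ssh 10.10.0.226 "grep -iE 'error|fail' /home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt | tail -20"
```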
### Common Operations
**Check Status**:
```bash
ssh 10.10.0.226 "docker ps --format 'table {{.Names}}\t{{.Status}}' | grep tdarr"
```
**View Recent Logs**:
```bash
ssh 10.10.0.226 "docker logs tdarr-node --since 1h 2>&1 | tail -50"
```
**Restart Services**:
```bash
ssh 10.10.0.226 "cd /home/cal/docker/tdarr && docker compose restart"
```
**Check GPU Usage**:
```bash
ssh 10.10.0.226 "nvidia-smi"
```
### API Access
Base URL: `http://10.10.0.226:8265/api/v2/`
**Get Node Status**:
```bash
curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq '.'
```
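To pull just the node names out of that response, a sketch (assumes the payload contains `nodeName` fields somewhere; jq's recursive descent finds them regardless of nesting, so inspect the raw JSON if nothing prints):
```bash
curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq '.. | .nodeName? // empty'
```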
## GPU Resource Sharing
This server also runs Jellyfin with GPU transcoding. Coordinate usage:
- Tdarr uses NVENC for encoding
- Jellyfin uses NVDEC for decoding
- Both can run simultaneously for different workloads
- Monitor GPU memory if running concurrent heavy transcodes
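Since both services share the card, encoder/decoder load can be sampled separately to see which workload is busy; `dmon -s u` prints per-second sm/mem/enc/dec utilization columns:
```bash
ssh 10.10.0.226 "nvidia-smi dmon -s u -c 5"
```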
## Legacy: Gaming-Aware Architecture
The previous deployment on the local desktop used an unmapped node architecture with gaming detection. This is preserved for reference but not currently in use:
### Unmapped Node Pattern (Historical)
For gaming desktops requiring GPU priority management:
- Node downloads files to local cache before processing
- Gaming detection pauses transcoding automatically
- Scheduler script manages time windows
**When to Consider**:
- Transcoding on a gaming/desktop system
- Need GPU priority for interactive applications
- Multiple nodes across network
## Best Practices
### For Current Deployment
1. Monitor NFS stability - Tdarr depends on reliable media access
2. Check cache disk space periodically (`df -h /mnt/NV2`)
3. Review queue for stale files after media library changes
4. GPU memory: Leave headroom for Jellyfin concurrent usage
### Error Prevention
1. **Plugin Updates**: Automatic hourly plugin sync from server
2. **Retry Logic**: 5 attempts with exponential backoff for file operations
3. **Container Health**: `restart: unless-stopped` ensures recovery
### Troubleshooting Patterns
1. **File Not Found**: Source was deleted - clear from queue via UI
2. **Slow Transcodes**: Check NFS latency, GPU utilization
3. **Node Disconnected**: Restart node container, check server connectivity
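For the slow-transcode case, a combined check of both usual suspects (assumes `nfsiostat` from nfs-common is installed on the host):
```bash
# GPU load plus NFS latency for the media mount (3 samples, 2s apart)
ssh 10.10.0.226 "nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv && nfsiostat 2 3 /mnt/truenas/media"
```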
## Space Savings Estimate
With ~7,675 files in queue averaging 35% reduction:
- If average input is 5 GB → saves ~1.75 GB per file
- Potential savings: ~13 TB when queue completes
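The same estimate as a one-liner, so it can be re-run when the queue size or the assumed 5 GB average changes:
```bash
# queue size x average input GB x reduction ratio, converted to TB
awk -v q=7675 -v g=5 -v r=0.35 'BEGIN { printf "~%.1f TB projected savings\n", q*g*r/1024 }'
```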
This technology context reflects the ubuntu-manticore deployment as of December 2025.