claude-home/tdarr/CONTEXT.md

# Tdarr Transcoding System - Technology Context

## Overview
Tdarr is a distributed transcoding system that converts media files to optimized formats. This implementation uses an intelligent gaming-aware scheduler with unmapped node architecture for optimal performance and system stability.

## Architecture Patterns

### Distributed Unmapped Node Architecture (Recommended)
**Pattern**: Server-Node separation with local high-speed cache
- **Server**: Tdarr Server manages queue, web interface, and coordination
- **Node**: Unmapped nodes with local NVMe cache for processing
- **Benefits**: 3-5x performance improvement, network I/O reduction, linear scaling

**When to Use**:
- Multiple transcoding nodes across network
- High-performance requirements (10GB+ files)
- Network bandwidth limitations
- Gaming systems requiring GPU priority management

### Configuration Principles
1. **Cache Optimization**: Use local NVMe storage for work directories
2. **Gaming Detection**: Automatic pause during GPU-intensive activities
3. **Resource Isolation**: Container limits prevent kernel-level crashes
4. **Monitoring Integration**: Automated cleanup and Discord notifications

## Core Components

### Gaming-Aware Scheduler
**Purpose**: Automatically manages Tdarr node to avoid conflicts with gaming
**Location**: `scripts/tdarr-schedule-manager.sh`

**Key Features**:
- Detects gaming processes (Steam, Lutris, Wine, etc.)
- GPU usage monitoring (>15% threshold)
- Configurable time windows
- Automated temporary directory cleanup

**Schedule Format**: `"HOUR_START-HOUR_END:DAYS"`
- `"22-07:daily"` - Overnight transcoding
- `"09-17:1-5"` - Business hours weekdays only
- `"14-16:6,7"` - Weekend afternoon window

### Monitoring System
**Purpose**: Prevents staging section timeouts and system instability
**Location**: `scripts/monitoring/tdarr-timeout-monitor.sh`

**Capabilities**:
- Staging timeout detection (300-second hardcoded limit)
- Automatic work directory cleanup
- Discord notifications with user pings
- Log rotation and retention management

### Container Architecture
**Server Configuration**:
```yaml
# Hybrid storage with resource limits
services:
  tdarr:
    image: ghcr.io/haveagitgat/tdarr:latest
    ports: ["8265:8266"]
    volumes:
      - "./tdarr-data:/app/configs"
      - "/mnt/media:/media"
```

**Node Configuration**:
```bash
# Unmapped node with local cache
podman run -d \
  --name tdarr-node-gpu \
  -e nodeType=unmapped \
  -v "/mnt/NV2/tdarr-cache:/cache" \
  --device nvidia.com/gpu=all \
  ghcr.io/haveagitgat/tdarr_node:latest
```

## Implementation Patterns

### Performance Optimization
1. **Local Cache Strategy**: Download → Process → Upload (vs. streaming)
2. **Resource Limits**: Prevent memory exhaustion and kernel crashes
3. **Network Resilience**: CIFS mount options for stability
4. **Automated Cleanup**: Prevent accumulation of stuck directories

### Error Prevention
1. **Plugin Safety**: Null-safe forEach operations `(streams || []).forEach()`
2. **Clean Installation**: Avoid custom plugin mounts causing version conflicts
3. **Container Isolation**: Resource limits prevent system-level crashes
4. **Network Stability**: Unmapped architecture reduces CIFS dependency

### Gaming Integration
1. **Process Detection**: Monitor for gaming applications and utilities
2. **GPU Threshold**: Stop transcoding when GPU usage >15%
3. **Time Windows**: Respect user-defined allowed transcoding hours
4. **Manual Override**: Direct start/stop commands bypass scheduler

## Common Workflows

### Initial Setup
1. Start server with "Allow unmapped Nodes" enabled
2. Configure node as unmapped with local cache
3. Install gaming-aware scheduler via cron
4. Set up monitoring system for automated cleanup

### Troubleshooting Patterns
1. **forEach Errors**: Clean plugin installation, avoid custom mounts
2. **Staging Timeouts**: Monitor system handles automatic cleanup
3. **System Crashes**: Convert to unmapped node architecture
4. **Network Issues**: Implement CIFS resilience options

### Performance Tuning
1. **Cache Size**: 100-500GB NVMe for concurrent jobs
2. **Bandwidth**: Unmapped nodes reduce streaming requirements
3. **Scaling**: Linear scaling with additional unmapped nodes
4. **GPU Priority**: Gaming detection ensures responsive system

## Best Practices

### Production Deployment
- Use unmapped node architecture for stability
- Implement comprehensive monitoring
- Configure gaming-aware scheduling for desktop systems
- Set appropriate container resource limits

### Development Guidelines
- Test with internal Tdarr test files first
- Implement null-safety checks in custom plugins
- Use structured logging for troubleshooting
- Separate concerns: scheduling, monitoring, processing

### Security Considerations
- Container isolation prevents system-level failures
- Resource limits protect against memory exhaustion
- Network mount resilience prevents kernel crashes
- Automated cleanup prevents disk space issues

## Migration Patterns

### From Mapped to Unmapped Nodes
1. Enable "Allow unmapped Nodes" in server options
2. Update node configuration (add nodeType=unmapped)
3. Change cache volume to local storage
4. Remove media volume mapping
5. Test workflow and monitor performance

### Plugin System Cleanup
1. Remove all custom plugin mounts
2. Force server restart to regenerate plugin ZIP
3. Restart nodes to download fresh plugins
4. Verify forEach fixes in downloaded plugins

This technology context provides the foundation for implementing, troubleshooting, and optimizing Tdarr transcoding systems in home lab environments.