CLAUDE: Update Tdarr context for ubuntu-manticore deployment
Rewrote documentation to reflect the current deployment on ubuntu-manticore (10.10.0.226) with actual performance metrics and queue status:

- Server specs: Ubuntu 24.04, GTX 1070, Docker Compose
- Storage: NFS media (48TB) + local NVMe cache (1.9TB)
- Performance: ~13 files/hour, 64% compression, HEVC output
- Queue: 7,675 pending, 37,406 total jobs processed
- Added operational commands, API access, GPU sharing notes
- Moved gaming-aware scheduler to legacy section (not needed on dedicated server)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 117788f216
commit b8b4b13130

tdarr/CONTEXT.md (292 lines changed)
# Tdarr Transcoding System - Technology Context

## Overview

Tdarr is a distributed transcoding system that converts media files to optimized formats. The current deployment runs on a dedicated Ubuntu server with GPU transcoding and NFS-based media storage.

## Current Deployment

### Server: ubuntu-manticore (10.10.0.226)

- **OS**: Ubuntu 24.04.3 LTS (Noble Numbat)
- **GPU**: NVIDIA GeForce GTX 1070 (8GB VRAM)
- **Driver**: 570.195.03
- **Container Runtime**: Docker with Compose
- **Web UI**: http://10.10.0.226:8265

### Storage Architecture

| Mount | Source | Purpose |
|-------|--------|---------|
| `/mnt/truenas/media` | NFS from 10.10.0.35 | Media library (48TB total, ~29TB used) |
| `/mnt/NV2/tdarr-cache` | Local NVMe | Transcode work directory (1.9TB, ~40% used) |
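
A quick way to verify both mounts before kicking off a large batch (a convenience check, not existing tooling; it reuses the SSH pattern from the operational commands later in this document):

```bash
# Confirm the NFS media mount and the NVMe cache are present and have headroom
ssh 10.10.0.226 'df -h /mnt/truenas/media /mnt/NV2/tdarr-cache'
ssh 10.10.0.226 'mount | grep -E "truenas|NV2"'
```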

### Container Configuration

**Location**: `/home/cal/docker/tdarr/docker-compose.yml`

```yaml
version: "3.8"

services:
  tdarr:
    image: ghcr.io/haveagitgat/tdarr:latest
    container_name: tdarr-server
    restart: unless-stopped
    ports:
      - "8265:8265"   # Web UI
      - "8266:8266"   # Server port (for nodes)
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=0.0.0.0
      - serverPort=8266
      - webUIPort=8265
    volumes:
      - ./server-data:/app/server
      - ./configs:/app/configs
      - ./logs:/app/logs
      - /mnt/truenas/media:/media

  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    container_name: tdarr-node
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - serverIP=tdarr
      - serverPort=8266
      - nodeName=manticore-gpu
    volumes:
      - ./node-data:/app/configs
      - /mnt/truenas/media:/media
      - /mnt/NV2/tdarr-cache:/temp
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    depends_on:
      - tdarr
```
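
Bringing the stack up or validating it after edits is the standard Compose workflow (nothing Tdarr-specific; paths come from the Location line above):

```bash
cd /home/cal/docker/tdarr
docker compose up -d                      # create/update both containers
docker compose ps                         # confirm tdarr-server and tdarr-node are Up
docker compose logs --tail 50 tdarr-node  # watch the node register with the server
```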

### Node Configuration

- **Node Name**: manticore-gpu
- **Node Type**: Mapped (both server and node access same NFS mount)
- **Workers**: 1 GPU transcode worker, 4 GPU healthcheck workers
- **Schedule**: Disabled (runs 24/7)

### Current Queue Status (Dec 2025)

| Metric | Value |
|--------|-------|
| Transcode Queue | ~7,675 files |
| Success/Not Required | 8,378 files |
| Healthy Files | 16,628 files |
| Job History | 37,406 total jobs |

### Performance Metrics

- **Throughput**: ~13 files/hour (varies by file size)
- **Average Compression**: ~64% of original size (35% space savings)
- **Codec**: HEVC (h265) output at 1080p
- **Typical File Sizes**: 3-7 GB input → 2-4.5 GB output

## Architecture Patterns

### Mapped Node with Shared Storage

**Pattern**: Server and node share the same media mount via NFS

- **Advantage**: Simpler configuration, no file transfer overhead
- **Trade-off**: Depends on a stable NFS connection during transcoding

**When to Use**:

- Dedicated transcoding server (not a gaming/desktop system)
- Reliable network storage infrastructure
- Single-node deployments
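
For reference, the shared media mount typically lives in `/etc/fstab` on the host. The entry below is a hypothetical sketch: the export path on 10.10.0.35 and the mount options are assumptions; only the local mount point comes from this deployment.

```bash
# /etc/fstab (hypothetical example - adjust the export path and options to match the TrueNAS share)
# 10.10.0.35:/mnt/tank/media   /mnt/truenas/media   nfs   defaults,_netdev   0   0
```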

### Local NVMe Cache

Work directory on local NVMe (`/mnt/NV2/tdarr-cache:/temp`) provides:

- Fast read/write for transcode operations
- Isolation from network latency during processing
- Sufficient space for large remux files (1TB+ available)

## Operational Notes

### Recent Activity

System is actively processing with strong throughput. Recent successful transcodes include:

- Dead Like Me (2003) - multiple episodes
- Supernatural (2005) - S03 episodes
- I Dream of Jeannie (1965) - S01 episodes
- Da Vinci's Demons (2013) - S01 episodes

### Minor Issues

- **Occasional File Not Found (400)**: Files deleted/moved while queued fail after 5 retries
  - Impact: Minimal - system continues processing remaining queue
  - Resolution: Automatic - failed files are skipped

### Monitoring

- **Server Logs**: `/home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt`
- **Docker Logs**: `docker logs tdarr-server` / `docker logs tdarr-node`
- **Library Scans**: Automatic hourly scans (2 libraries: ZWgKkmzJp, EjfWXCdU8)
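
A quick way to pull recent errors out of the server log without opening the UI (plain tail + grep over the log path listed above):

```bash
ssh 10.10.0.226 "tail -n 500 /home/cal/docker/tdarr/logs/Tdarr_Server_Log.txt | grep -iE 'error|fail' | tail -20"
```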

### Common Operations

**Check Status**:

```bash
ssh 10.10.0.226 "docker ps --format 'table {{.Names}}\t{{.Status}}' | grep tdarr"
```

**View Recent Logs**:

```bash
ssh 10.10.0.226 "docker logs tdarr-node --since 1h 2>&1 | tail -50"
```

**Restart Services**:

```bash
ssh 10.10.0.226 "cd /home/cal/docker/tdarr && docker compose restart"
```

**Check GPU Usage**:

```bash
ssh 10.10.0.226 "nvidia-smi"
```

### API Access

Base URL: `http://10.10.0.226:8265/api/v2/`

**Get Node Status**:

```bash
curl -s "http://10.10.0.226:8265/api/v2/get-nodes" | jq '.'
```

## GPU Resource Sharing

This server also runs Jellyfin with GPU transcoding. Coordinate usage:

- Tdarr uses NVENC for encoding
- Jellyfin uses NVDEC for decoding
- Both can run simultaneously for different workloads
- Monitor GPU memory if running concurrent heavy transcodes (see the queries after this list)
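
Two `nvidia-smi` queries that are handy for watching the shared GTX 1070 (standard query fields, though exact field availability can vary by driver version):

```bash
# Overall GPU utilization and VRAM headroom
ssh 10.10.0.226 "nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.free --format=csv,noheader"

# Active NVENC sessions - useful when Tdarr and Jellyfin transcode at the same time
ssh 10.10.0.226 "nvidia-smi --query-gpu=encoder.stats.sessionCount,encoder.stats.averageFps --format=csv,noheader"
```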

## Legacy: Gaming-Aware Architecture

The previous deployment on the local desktop used an unmapped node architecture with gaming detection. This is preserved for reference but not currently in use:

### Unmapped Node Pattern (Historical)

For gaming desktops requiring GPU priority management:

- Node downloads files to local cache before processing
- Gaming detection pauses transcoding automatically
- Scheduler script manages time windows (a minimal sketch appears below)

**When to Consider**:

- Transcoding on a gaming/desktop system
- Need GPU priority for interactive applications
- Multiple nodes across network
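
A minimal sketch of the gaming-aware check, purely illustrative: this is not the original `scripts/tdarr-schedule-manager.sh`, the container name and pause mechanism are assumptions, and the historical node ran under Podman rather than Docker.

```bash
#!/usr/bin/env bash
# Pause the Tdarr node when a game is running or GPU usage exceeds the old ~15% threshold.
GPU_THRESHOLD=15
NODE_CONTAINER="tdarr-node"   # assumed container name

gpu_util=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1)

if pgrep -f 'steam|lutris|wine' >/dev/null || [ "${gpu_util:-0}" -gt "$GPU_THRESHOLD" ]; then
  docker pause "$NODE_CONTAINER" 2>/dev/null || true    # yield the GPU to the game
else
  docker unpause "$NODE_CONTAINER" 2>/dev/null || true  # resume transcoding
fi
```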

## Best Practices

### For Current Deployment

1. Monitor NFS stability - Tdarr depends on reliable media access
2. Check cache disk space periodically (`df -h /mnt/NV2`) - see the sketch after this list
3. Review queue for stale files after media library changes
4. GPU memory: Leave headroom for Jellyfin concurrent usage
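
Item 2 can be automated with a tiny cron-friendly check - a sketch only; the 80% threshold and echo-based alerting are placeholders, not existing tooling:

```bash
#!/usr/bin/env bash
# Warn when the NVMe transcode cache passes ~80% usage
usage=$(df --output=pcent /mnt/NV2 | tail -1 | tr -dc '0-9')
if [ "${usage:-0}" -ge 80 ]; then
  echo "tdarr cache at ${usage}% - consider clearing stale work dirs under /mnt/NV2/tdarr-cache"
fi
```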

### Error Prevention

1. **Plugin Updates**: Automatic hourly plugin sync from server
2. **Retry Logic**: 5 attempts with exponential backoff for file operations
3. **Container Health**: `restart: unless-stopped` ensures recovery

### Troubleshooting Patterns

1. **File Not Found**: Source was deleted - clear from queue via UI
2. **Slow Transcodes**: Check NFS latency, GPU utilization (quick checks below)
3. **Node Disconnected**: Restart node container, check server connectivity
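
Two rough spot checks for the slow-transcode case (not a benchmark; substitute any large existing file for the placeholder path):

```bash
# Rough NFS read throughput from the media mount (placeholder path - pick any large file)
ssh 10.10.0.226 "dd if='/mnt/truenas/media/<some-large-file>' of=/dev/null bs=1M count=1024 status=progress"

# Is the GPU actually busy while a job is running?
ssh 10.10.0.226 "nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv,noheader"
```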

## Space Savings Estimate

With ~7,675 files in queue averaging 35% reduction:

- If average input is 5 GB → saves ~1.75 GB per file
- Potential savings: ~13 TB when queue completes
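
A back-of-the-envelope check of that figure (illustrative only - real savings depend on the actual size mix):

```bash
echo "7675 * 5 * 0.35" | bc    # ≈ 13431 GB ≈ 13.4 TB
```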

This technology context reflects the ubuntu-manticore deployment as of December 2025.