Tdarr CIFS Troubleshooting Session - 2025-08-11
Problem Statement
The Tdarr unmapped node was experiencing persistent download timeouts at 9:08 PM with large files (31GB+ remuxes), producing "Cancelling" messages and stuck downloads. Downloads would hang for 33+ minutes before timing out, even though the container remained running.
Initial Hypothesis: Mapped vs Unmapped Node Issue
Status: ❌ DISPROVEN
- Suspected unmapped node timeout configuration differences
- Windows PC running mapped Tdarr node works fine (slow but stable)
- Both mapped and unmapped Linux nodes exhibited identical timeout issues
- Conclusion: Architecture type was not the root cause
Key Insight: Windows vs Linux Performance Difference
Observation: Windows Tdarr node (mapped mode) works without timeouts; Linux nodes (both mapped and unmapped) fail.
Implication: Platform-specific issue, likely the network stack or kernel CIFS implementation.
Root Cause Discovery Process
Phase 1: Linux Client CIFS Analysis
Method: Direct CIFS mount testing on Tdarr node machine (nobara-pc)
Initial CIFS Mount Configuration (problematic):
//10.10.0.35/media on /mnt/media type cifs (rw,relatime,vers=3.1.1,cache=strict,upcall_target=app,username=root,uid=1000,forceuid,gid=1000,forcegid,addr=10.10.0.35,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,noperm,reparse=nfs,nativesocket,symlink=native,rsize=4194304,wsize=4194304,bsize=1048576,retrans=1,echo_interval=60,actimeo=30,closetimeo=1,_netdev,x-systemd.automount,x-systemd.device-timeout=10,x-systemd.mount-timeout=30)
Critical Issues Identified:
- soft - Mount fails on timeout instead of retrying indefinitely
- retrans=1 - Only 1 retry attempt (NFS option, invalid for CIFS)
- closetimeo=1 - Very short close timeout (1 second)
- cache=strict - No local caching, poor performance for large files
- x-systemd.mount-timeout=30 - 30-second mount timeout
Optimization Applied:
//10.10.0.35/media /mnt/media cifs credentials=/home/cal/.samba_credentials,uid=1000,gid=1000,vers=3.1.1,hard,rsize=16777216,wsize=16777216,cache=loose,actimeo=60,echo_interval=30,_netdev,x-systemd.automount,x-systemd.device-timeout=60,x-systemd.mount-timeout=120,noperm 0 0
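To pick up the new options without a reboot, the share can be unmounted and remounted, then the live options confirmed. A minimal sketch (the mount point matches the fstab entry above; the exact commands used in the session were not recorded):
sudo umount /mnt/media
sudo systemctl daemon-reload        # regenerate the x-systemd.automount units from the edited /etc/fstab
sudo mount /mnt/media
findmnt /mnt/media -o SOURCE,FSTYPE,OPTIONS   # verify hard, cache=loose, and the larger rsize/wsize are active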
Performance Testing Results:
- Local SSD (dd): 800MB in 0.217s (4.0 GB/s) - baseline
- CIFS 1MB blocks: 42.7 MB/s - fast, no issues
- CIFS 4MB blocks: 205 MB/s - fast, no issues
- CIFS 8MB blocks: 83.1 MB/s - 3-minute terminal freeze
Critical Discovery: Block size dependency causing I/O blocking with large transfers
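The block-size comparison above came from simple dd transfers against the CIFS mount. A rough sketch of that kind of test, assuming a scratch file on the share (paths and counts are illustrative, not the exact commands from the session):
# baseline against local SSD
dd if=/dev/zero of=/tmp/dd-test.bin bs=1M count=800 conv=fsync status=progress
# same total size against the CIFS mount with 1MB, 4MB, and 8MB block sizes
dd if=/dev/zero of=/mnt/media/dd-test.bin bs=1M count=800 conv=fsync status=progress
dd if=/dev/zero of=/mnt/media/dd-test.bin bs=4M count=200 conv=fsync status=progress
dd if=/dev/zero of=/mnt/media/dd-test.bin bs=8M count=100 conv=fsync status=progress
rm -f /tmp/dd-test.bin /mnt/media/dd-test.bin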
Phase 2: Tdarr Server-Side Analysis
Method: Test Tdarr API download path directly
API Test Command:
curl -X POST "http://10.10.0.43:8265/api/v2/file/download" \
-H "Content-Type: application/json" \
-d '{"filePath": "/media/Movies/Jumanji (1995)/Jumanji (1995) Remux-1080p Proper.mkv"}' \
-o /tmp/tdarr-api-test.mkv
Results:
- Performance: 55.7-58.6 MB/s sustained
- Progress: Downloaded 15.3GB of 23GB (66%)
- Failure: Download hung at 66% completion
- Timing: Hung after ~5 minutes (consistent with previous timeout patterns)
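The stall is visible from outside the transfer: the output file simply stops growing while curl keeps the connection open. A simple way to watch for that (path matches the test command above):
watch -n 10 'ls -lh /tmp/tdarr-api-test.mkv'   # size freezes at ~15.3GB when the hang occurs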
Phase 3: Tdarr Server CIFS Configuration Analysis
Method: Examine server-side storage mount
Server CIFS Mount (problematic):
//10.10.0.35/media /mnt/truenas-share cifs credentials=/root/.truenascreds,vers=3.1.1,rsize=4194304,wsize=4194304,cache=strict,actimeo=30,echo_interval=60,noperm 0 0
Server Issues Identified:
- Missing hard - defaults to soft mount behavior
- cache=strict - no local caching (same issue as client)
- No retry/timeout extensions - relies on unreliable kernel defaults
- No systemd timeout protection
Root Cause Confirmed
Primary Issue: The Tdarr server's CIFS mount to TrueNAS uses a suboptimal configuration.
Impact: Large-file streaming via the Tdarr API hangs when the server's CIFS mount hits I/O blocking.
Evidence: The API download hung with the exact same pattern as the node timeouts (stalling 66% through a large file).
Solution Strategy
Fix Tdarr Server CIFS Mount Configuration:
//10.10.0.35/media /mnt/truenas-share cifs credentials=/root/.truenascreds,vers=3.1.1,hard,rsize=4194304,wsize=4194304,cache=loose,actimeo=60,echo_interval=30,_netdev,x-systemd.device-timeout=60,x-systemd.mount-timeout=120,noperm 0 0
Key Optimizations:
- hard - Retry indefinitely instead of timing out
- cache=loose - Enable local caching for large file performance
- actimeo=60 - Longer attribute caching
- echo_interval=30 - More frequent keep-alives
- Extended systemd timeouts for reliability
Implementation Steps
- Update server /etc/fstab with optimized CIFS configuration
- Remount server storage:
  ssh tdarr "sudo umount /mnt/truenas-share"
  ssh tdarr "sudo systemctl daemon-reload"
  ssh tdarr "sudo mount /mnt/truenas-share"
- Test large file API download to verify fix
- Resume Tdarr transcoding with confidence in large file handling
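The test in step 3 can reuse the same API call from Phase 2: if the server-side mount fix works, the download should run to completion instead of stalling at 66%. A sketch of that verification (same endpoint and test file as before; the output path is illustrative):
time curl -X POST "http://10.10.0.43:8265/api/v2/file/download" \
  -H "Content-Type: application/json" \
  -d '{"filePath": "/media/Movies/Jumanji (1995)/Jumanji (1995) Remux-1080p Proper.mkv"}' \
  -o /tmp/tdarr-api-verify.mkv
ls -lh /tmp/tdarr-api-verify.mkv   # expect the full ~23GB file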
Technical Insights
CIFS vs SMB Protocol Differences
- Windows nodes: Use native SMB implementation (stable)
- Linux nodes: Use kernel CIFS module (prone to I/O blocking with poor configuration)
- Block size sensitivity: Large block transfers require careful timeout/retry configuration
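One practical way to confirm the kernel CIFS client (rather than Tdarr itself) is the component stalling is to watch the kernel log during a hang; the CIFS module logs reconnects and unresponsive-server events there. A minimal check, assuming the hang is reproducible:
dmesg --follow | grep -i cifs   # look for "has not responded" / reconnect messages while a transfer is stalled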
Tdarr Architecture Impact
- Unmapped nodes: Download entire files via API before processing (high bandwidth, vulnerable to server CIFS issues)
- Mapped nodes: Stream files during processing (lower bandwidth, still vulnerable to server CIFS issues)
- Root cause affects both architectures since server-side storage access is the bottleneck
Performance Expectations Post-Fix
- Consistent 50-100 MB/s for large file downloads
- No timeout failures with properly configured hard mounts
- Reliable processing of 31GB+ remux files
Files Modified
- Client: /etc/fstab on nobara-pc (CIFS optimization applied)
- Server: /etc/fstab on tdarr server (pending optimization)
Monitoring and Validation
- Success criteria: Tdarr API download of 23GB+ file completes without hanging
- Performance target: Sustained 50+ MB/s throughout entire transfer
- Reliability target: No timeouts during large file processing
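A simple way to confirm the sustained-throughput target over an entire transfer is to sample the output file's size at intervals and compute the delta. A rough sketch, assuming the verification download above is writing to /tmp/tdarr-api-verify.mkv:
prev=0
while sleep 30; do
  cur=$(stat -c %s /tmp/tdarr-api-verify.mkv 2>/dev/null || echo 0)
  echo "$(date +%T) $(( (cur - prev) / 30 / 1048576 )) MB/s"   # average rate over the last 30s window
  prev=$cur
done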
Session Outcome
Status: ✅ ROOT CAUSE IDENTIFIED AND SOLUTION READY
- Eliminated client-side variables through systematic testing
- Confirmed server-side CIFS configuration as bottleneck
- Validated fix strategy through client-side optimization success
- Ready to implement server-side solution
Session Date: 2025-08-11
Duration: ~3 hours
Methods: Direct testing, API analysis, mount configuration review