Commit Graph

4 Commits

Author SHA1 Message Date
Cal Corum
34702a37fc CLAUDE: Add comprehensive KDE Plasma crash analysis and prevention documentation
- Add crash-analysis-summary.md: Complete incident timeline and root cause analysis
- Add tdarr-container-fixes.md: Container resource limits and unmapped node conversion
- Add cifs-mount-resilience-fixes.md: CIFS mount options for kernel stability
- Update tdarr-troubleshooting.md: Link to new system crash prevention measures
- Update nas-mount-configuration.md: Add stability considerations for production systems

Root cause: CIFS streaming of large files during transcoding caused kernel memory
corruption and system deadlock. Documents provide comprehensive prevention strategy.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-11 12:29:31 -05:00
Cal Corum
df81d475ef CLAUDE: Update Tdarr documentation with file transfer optimizations
- Document hybrid storage strategy for server (local DB/configs, network backups)
- Add production unmapped node configuration with NVMe cache optimization
- Document Docker→Podman migration benefits and GPU improvements
- Update cache paths to reflect actual NVMe location (/mnt/NV2/tdarr-cache)
- Add gaming-aware scheduler and enhanced monitoring system documentation
- Update configuration file paths to current production locations
- Document 100x database performance improvement with local storage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 16:20:27 -05:00
Cal Corum
6cc0d0df2e CLAUDE: Enhance Tdarr monitoring with automatic staging timeout cleanup and Discord notifications
Major improvements to Tdarr monitoring system addressing staging section timeout issues:

## New Features:
- **Automatic Staging Timeout Detection**: Monitors server logs for 300s limbo timeouts every 20 minutes
- **Stuck Directory Cleanup**: Automatically removes work directories with partial downloads preventing staging cleanup
- **Enhanced Discord Notifications**: Structured markdown messages with working user pings extracted from code blocks
- **Comprehensive Logging**: Timestamped logs with automatic rotation (1MB limit) at /tmp/tdarr-monitor/monitor.log
- **Multi-System Monitoring**: Covers both server staging issues and node worker stalls

## Technical Improvements:
- **JSON Handling**: Proper escaping for special characters, quotes, and newlines in Discord webhooks
- **Shell Compatibility**: Fixed `[[` vs `[` syntax for Docker container execution (sh vs bash)
- **Message Structure**: Professional markdown formatting with separation of alerts and actionable pings
- **Error Handling**: Robust SSH command execution and container operation handling

## Problem Solved:
- Root Cause: Hardcoded 300s staging timeout in Tdarr v2.45.01 causing large files (2-3GB+) to fail download
- Impact: Partial downloads created stuck .tmp files, ENOTEMPTY errors preventing cleanup, cascade failures
- Solution: Automated detection and cleanup system with proactive Discord alerts

## Files Added/Modified:
- `scripts/monitoring/tdarr-timeout-monitor.sh` - Enhanced monitoring script v2.0
- `reference/docker/tdarr-troubleshooting.md` - Added comprehensive monitoring system documentation

## Operational Benefits:
- Reduces manual intervention through automatic cleanup
- Self-healing system prevents staging section blockage
- Enterprise-ready monitoring with structured alerts
- Minimal resource impact: ~3s every 20min, <2MB storage

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-10 10:38:43 -05:00
Cal Corum
df3d22b218 CLAUDE: Expand documentation system and organize operational scripts
- Add comprehensive Tdarr troubleshooting and GPU transcoding documentation
- Create /scripts directory for active operational scripts
- Archive mapped node example in /examples for reference
- Update CLAUDE.md with scripts directory context triggers
- Add distributed transcoding patterns and NVIDIA troubleshooting guides
- Enhance documentation structure with clear directory usage guidelines

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-09 15:53:09 -05:00