CIFS Mount Resilience Improvements

Date: 2025-08-11
Issue: CIFS network errors escalating to kernel deadlocks and system crashes
Target: /mnt/media mount to NAS at 10.10.0.35

Current Configuration Analysis

Current fstab entry:

//10.10.0.35/media /mnt/media cifs credentials=/home/cal/.samba_credentials,uid=1000,gid=1000,vers=3.1.1,cache=loose,rsize=16777216,wsize=16777216,bsize=4194304,actimeo=30,closetimeo=5,echo_interval=30,noperm 0 0

Problems Identified:

  • No timeout tuning, so failed operations can hang for 90+ seconds
  • Aggressive 16 MB buffer sizes causing memory pressure during network issues
  • No retry tuning, providing minimal resilience to transient failures
  • No explicit soft-failure handling for graceful degradation
  • No interruption handling, preventing recovery from network deadlocks
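
Before changing anything, it helps to compare what fstab requests with what the kernel actually negotiated: `/proc/mounts` (or `findmnt`) shows the effective options, which can differ from the fstab line. A small audit helper, assuming the mount point /mnt/media (the `has_opt` name is ours):

```shell
#!/usr/bin/env bash
# has_opt OPTS NAME — succeed if NAME appears in a comma-separated
# mount-option string such as the OPTIONS column printed by findmnt.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Audit the live mount, e.g.:
#   opts=$(findmnt -no OPTIONS /mnt/media)
#   has_opt "$opts" soft || echo "WARNING: mount is not using soft"
```

This also works for valued options, e.g. `has_opt "$opts" "vers=3.1.1"`.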

New improved fstab entry:

//10.10.0.35/media /mnt/media cifs credentials=/home/cal/.samba_credentials,uid=1000,gid=1000,vers=3.1.1,soft,intr,timeo=15,retrans=3,rsize=1048576,wsize=1048576,cache=loose,actimeo=10,echo_interval=60,_netdev,noauto,x-systemd.automount,x-systemd.device-timeout=10,x-systemd.mount-timeout=30,noperm 0 0

Key Improvements Explained

Better Timeout Handling

  • timeo=15 - 15-second timeout for individual requests (prevents the 90-second hangs)
  • retrans=3 - three retry attempts instead of one
  • x-systemd.device-timeout=10 - 10-second systemd device timeout
  • x-systemd.mount-timeout=30 - 30-second mount operation timeout

Note: timeo and retrans are historically NFS options; some CIFS kernel versions ignore or reject options they do not recognize, so verify the full option string with a manual mount before committing it to fstab.

Graceful Error Recovery

  • soft - Allows operations to fail instead of hanging indefinitely
  • intr - Historically let the kernel interrupt hung operations; modern CIFS kernels may ignore it, so soft does the real work here
  • _netdev - Indicates network dependency for proper boot ordering
  • noauto,x-systemd.automount - Mount on first access; add x-systemd.idle-timeout= if the share should also unmount when idle

Preventing Kernel Deadlocks

  • Smaller buffer sizes - rsize=1048576,wsize=1048576 (1MB instead of 16MB) reduces memory pressure
  • actimeo=10 - Shorter attribute cache timeout (10s vs 30s) for faster error detection
  • echo_interval=60 - Longer keepalive interval reduces network chatter

Network Interruption Resilience

  • cache=loose - Maintains loose caching for better performance with network issues
  • Combined timeout strategy - Multiple timeout layers prevent single failure from hanging system

Implementation Steps

Step 1: Backup Current Configuration

sudo cp /etc/fstab /etc/fstab.backup

Step 2: Update /etc/fstab

Replace the current line with the recommended configuration above.
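
Before remounting, the edited fstab can be sanity-checked without mounting anything, using util-linux's `findmnt --verify` (available since util-linux 2.31; treat warnings about network filesystems as informational). A thin wrapper so the check is easy to point at a copy of the file:

```shell
# verify_fstab [FILE] — parse and verify fstab entries without mounting;
# defaults to /etc/fstab when no file is given (findmnt is util-linux).
verify_fstab() {
    findmnt --verify --tab-file "${1:-/etc/fstab}"
}
```

A non-zero exit means at least one entry has a hard error worth fixing before the next reboot.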

Step 3: Test the New Configuration

# Unmount the current mount
sudo umount /mnt/media

# Reload the systemd units generated from fstab (required because the new
# entry uses noauto,x-systemd.automount), then start the automount
sudo systemctl daemon-reload
sudo systemctl start mnt-media.automount

# First access triggers the CIFS mount; verify the new options are active
ls /mnt/media
mount | grep /mnt/media

Step 4: Validate Network Resilience

# Test timeout behavior with network simulation
# (Temporarily disconnect NAS network cable for 30 seconds)
# Verify mount operations fail gracefully instead of hanging system
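
Unplugging a cable is not always practical; a firewall rule gives a reversible way to simulate the outage, and a small timing helper (our name, `fails_fast`) checks that operations return within the bound the new options should guarantee. Assumes root and iptables; adjust for nftables:

```shell
#!/usr/bin/env bash
# fails_fast CMD... — succeed only if CMD finishes (pass or fail)
# within 30 seconds, the worst case the new mount options should allow.
fails_fast() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    [ $(( end - start )) -le 30 ]
}

# Suggested drill (run as root):
#   iptables -I OUTPUT -d 10.10.0.35 -j DROP    # simulate NAS outage
#   fails_fast ls /mnt/media && echo "failed fast (good)"
#   iptables -D OUTPUT -d 10.10.0.35 -j DROP    # restore connectivity
```

If `fails_fast ls /mnt/media` does not return within the window, the old hang behavior is still present and the mount options did not take effect.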

Additional System-Level Protections

1. Network Monitoring Script

Create a monitoring script to detect NAS connectivity issues:

#!/bin/bash
# /mnt/NV2/Development/claude-home/scripts/monitoring/nas-connectivity-monitor.sh
# One ICMP probe with a 5-second deadline; non-zero exit flags a problem.
ping -c 1 -W 5 10.10.0.35 >/dev/null 2>&1 \
    || { echo "$(date -Is) NAS connectivity issue detected" >&2; exit 1; }

2. Systemd Service Dependencies

Configure services to gracefully handle mount failures:

# Add to services that depend on /mnt/media
After=mnt-media.mount
Wants=mnt-media.mount
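
As a concrete sketch, these directives live in a per-service drop-in; the unit name tdarr.service below is illustrative, so substitute the real consumer of /mnt/media:

```ini
# /etc/systemd/system/tdarr.service.d/10-media-mount.conf
# Created with: sudo systemctl edit tdarr.service
[Unit]
After=mnt-media.mount
Wants=mnt-media.mount
```

Run `sudo systemctl daemon-reload` after adding the drop-in. Wants= keeps the service running if the mount later drops; use Requires= (or BindsTo=) instead if the service should stop when the mount goes away.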

3. Kernel Module Parameter Tuning

The cifs module exposes its tunables under /sys/module/cifs/parameters/, not through sysctl, so persistent changes belong in modprobe configuration rather than /etc/sysctl.conf. Note that CIFSMaxBufSize is a buffer size in bytes (default 16384), not a timeout, so a value like 30 is invalid:

# Inspect the current value at runtime
cat /sys/module/cifs/parameters/CIFSMaxBufSize

# Persist a change across reboots via /etc/modprobe.d/cifs.conf:
# options cifs CIFSMaxBufSize=16384

Expected Improvements

After implementing these changes:

Immediate Benefits

  • No more 90-second hangs - Operations fail fast with 15-second timeouts
  • Graceful error recovery - soft lets operations fail instead of hanging, with intr as a legacy backstop where the kernel honors it
  • Reduced memory pressure - Smaller 1 MB buffers vs 16 MB
  • Better retry behavior - Three retransmission attempts instead of one

System Stability

  • Prevents kernel deadlocks - Operations can be interrupted and retried
  • Faster error detection - 10-second attribute cache timeout
  • Automatic recovery - systemd auto-mounting handles reconnection

Performance

  • Maintained caching benefits - cache=loose preserves performance
  • Reduced network overhead - 60-second keepalive intervals
  • Efficient buffer usage - 1MB buffers balance performance and stability

Files to Modify

  1. /etc/fstab - Primary mount configuration
  2. Optional monitoring scripts - NAS connectivity checks
  3. Service configurations - Dependencies on mount availability

Testing Checklist

  • Backup current fstab configuration
  • Apply new mount options
  • Test normal operation (read/write files)
  • Test network interruption handling (disconnect NAS briefly)
  • Verify fast failure instead of system hangs
  • Monitor system stability over 24 hours
  • Validate with Tdarr container operations
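
The read/write item on the checklist can be scripted. The helper below (`smoke_test`, our name) round-trips a small marker file and cleans up after itself; it defaults to /mnt/media but accepts any directory, which makes rehearsing the test easy:

```shell
#!/usr/bin/env bash
# smoke_test [DIR] — write, read back, and delete a marker file in DIR
# (default /mnt/media); non-zero exit means the round trip failed.
smoke_test() {
    local dir=${1:-/mnt/media} f
    f="$dir/.cifs-smoke-$$"
    echo "cifs-smoke" > "$f" 2>/dev/null || return 1
    [ "$(cat "$f")" = "cifs-smoke" ] || { rm -f "$f"; return 1; }
    rm -f "$f"
}
```

Running it in a loop during the 24-hour stability window gives a simple pass/fail signal alongside the journal checks below.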

Monitoring and Validation

Success Criteria

  • Mount operations fail within 30 seconds during network issues
  • No kernel RCU stalls or deadlock messages in journal
  • System remains responsive during NAS network problems
  • Automatic remount when network connectivity restored

Long-term Monitoring

  • Monitor journal for CIFS error patterns
  • Track system stability metrics
  • Validate performance impact of smaller buffers
  • Ensure gaming and transcoding workloads remain unaffected
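
The journal checks above reduce to a grep over kernel messages. `cifs_alerts` (our name) bundles the patterns worth alerting on, so it can be piped from journalctl or fed a saved log:

```shell
#!/usr/bin/env bash
# cifs_alerts — filter stdin for kernel messages indicating CIFS trouble
# or the RCU stalls / hung tasks this document aims to prevent.
cifs_alerts() {
    grep -iE 'cifs vfs|cifs: |rcu_sched|rcu stall|hung task|blocked for more than'
}

# Typical use:
#   journalctl -k --since "-1 day" | cifs_alerts
```

An empty result over a quiet day is the success signal; any hit warrants a closer look at the surrounding journal context.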