| title | description | type | domain | tags |
|---|---|---|---|---|
| CIFS Mount Resilience Fixes | Improved CIFS fstab config to prevent kernel deadlocks during NAS network issues, with soft mounts, interrupt handling, reduced buffers, and systemd automount. | runbook | networking | |
CIFS Mount Resilience Improvements
Date: 2025-08-11
Issue: CIFS network errors escalating to kernel deadlocks and system crashes
Target: /mnt/media mount to NAS at 10.10.0.35
Current Configuration Analysis
Current fstab entry:
//10.10.0.35/media /mnt/media cifs credentials=/home/cal/.samba_credentials,uid=1000,gid=1000,vers=3.1.1,cache=loose,rsize=16777216,wsize=16777216,bsize=4194304,actimeo=30,closetimeo=5,echo_interval=30,noperm 0 0
Problems Identified:
- Missing critical timeout options leading to 90-second hangs
- Aggressive buffer sizes (16MB) causing memory pressure during network issues
- Limited retry attempts (retrans=1) providing minimal resilience
- No explicit error handling for graceful degradation
- Missing interruption handling preventing recovery from network deadlocks
Recommended CIFS Mount Configuration
New improved fstab entry:
//10.10.0.35/media /mnt/media cifs credentials=/home/cal/.samba_credentials,uid=1000,gid=1000,vers=3.1.1,soft,intr,timeo=15,retrans=3,rsize=1048576,wsize=1048576,cache=loose,actimeo=10,echo_interval=60,_netdev,noauto,x-systemd.automount,x-systemd.device-timeout=10,x-systemd.mount-timeout=30,noperm 0 0
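To review exactly what differs between the two entries, the option lists can be compared mechanically. The following is a minimal sketch; `fstab_opts`, `only_in_second`, `added_opts`, and `removed_opts` are hypothetical helper names introduced here for illustration, not part of any existing tool.

```shell
#!/bin/sh
# Print the options field (4th column) of one fstab line, one option per line, sorted.
fstab_opts() {
    printf '%s\n' "$1" | awk '{ print $4 }' | tr ',' '\n' | sort
}

# Print the lines of the second list that do not appear in the first list.
only_in_second() {
    first=$1
    printf '%s\n' "$2" | while IFS= read -r opt; do
        printf '%s\n' "$first" | grep -Fqx -- "$opt" || printf '%s\n' "$opt"
    done
}

# Options present only in the new line, and options present only in the old line.
added_opts()   { only_in_second "$(fstab_opts "$1")" "$(fstab_opts "$2")"; }
removed_opts() { only_in_second "$(fstab_opts "$2")" "$(fstab_opts "$1")"; }
```

With the current entry in `old_line` and the recommended entry in `new_line`, `removed_opts "$old_line" "$new_line"` lists the options that the new entry no longer carries, including `bsize=4194304` and `closetimeo=5`, so their removal can be reviewed deliberately.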
Key Improvements Explained
Better Timeout Handling
- timeo=15 - 15-second timeout for RPC calls (prevents 90-second hangs)
- retrans=3 - 3 retry attempts instead of 1
- x-systemd.device-timeout=10 - 10-second systemd device timeout
- x-systemd.mount-timeout=30 - 30-second mount operation timeout
Graceful Error Recovery
- soft - Allows operations to fail instead of hanging indefinitely
- intr - Allows kernel to interrupt hung operations (CRITICAL for preventing deadlocks)
- _netdev - Indicates network dependency for proper boot ordering
- noauto,x-systemd.automount - Mount on first access instead of at boot (unmounting when idle additionally requires x-systemd.idle-timeout=, which this entry does not set)
Note: timeo, retrans, and intr are documented in nfs(5) as NFS mount options, and mount.cifs(8) describes soft as the CIFS default. Confirm against mount.cifs(8) for the installed cifs-utils and kernel that each option is accepted before relying on it.
Preventing Kernel Deadlocks
- Smaller buffer sizes - rsize=1048576,wsize=1048576 (1MB instead of 16MB) reduces memory pressure
- actimeo=10 - Shorter attribute cache timeout (10s vs 30s) for faster error detection
- echo_interval=60 - Longer keepalive interval reduces network chatter
Network Interruption Resilience
- cache=loose - Maintains loose caching for better performance with network issues
- Combined timeout strategy - Multiple timeout layers prevent single failure from hanging system
Implementation Steps
Step 1: Backup Current Configuration
sudo cp /etc/fstab /etc/fstab.backup
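A fixed `.backup` name is overwritten the next time the step is repeated. As an optional variation, a timestamped copy keeps every earlier version. This is a minimal sketch; `backup_file` is a hypothetical helper name, and it needs to run as root for files such as /etc/fstab.

```shell
#!/bin/sh
# Copy FILE to FILE.backup.<YYYYmmdd-HHMMSS>, preserving mode and timestamps,
# and print the path of the backup that was written.
backup_file() {
    dest="$1.backup.$(date +%Y%m%d-%H%M%S)"
    cp -p -- "$1" "$dest" && printf '%s\n' "$dest"
}

# Example (as root): backup_file /etc/fstab
```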
Step 2: Update /etc/fstab
Replace the current line with the recommended configuration above.
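If editing by hand feels error-prone, the replacement can be prepared in a scratch copy and reviewed before it touches /etc/fstab. The sketch below assumes the entry is identified by its mount point in the second field; `replace_fstab_entry` is a hypothetical helper name, and the /tmp path in the usage comments is only an example.

```shell
#!/bin/sh
# Print FSTAB with the non-comment entry whose mount point (2nd field) equals
# MOUNTPOINT replaced by NEW_LINE. The input file itself is not modified.
# Values are passed through the environment so awk does not interpret backslashes.
replace_fstab_entry() {
    FSTAB_MP="$2" FSTAB_NEW="$3" awk '
        $1 !~ /^#/ && $2 == ENVIRON["FSTAB_MP"] { print ENVIRON["FSTAB_NEW"]; next }
        { print }
    ' "$1"
}

# Example, with the recommended entry stored in $new_line:
#   replace_fstab_entry /etc/fstab /mnt/media "$new_line" > /tmp/fstab.new
#   diff -u /etc/fstab /tmp/fstab.new            # review the change
#   findmnt --verify --tab-file /tmp/fstab.new   # syntax check (util-linux)
```

Only after the diff looks right would the scratch file be copied over /etc/fstab.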
Step 3: Test the New Configuration
# Reload systemd so it regenerates mount units from the edited fstab
sudo systemctl daemon-reload
# Unmount current mount
sudo umount /mnt/media
# Remount with new options
sudo mount /mnt/media
# Verify new mount options are active
mount | grep /mnt/media
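Reading the option string by eye is easy to get wrong, so the check can be scripted. The following is a minimal sketch that reports which expected options are absent from the active mount; `missing_opts` is a hypothetical helper name. The kernel may normalize or rename options when it reports them, so a "missing" result is a prompt to investigate rather than proof of a problem.

```shell
#!/bin/sh
# Print each expected option that is absent from a comma-separated option string.
# Usage: missing_opts ACTIVE_OPTIONS EXPECTED...
missing_opts() {
    active=$1
    shift
    for opt in "$@"; do
        case ",$active," in
            *",$opt,"*) ;;                  # option is present
            *) printf '%s\n' "$opt" ;;      # option is absent
        esac
    done
}

# Example against the live mount:
#   missing_opts "$(findmnt -n -t cifs -o OPTIONS /mnt/media)" \
#       soft rsize=1048576 wsize=1048576 echo_interval=60
```

No output means every listed option appears in the active mount.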
Step 4: Validate Network Resilience
# Test timeout behavior with network simulation
# (Temporarily disconnect NAS network cable for 30 seconds)
# Verify mount operations fail gracefully instead of hanging system
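While the cable is disconnected, a timed probe gives a clearer answer than watching a terminal. This sketch wraps a command in coreutils `timeout` and classifies the outcome; `probe` is a hypothetical helper name, and the 30-second limit in the example mirrors the success criterion used in this runbook.

```shell
#!/bin/sh
# Run a command under a time limit (seconds) and print one of:
#   ok     - the command succeeded
#   failed - the command returned an error before the limit (fast failure)
#   hung   - the limit was reached and timeout had to stop the command
probe() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo ok ;;
        124) echo hung ;;      # exit status timeout uses when the limit expires
        *)   echo failed ;;
    esac
}

# Example while the NAS is unreachable:
#   probe 30 stat /mnt/media
```

With the NAS unreachable, `failed` is the desired result. One caveat: a process stuck in uninterruptible sleep may not respond to the signal from `timeout`, in which case the probe itself may not return, which is also a sign that the hang persists.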
Additional System-Level Protections
1. Network Monitoring Script
Create a monitoring script to detect NAS connectivity issues:
#!/bin/bash
# /mnt/NV2/Development/claude-home/scripts/monitoring/nas-connectivity-monitor.sh
ping -c 1 -W 5 10.10.0.35 || echo "NAS connectivity issue detected"
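A single lost ping can be noise, so one possible refinement is to report only after several consecutive failures. This is a sketch of that idea, not part of the original script; `watch_check` is a hypothetical helper name, and the threshold, attempt count, and interval in the example are illustrative values.

```shell
#!/bin/sh
# Run a check command up to ATTEMPTS times, INTERVAL seconds apart.
# Report a problem only after THRESHOLD consecutive failures.
# Usage: watch_check THRESHOLD ATTEMPTS INTERVAL COMMAND [ARGS...]
watch_check() {
    threshold=$1
    attempts=$2
    interval=$3
    shift 3
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            failures=0                      # any success resets the streak
        else
            failures=$((failures + 1))
        fi
        if [ "$failures" -ge "$threshold" ]; then
            echo "NAS connectivity issue detected"
            return 1
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 0
}

# Example: alert after 3 consecutive failed pings, checking 10 times, 5 s apart
#   watch_check 3 10 5 ping -c 1 -W 5 10.10.0.35
```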
2. Systemd Service Dependencies
Configure services to gracefully handle mount failures:
# Add to services that depend on /mnt/media
After=mnt-media.mount
Wants=mnt-media.mount
3. Kernel Parameter Tuning
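Consider reviewing CIFS module parameters:
# CIFSMaxBufSize is a cifs module parameter (buffer size in bytes), not a sysctl or a timeout
cat /sys/module/cifs/parameters/CIFSMaxBufSize
# To change it, set it at module load time in /etc/modprobe.d/cifs.conf:
# options cifs CIFSMaxBufSize=<bytes>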
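Before changing any module parameter, it helps to see what the loaded module currently uses. The sketch below prints every readable parameter file in a directory as `name=value`; `show_params` is a hypothetical helper name, and /sys/module/cifs/parameters only exists while the cifs module is loaded.

```shell
#!/bin/sh
# Print each readable file in a parameters directory as name=value.
show_params() {
    for f in "$1"/*; do
        [ -r "$f" ] || continue             # skips unreadable files and an empty directory
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example:
#   show_params /sys/module/cifs/parameters
```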
Expected Improvements
After implementing these changes:
Immediate Benefits
- No more 90-second hangs - Operations fail fast with 15-second timeouts
- Graceful error recovery - intr allows kernel to interrupt hung operations
- Reduced memory pressure - Smaller 1MB buffers vs 16MB
- Better retry behavior - 3 retry attempts instead of 1
System Stability
- Prevents kernel deadlocks - Operations can be interrupted and retried
- Faster error detection - 10-second attribute cache timeout
- Automatic recovery - systemd auto-mounting handles reconnection
Performance
- Maintained caching benefits - cache=loose preserves performance
- Reduced network overhead - 60-second keepalive intervals
- Efficient buffer usage - 1MB buffers balance performance and stability
Files to Modify
- /etc/fstab - Primary mount configuration
- Optional monitoring scripts - NAS connectivity checks
- Service configurations - Dependencies on mount availability
Testing Checklist
- Backup current fstab configuration
- Apply new mount options
- Test normal operation (read/write files)
- Test network interruption handling (disconnect NAS briefly)
- Verify fast failure instead of system hangs
- Monitor system stability over 24 hours
- Validate with Tdarr container operations
Monitoring and Validation
Success Criteria
- Mount operations fail within 30 seconds during network issues
- No kernel RCU stalls or deadlock messages in journal
- System remains responsive during NAS network problems
- Automatic remount when network connectivity restored
Long-term Monitoring
- Monitor journal for CIFS error patterns
- Track system stability metrics
- Validate performance impact of smaller buffers
- Ensure gaming and transcoding workloads remain unaffected
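The journal checks above can be reduced to a small filter over kernel log text. This is a minimal sketch; `cifs_trouble` is a hypothetical helper name, and the patterns are a starting point covering CIFS messages, RCU stall reports, and hung-task warnings rather than an exhaustive list.

```shell
#!/bin/sh
# Read kernel log text on stdin and keep only lines that mention CIFS,
# an RCU stall, or a task blocked for an extended time (case-insensitive).
cifs_trouble() {
    grep -Ei 'cifs|rcu.*stall|blocked for more than'
}

# Example: kernel messages from the current boot
#   journalctl -k -b | cifs_trouble
```

An empty result over the 24-hour observation window is consistent with the success criteria, while any matching line is worth reading in full context.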