claude-home/monitoring/examples/cron-job-management.md
Cal Corum 10c9e0d854 CLAUDE: Migrate to technology-first documentation architecture
2025-08-12 23:20:15 -05:00

Cron Job Management Patterns

This document outlines the cron job patterns and management strategies used in the home lab environment.

Current Cron Schedule

Overview

# Monthly maintenance
0 2 1 * * /home/cal/bin/ssh_key_maintenance.sh

# Tdarr monitoring and management
*/10 * * * * python3 /mnt/NV2/Development/claude-home/scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check nodes --detect-stuck --discord-alerts >/dev/null 2>&1
0 */6 * * * find "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp/" -name "tdarr-workDir2-*" -type d -mmin +360 -exec rm -rf {} \; 2>/dev/null || true
0 3 * * * find "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media" \( -name "*.temp" -o -name "*.tdarr" \) -mtime +1 -delete 2>/dev/null || true

# Disabled/legacy jobs
#*/20 * * * * /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh

Job Categories

1. System Maintenance

SSH Key Maintenance

  • Schedule: 0 2 1 * * (Monthly, 1st at 2 AM)
  • Purpose: Maintain SSH key security and rotation
  • Location: /home/cal/bin/ssh_key_maintenance.sh
  • Priority: High (security-critical)

2. Monitoring & Alerting

Tdarr System Monitoring

  • Schedule: */10 * * * * (Every 10 minutes)
  • Purpose: Monitor Tdarr nodes, detect stuck jobs, send Discord alerts
  • Features:
    • Stuck job detection (30-minute threshold)
    • Discord notifications with rich embeds
    • Persistent memory state tracking
  • Script: /mnt/NV2/Development/claude-home/scripts/tdarr_monitor.py
  • Output: Silent (>/dev/null 2>&1)

3. Cleanup & Housekeeping

Tdarr Work Directory Cleanup

  • Schedule: 0 */6 * * * (Every 6 hours)
  • Purpose: Remove stale Tdarr work directories
  • Target: /mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp/
  • Pattern: tdarr-workDir2-* directories
  • Age threshold: 6 hours (-mmin +360)

Failed Tdarr Job Cleanup

  • Schedule: 0 3 * * * (Daily at 3 AM)
  • Purpose: Remove failed transcode artifacts
  • Target: /mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media/
  • Patterns: *.temp and *.tdarr files
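  • Age threshold: more than one full day (-mtime +1 counts whole 24-hour periods, so a file must be at least 48 hours old to match; -mmin +1440 would give a strict 24-hour cutoff)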
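
Because find gives -o lower precedence than the implicit AND, two -name tests joined with -o need parentheses for -mtime and -delete to apply to both patterns. A minimal sketch in a throwaway directory (the file names are made up for illustration, and touch -d assumes GNU coreutils):

```shell
# Throwaway directory with hypothetical file names; nothing here touches the real cache.
demo=$(mktemp -d)
touch -d '3 days ago' "$demo/a.temp" "$demo/b.tdarr" "$demo/keep.mkv"
touch "$demo/fresh.temp"

# Dry run first: -print shows what -delete would remove.
find "$demo" \( -name '*.temp' -o -name '*.tdarr' \) -mtime +1 -print

# Same expression with -delete; the parentheses make -mtime apply to both patterns.
find "$demo" \( -name '*.temp' -o -name '*.tdarr' \) -mtime +1 -delete

# Only the fresh temp file and the non-matching file should remain.
remaining=$(ls "$demo" | sort | tr '\n' ' ')
echo "remaining: $remaining"
rm -rf "$demo"
```

Running the -print form against the real target directory before scheduling the -delete form is a cheap way to confirm the expression selects what is intended.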

Design Patterns

1. Absolute Paths

Always use absolute paths in cron jobs

# Good
*/10 * * * * python3 /full/path/to/script.py

# Bad - relative paths resolve against cron's working directory (usually the user's home), not the project directory
*/10 * * * * python3 scripts/script.py
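
The interpreter name is itself looked up through cron's PATH, so it can be pinned as well. A small sketch that resolves an interpreter's absolute path before composing the crontab line (sh stands in for python3 here, and the script path is a placeholder):

```shell
# Resolve the interpreter once, interactively, and paste the result into the crontab.
interp=$(command -v sh)        # substitute python3 on the real host
entry="*/10 * * * * $interp /full/path/to/script.sh"
echo "$entry"
```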

2. Error Handling

Standard error suppression pattern

command 2>/dev/null || true
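  • Suppresses stderr so error output does not trigger cron mail (cron mails any output, so stdout must also be silent or redirected)
  • || true forces a zero exit status, so the job never reports failure; this also hides real errors, so reserve it for best-effort cleanup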
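
The effect of || true on the exit status can be checked directly. In this sketch, false stands in for any failing cleanup command:

```shell
# false stands in for a command that fails.
plain=0
sh -c 'false 2>/dev/null' || plain=$?

masked=0
sh -c 'false 2>/dev/null || true' || masked=$?

echo "without || true: $plain"   # 1
echo "with || true:    $masked"  # 0
```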

3. Time-based Cleanup

Safe age thresholds for different content types

  • Work directories: 6 hours (short-lived, safe for active jobs)
  • Temp files: 24 hours (allows for long transcodes)
  • Log files: 7-30 days (depending on importance)
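
Before trusting a threshold with rm, the same find expression can be run as a dry run. A sketch using a hypothetical helper, list_stale, against a throwaway directory (touch -d assumes GNU coreutils):

```shell
# list_stale DIR MINUTES: print top-level work directories older than MINUTES.
# Dry run only: nothing is deleted.
list_stale() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -name 'tdarr-workDir2-*' -mmin "+$2" -print
}

demo=$(mktemp -d)
mkdir "$demo/tdarr-workDir2-old" "$demo/tdarr-workDir2-new"
touch -d '7 hours ago' "$demo/tdarr-workDir2-old"   # older than the 6-hour threshold

stale=$(list_stale "$demo" 360)
echo "$stale"
rm -rf "$demo"
```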

4. Resource-aware Scheduling

Avoid resource conflicts

# System maintenance at low-usage times
0 2 1 * * maintenance_script.sh

# Cleanup during off-peak hours  
0 3 * * * cleanup_script.sh

# Monitoring every 10 minutes, around the clock
*/10 * * * * monitor_script.py

Management Workflow

Adding New Cron Jobs

  1. Backup current crontab

    crontab -l > /tmp/crontab_backup_$(date +%Y%m%d)
    
  2. Edit safely

    crontab -l > /tmp/new_crontab
    echo "# New job description" >> /tmp/new_crontab
    echo "schedule command" >> /tmp/new_crontab
    crontab /tmp/new_crontab
    
  3. Verify installation

    crontab -l
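
Appending with echo adds the same lines again each time the steps are re-run. A sketch of an idempotent variant, using a hypothetical helper add_job that works on a file rather than on the live crontab:

```shell
# add_job FILE COMMENT ENTRY: append ENTRY (with a comment line) unless FILE already has it.
add_job() {
    if grep -qxF -- "$3" "$1" 2>/dev/null; then
        return 0                      # identical entry already present
    fi
    printf '# %s\n%s\n' "$2" "$3" >> "$1"
}

work=$(mktemp)
add_job "$work" "Example job" "*/10 * * * * /path/to/command"
add_job "$work" "Example job" "*/10 * * * * /path/to/command"   # no-op on the second call
lines=$(wc -l < "$work")
echo "lines in file: $lines"
```

On a real host the file would first be filled with crontab -l and then installed with crontab "$work" once the content has been reviewed.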
    

Proper HERE Document (EOF) Usage

When building cron files with HERE documents, use proper EOF formatting:

Correct Format

cat > /tmp/new_crontab << 'EOF'
0 2 1 * * /home/cal/bin/ssh_key_maintenance.sh
# Tdarr monitoring every 10 minutes
*/10 * * * * python3 /path/to/script.py --args
EOF
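
Quoting the delimiter as 'EOF' keeps the body literal, which matters for entries that contain $ or backticks. A short sketch of the difference (the variable name is arbitrary):

```shell
name=world

# Quoted delimiter: the body is taken literally.
quoted=$(cat << 'EOF'
hello $name
EOF
)

# Unquoted delimiter: the shell expands $name before cat sees it.
unquoted=$(cat << EOF
hello $name
EOF
)

echo "$quoted"     # hello $name
echo "$unquoted"   # hello world
```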

Common Mistakes

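# BAD - appends to whatever /tmp/crontab already contains, so leftovers from earlier attempts stay in the file
cat >> /tmp/crontab << 'EOF'
new_cron_job
EOF

# If the closing EOF is indented or followed by other text, the shell does not treat it as the terminator;
# the result is a malformed file with literal "EOF < /dev/null" lines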

Key Rules for EOF in Cron Files

  1. Use cat > not cat >> for building complete files

    # Good - overwrites file cleanly
    cat > /tmp/crontab << 'EOF'
    
    # Bad - appends to existing content, so stale or duplicate lines end up in the crontab
    cat >> /tmp/crontab << 'EOF'
    
  2. Quote the EOF delimiter to prevent variable expansion

    # Good - literal content
    cat > file << 'EOF'
    
    # Unquoted - the shell expands $variables, $(commands), and backticks inside the body
    cat > file << EOF
    
  3. Clean up malformed files before installing

    # Drop the final line when it is a stray EOF artifact (negative counts need GNU head)
    head -n -1 /tmp/crontab > /tmp/clean_crontab
    
    # Or use grep to remove EOF lines
    grep -v "^EOF" /tmp/crontab > /tmp/clean_crontab
    
  4. Alternative approach - direct echo method

    crontab -l > /tmp/current_crontab
    echo "# New job comment" >> /tmp/current_crontab
    echo "*/10 * * * * /path/to/command" >> /tmp/current_crontab
    crontab /tmp/current_crontab
    

Debugging EOF Issues

# Check for EOF artifacts in crontab file
cat -n /tmp/crontab | grep EOF

# Validate crontab syntax before installing
crontab -T /tmp/crontab  # Syntax check without installing; supported by cronie, not by every cron implementation

# Manual cleanup if needed
sed '/^EOF/d' /tmp/crontab > /tmp/clean_crontab
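
Where crontab -T is unavailable, a rough field-count check catches stray lines such as a leftover EOF. This is only a sketch: lint_crontab is a hypothetical helper, and it checks the shape of each line, not whether the schedule fields are valid.

```shell
# lint_crontab FILE: print "LINE: text" for entries that do not look like cron lines.
lint_crontab() {
    awk '
        /^[ \t]*#/ { next }                                # comments
        /^[ \t]*$/ { next }                                # blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }    # VAR=value settings
        /^[ \t]*@/ { if (NF < 2) print NR ": " $0; next }  # @daily-style entries
        NF < 6     { print NR ": " $0 }                    # five time fields plus a command
    ' "$1"
}

sample=$(mktemp)
printf '%s\n' '# Tdarr monitoring' '*/10 * * * * /path/to/command' 'EOF' > "$sample"
problems=$(lint_crontab "$sample")
echo "$problems"   # 3: EOF
rm -f "$sample"
```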

Testing Cron Jobs

Test command syntax first

# Test the actual command before scheduling
python3 /full/path/to/script.py --test

# Check file permissions
ls -la /path/to/script

# Verify paths exist
ls -la /target/directory/
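
The manual checks above can be bundled into a small preflight function. A sketch, with preflight as a hypothetical helper name:

```shell
# preflight SCRIPT: confirm the script exists and is executable before scheduling it.
preflight() {
    if [ ! -f "$1" ]; then echo "missing: $1"; return 1; fi
    if [ ! -x "$1" ]; then echo "not executable: $1"; return 1; fi
    echo "ok: $1"
}

demo=$(mktemp)                                  # created without execute permission
result_before=$(preflight "$demo" || true)
chmod 750 "$demo"
result_after=$(preflight "$demo")
echo "$result_before"
echo "$result_after"
rm -f "$demo"
```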

Test with a short interval

# Start with 5-minute intervals for testing
*/5 * * * * /path/to/new/script.sh

# Monitor logs
tail -f /var/log/syslog | grep CRON

Monitoring Cron Jobs

Check cron logs

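# System cron logs (the unit is named crond on Fedora/RHEL-family systems)
sudo journalctl -u cron -f

# User cron logs (Debian/Ubuntu path; Fedora/RHEL-family systems typically use /var/log/cron)
grep CRON /var/log/syslog | grep "$(whoami)"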

Verify job execution

# Check if cleanup actually ran
ls -la /target/cleanup/directory/

# Monitor script logs
tail -f /path/to/script/logs/*.log

Security Considerations

1. Path Security

  • Use absolute paths to prevent PATH manipulation
  • Ensure scripts are owned by correct user
  • Set appropriate permissions (750 for scripts)

2. Command Injection Prevention

# Good - quoted paths
find "/path/with spaces/" -name "pattern"

# Bad - the unquoted path is split at the space into two arguments, so find searches the wrong locations
find /path/with spaces/ -name pattern
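
The underlying mechanism is word splitting: without quotes the shell hands the command two separate arguments. A sketch that counts arguments (count_args is a hypothetical helper; the default IFS is assumed):

```shell
count_args() { echo "$#"; }

dir="/path/with spaces/"
unquoted=$(count_args $dir)     # split at the space
quoted=$(count_args "$dir")     # passed as one argument

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: 2, quoted: 1
```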

3. Resource Limits

  • Prevent runaway processes with timeout
  • Use ionice for I/O intensive cleanup jobs
  • Consider nice for CPU-intensive tasks
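
These three tools can wrap a single command. A sketch of how timeout behaves, with an illustrative crontab line in the comment (the one-hour limit and the cleanup path are assumptions, not values from the current schedule; the exit status shown is that of GNU coreutils timeout):

```shell
# Illustrative crontab entry (not installed):
# 0 3 * * * timeout 1h ionice -c3 nice -n 19 /path/to/cleanup.sh

# timeout stops the command after the limit and exits with status 124.
status=0
timeout 1 sleep 5 || status=$?
echo "exit status: $status"   # 124
```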

Troubleshooting

Common Issues

Job not running

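  1. Check cron service: sudo systemctl status cron (crond on Fedora/RHEL-family systems)
  2. Confirm the job is installed as expected: crontab -l (this lists entries; it does not validate syntax)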
  3. Check file permissions and paths
  4. Review cron logs for errors

Environment differences

  • Cron runs with minimal environment
  • Set PATH explicitly if needed
  • Use absolute paths for all commands

Silent failures

  • Remove 2>/dev/null temporarily for debugging
  • Add logging to scripts
  • Check script exit codes
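
One way to act on all three points is a small wrapper that records output and exit status. A sketch, with run_logged and the log location as hypothetical names:

```shell
# run_logged LOGFILE COMMAND [ARGS...]: append the command's output, a timestamp,
# and its exit status to LOGFILE, then return that status.
run_logged() {
    target_log=$1; shift
    status=0
    "$@" >> "$target_log" 2>&1 || status=$?
    printf '%s exit=%s cmd=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" "$*" >> "$target_log"
    return "$status"
}

demo_log=$(mktemp)
rc=0
run_logged "$demo_log" sh -c 'echo cleanup failed >&2; exit 3' || rc=$?
tail -n 1 "$demo_log"
```

In a crontab the wrapper would live in a script file, so the entry stays a single absolute path plus arguments.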

Debugging Commands

# Capture cron's environment (temporary crontab entry; remove it after one run)
* * * * * env > /tmp/cron_env.txt

# Test script in cron-like environment
env -i /bin/bash -c 'your_command_here'

# Monitor real-time execution
sudo tail -f /var/log/syslog | grep CRON
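
env -i strips everything, whereas cron typically still sets a handful of variables (HOME, LOGNAME, SHELL, and a short PATH). A sketch that approximates that environment; run_like_cron is a hypothetical helper, and the PATH value is an assumption that should be checked against the captured /tmp/cron_env.txt:

```shell
# run_like_cron COMMAND: run COMMAND under sh with a small, cron-like environment.
run_like_cron() {
    env -i HOME="${HOME:-/}" LOGNAME="$(id -un)" SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "$1"
}

seen_path=$(run_like_cron 'echo "$PATH"')
echo "PATH inside: $seen_path"   # /usr/bin:/bin
```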

Best Practices

1. Documentation

  • Comment all cron jobs with purpose and schedule
  • Document in this patterns file
  • Include contact info for complex jobs

2. Maintenance

  • Regular review of active jobs (quarterly)
  • Remove obsolete jobs promptly
  • Update absolute paths when moving scripts

3. Monitoring

  • Implement health checks for critical jobs
  • Use Discord/email notifications for failures
  • Monitor disk space usage from cleanup jobs

4. Backup Strategy

  • Backup crontab before changes
  • Version control cron configurations
  • Document restoration procedures
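
A timestamped backup plus a one-line restore covers the first and last points. A sketch in which save_backup is a hypothetical helper that reads the crontab listing on stdin, so it can be exercised without touching the live crontab:

```shell
# save_backup DIR: write stdin to DIR/crontab_YYYYmmdd_HHMMSS and print the resulting path.
save_backup() {
    mkdir -p "$1" || return 1
    out="$1/crontab_$(date +%Y%m%d_%H%M%S)"
    cat > "$out" && echo "$out"
}

# Demo with sample content; on a real host: crontab -l | save_backup "$HOME/crontab-backups"
backup_dir=$(mktemp -d)
saved=$(printf '0 2 1 * * /home/cal/bin/ssh_key_maintenance.sh\n' | save_backup "$backup_dir")
cat "$saved"

# Restore would be: crontab "$saved"
```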

Future Enhancements

Planned Additions

  • Log rotation: Automated cleanup of application logs
  • Health checks: System resource monitoring
  • Backup verification: Automated backup integrity checks
  • Certificate renewal: SSL/TLS certificate automation

Migration Considerations

  • Systemd timers: Consider migration for complex scheduling
  • Configuration management: Ansible or similar for multi-host
  • Centralized logging: Aggregated cron job monitoring