claude-home/monitoring/examples/cron-job-management.md
Cal Corum 4b7eca8a46
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 3s
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00

334 lines
8.8 KiB
Markdown

---
title: "Cron Job Management Patterns"
description: "Cron job patterns for homelab monitoring including current schedules, HERE document best practices, resource-aware scheduling, security considerations, and debugging techniques for cron environment issues."
type: reference
domain: monitoring
tags: [cron, scheduling, tdarr, bash, automation, maintenance]
---
# Cron Job Management Patterns
This document outlines the cron job patterns and management strategies used in the home lab environment.
## Current Cron Schedule
### Overview
```bash
# Monthly maintenance
0 2 1 * * /home/cal/bin/ssh_key_maintenance.sh
# Tdarr monitoring and management
*/10 * * * * python3 /mnt/NV2/Development/claude-home/scripts/tdarr_monitor.py --server http://10.10.0.43:8265 --check nodes --detect-stuck --discord-alerts >/dev/null 2>&1
0 */6 * * * find "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp/" -name "tdarr-workDir2-*" -type d -mmin +360 -exec rm -rf {} \; 2>/dev/null || true
0 3 * * * find "/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media" -name "*.temp" -o -name "*.tdarr" -mtime +1 -delete 2>/dev/null || true
# Disabled/legacy jobs
#*/20 * * * * /mnt/NV2/Development/claude-home/scripts/monitoring/tdarr-timeout-monitor.sh
```
## Job Categories
### 1. System Maintenance
**SSH Key Maintenance**
- **Schedule**: `0 2 1 * *` (Monthly, 1st at 2 AM)
- **Purpose**: Maintain SSH key security and rotation
- **Location**: `/home/cal/bin/ssh_key_maintenance.sh`
- **Priority**: High (security-critical)
### 2. Monitoring & Alerting
**Tdarr System Monitoring**
- **Schedule**: `*/10 * * * *` (Every 10 minutes)
- **Purpose**: Monitor Tdarr nodes, detect stuck jobs, send Discord alerts
- **Features**:
- Stuck job detection (30-minute threshold)
- Discord notifications with rich embeds
- Persistent memory state tracking
- **Script**: `/mnt/NV2/Development/claude-home/scripts/tdarr_monitor.py`
- **Output**: Silent (`>/dev/null 2>&1`)
### 3. Cleanup & Housekeeping
**Tdarr Work Directory Cleanup**
- **Schedule**: `0 */6 * * *` (Every 6 hours)
- **Purpose**: Remove stale Tdarr work directories
- **Target**: `/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/temp/`
- **Pattern**: `tdarr-workDir2-*` directories
- **Age threshold**: 6 hours (`-mmin +360`)
**Failed Tdarr Job Cleanup**
- **Schedule**: `0 3 * * *` (Daily at 3 AM)
- **Purpose**: Remove failed transcode artifacts
- **Target**: `/mnt/NV2/tdarr-cache/nobara-pc-gpu-unmapped/media/`
- **Patterns**: `*.temp` and `*.tdarr` files
- **Age threshold**: 24 hours (`-mtime +1`)
## Design Patterns
### 1. Absolute Paths
**Always use absolute paths in cron jobs**
```bash
# Good
*/10 * * * * python3 /full/path/to/script.py
# Bad - relative paths don't work in cron
*/10 * * * * python3 scripts/script.py
```
### 2. Error Handling
**Standard error suppression pattern**
```bash
command 2>/dev/null || true
```
- Suppresses stderr to prevent cron emails
- `|| true` ensures job always exits successfully
### 3. Time-based Cleanup
**Safe age thresholds for different content types**
- **Work directories**: 6 hours (short-lived, safe for active jobs)
- **Temp files**: 24 hours (allows for long transcodes)
- **Log files**: 7-30 days (depending on importance)
### 4. Resource-aware Scheduling
**Avoid resource conflicts**
```bash
# System maintenance at low-usage times
0 2 1 * * maintenance_script.sh
# Cleanup during off-peak hours
0 3 * * * cleanup_script.sh
# Monitoring with high frequency during active hours
*/10 * * * * monitor_script.py
```
## Management Workflow
### Adding New Cron Jobs
1. **Backup current crontab**
```bash
crontab -l > /tmp/crontab_backup_$(date +%Y%m%d)
```
2. **Edit safely**
```bash
crontab -l > /tmp/new_crontab
echo "# New job description" >> /tmp/new_crontab
echo "schedule command" >> /tmp/new_crontab
crontab /tmp/new_crontab
```
3. **Verify installation**
```bash
crontab -l
```
### Proper HERE Document (EOF) Usage
**When building cron files with HERE documents, use proper EOF formatting:**
#### ✅ **Correct Format**
```bash
cat > /tmp/new_crontab << 'EOF'
0 2 1 * * /home/cal/bin/ssh_key_maintenance.sh
# Tdarr monitoring every 10 minutes
*/10 * * * * python3 /path/to/script.py --args
EOF
```
#### ❌ **Common Mistakes**
```bash
# BAD - Causes "EOF not found" errors
cat >> /tmp/crontab << 'EOF'
new_cron_job
EOF
# Results in malformed file with literal "EOF < /dev/null" lines
```
#### **Key Rules for EOF in Cron Files**
1. **Use `cat >` not `cat >>`** for building complete files
```bash
# Good - overwrites file cleanly
cat > /tmp/crontab << 'EOF'
# Bad - appends and can create malformed files
cat >> /tmp/crontab << 'EOF'
```
2. **Quote the EOF delimiter** to prevent variable expansion
```bash
# Good - literal content
cat > file << 'EOF'
# Can cause issues with special characters
cat > file << EOF
```
3. **Clean up malformed files** before installing
```bash
# Remove EOF artifacts and empty lines
head -n -1 /tmp/crontab > /tmp/clean_crontab
# Or use grep to remove EOF lines
grep -v "^EOF" /tmp/crontab > /tmp/clean_crontab
```
4. **Alternative approach - direct echo method**
```bash
crontab -l > /tmp/current_crontab
echo "# New job comment" >> /tmp/current_crontab
echo "*/10 * * * * /path/to/command" >> /tmp/current_crontab
crontab /tmp/current_crontab
```
#### **Debugging EOF Issues**
```bash
# Check for EOF artifacts in crontab file
cat -n /tmp/crontab | grep EOF
# Validate crontab syntax before installing
crontab -T /tmp/crontab # Some systems support this
# Manual cleanup if needed
sed '/^EOF/d' /tmp/crontab > /tmp/clean_crontab
```
### Testing Cron Jobs
**Test command syntax first**
```bash
# Test the actual command before scheduling
python3 /full/path/to/script.py --test
# Check file permissions
ls -la /path/to/script
# Verify paths exist
ls -la /target/directory/
```
**Test with minimal frequency**
```bash
# Start with 5-minute intervals for testing
*/5 * * * * /path/to/new/script.sh
# Monitor logs
tail -f /var/log/syslog | grep CRON
```
### Monitoring Cron Jobs
**Check cron logs**
```bash
# System cron logs
sudo journalctl -u cron -f
# User cron logs
grep CRON /var/log/syslog | grep $(whoami)
```
**Verify job execution**
```bash
# Check if cleanup actually ran
ls -la /target/cleanup/directory/
# Monitor script logs
tail -f /path/to/script/logs/
```
## Security Considerations
### 1. Path Security
- Use absolute paths to prevent PATH manipulation
- Ensure scripts are owned by correct user
- Set appropriate permissions (750 for scripts)
### 2. Command Injection Prevention
```bash
# Good - quoted paths
find "/path/with spaces/" -name "pattern"
# Bad - unquoted paths vulnerable to injection
find /path/with spaces/ -name pattern
```
### 3. Resource Limits
- Prevent runaway processes with `timeout`
- Use `ionice` for I/O intensive cleanup jobs
- Consider `nice` for CPU-intensive tasks
## Troubleshooting
### Common Issues
**Job not running**
1. Check cron service: `sudo systemctl status cron`
2. Verify crontab syntax: `crontab -l`
3. Check file permissions and paths
4. Review cron logs for errors
**Environment differences**
- Cron runs with minimal environment
- Set PATH explicitly if needed
- Use absolute paths for all commands
**Silent failures**
- Remove `2>/dev/null` temporarily for debugging
- Add logging to scripts
- Check script exit codes
### Debugging Commands
```bash
# Test cron environment
* * * * * env > /tmp/cron_env.txt
# Test script in cron-like environment
env -i /bin/bash -c 'your_command_here'
# Monitor real-time execution
sudo tail -f /var/log/syslog | grep CRON
```
## Best Practices
### 1. Documentation
- Comment all cron jobs with purpose and schedule
- Document in this patterns file
- Include contact info for complex jobs
### 2. Maintenance
- Regular review of active jobs (quarterly)
- Remove obsolete jobs promptly
- Update absolute paths when moving scripts
### 3. Monitoring
- Implement health checks for critical jobs
- Use Discord/email notifications for failures
- Monitor disk space usage from cleanup jobs
### 4. Backup Strategy
- Backup crontab before changes
- Version control cron configurations
- Document restoration procedures
## Future Enhancements
### Planned Additions
- **Log rotation**: Automated cleanup of application logs
- **Health checks**: System resource monitoring
- **Backup verification**: Automated backup integrity checks
- **Certificate renewal**: SSL/TLS certificate automation
### Migration Considerations
- **Systemd timers**: Consider migration for complex scheduling
- **Configuration management**: Ansible or similar for multi-host
- **Centralized logging**: Aggregated cron job monitoring
---
## Related Documentation
- [Tdarr Monitoring Script](../scripts/README.md#tdarr_monitorpy---enhanced-tdarr-monitoring)
- [System Maintenance](../reference/system-maintenance.md)
- [Discord Integration](../examples/discord-notifications.md)