MED-001: Enhanced typing indicator - Persistent typing loop (_maintain_typing method) - Loops every 8s to maintain indicator for long operations (30s-5min) - 8 comprehensive tests covering all lifecycle scenarios - 27/27 bot tests passing MED-002: Structured logging and error reporting - logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord - RotatingFileHandler (10MB max, 5 backups) - Unique 8-char error IDs for support tracking - Privacy-safe Discord error messages (7 error types) - Enhanced bot.py with structured logging throughout - 15/15 logging tests passing MED-005: Comprehensive test suite - Total: 156/157 tests passing (99.4%) - test_session_manager.py: 27 tests - test_claude_runner.py: 11 tests - test_config.py: 25 tests - test_response_formatter.py: 26 tests - test_bot.py: 27 tests - test_commands.py: 18 tests - test_concurrency.py: 7 tests - test_logging.py: 15 tests Total: 13/18 tasks complete (72.2%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10 KiB
Logging and Error Reporting
This document describes the comprehensive logging and error reporting system implemented for Claude Discord Coordinator.
Overview
The bot uses structured JSON logging with rotation for production-grade observability and debugging. All errors are tracked with unique IDs and user-friendly messages are displayed in Discord without exposing internal implementation details.
Components
1. Structured Logging (logging_config.py)
JSONFormatter
Outputs log records as JSON objects with consistent fields:
{
"timestamp": "2026-02-13T19:06:35.545176+00:00",
"level": "INFO",
"module": "bot",
"function": "_handle_claude_request",
"line": 123,
"message": "Processing message in channel 12345 for project my-project",
"channel_id": "12345",
"session_id": "abc-def-ghi",
"cost_usd": 0.05
}
Fields:
timestamp: ISO 8601 UTC timestamplevel: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)module: Python module namefunction: Function nameline: Source line numbermessage: Log messageexception: Full traceback (if exception occurred)- Custom fields via
extra={}parameter
ErrorTracker
Generates unique error IDs for tracking and support:
from claude_coordinator.logging_config import ErrorTracker
error_id = ErrorTracker.generate_error_id() # Returns 8-char ID like "abc12345"
Use log_error_with_id() to log with automatic error ID:
error_id = ErrorTracker.log_error_with_id(
logger,
logging.ERROR,
"Failed to process request",
exc_info=exception,
channel_id="12345",
session_id="abc-def"
)
setup_logging()
Configures application-wide logging with rotation:
from claude_coordinator.logging_config import setup_logging, get_log_directory
log_dir = get_log_directory()
setup_logging(
log_level="INFO",
log_file=str(log_dir / "bot.log"),
json_logs=True,
max_bytes=10 * 1024 * 1024, # 10MB
backup_count=5
)
Parameters:
log_level: DEBUG, INFO, WARNING, ERROR, CRITICALlog_file: Path to log file (creates directory if needed)json_logs: Use JSON formatter (True) or plain text (False)max_bytes: Max file size before rotation (default: 10MB)backup_count: Number of rotated files to keep (default: 5)
2. Log File Locations
Determined by get_log_directory():
- Production:
/var/log/claude-coordinator/(requires write permissions) - Fallback:
~/.claude-coordinator/logs/
Log files:
bot.log- Current log filebot.log.1- First rotated backupbot.log.2- Second rotated backup- etc. (up to
backup_count)
3. Discord Error Messages
User-friendly error messages without internal details:
from claude_coordinator.logging_config import format_error_for_discord
discord_msg = format_error_for_discord(
error_type="timeout",
error_id="abc12345"
)
Supported Error Types:
| Error Type | Message | Use Case |
|---|---|---|
timeout |
⏱️ Request Timeout | Claude took too long |
claude_error |
❌ Claude Error | Claude command failed |
parse_error |
⚠️ Parse Error | Malformed JSON response |
config_error |
⚙️ Configuration Error | Channel not configured |
permission_error |
🔒 Permission Error | Insufficient permissions |
session_error |
💾 Session Error | Session save/load failure |
lock_error |
🔄 Processing Error | Lock acquisition failed |
Example Output:
⏱️ **Request Timeout**
Claude took longer than expected. Please try again.
If this continues, try a simpler request.
_Error ID: `abc12345`_
_Please reference this ID if requesting support._
4. Combined Error Logging and Formatting
Convenience function for common pattern:
from claude_coordinator.logging_config import log_and_format_error
error_id, discord_msg = log_and_format_error(
logger,
error_type="timeout",
message="Request timeout in channel",
channel_id="12345",
duration=300
)
# Logs to structured logs with error_id and context
# Returns tuple: (error_id, discord_friendly_message)
await channel.send(discord_msg)
Key Events Logged
Bot Lifecycle
- Startup: Version, config path, loaded projects
- Ready: Connected guilds, channels, bot user info
- Shutdown: Graceful vs crash, cleanup status
Message Processing
- Message Received: Channel, user, message length, project
- Project Matched: Channel → project name mapping
- Session State: New vs resumed, session_id
- Claude Command: Full command for debugging (via extra fields)
- Claude Subprocess: Start, completion, duration, cost
- Response Sent: Chunk count, total chars, delivery status
Errors (with Context)
All errors include:
- Error ID for support reference
- Channel ID and session ID
- Relevant context (duration, exit code, etc.)
- Full exception traceback (in logs only, not Discord)
Performance Metrics
- Request Duration: Total time from message to response
- Claude CLI Duration: Subprocess execution time
- Session Operations: Database query times
- Message Queue Depth: Concurrent requests per channel
Usage Examples
Basic Logging
import logging
logger = logging.getLogger(__name__)
# Simple log
logger.info("Processing message")
# Log with context
logger.info(
"Processing message in channel",
extra={
'channel_id': '12345',
'user_id': '67890',
'message_length': 150
}
)
Error Handling with Tracking
try:
result = await some_operation()
except TimeoutError as e:
error_id, discord_msg = log_and_format_error(
logger,
error_type="timeout",
message="Operation timeout",
error=e,
channel_id=channel_id,
duration=300
)
await channel.send(discord_msg)
Performance Tracking
import time
start_time = time.time()
result = await long_operation()
duration_ms = int((time.time() - start_time) * 1000)
logger.info(
"Operation completed",
extra={
'operation': 'claude_cli',
'duration_ms': duration_ms,
'result_size': len(result)
}
)
Environment Variables
LOG_LEVEL: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)- Default: INFO
# Development with debug logging
export LOG_LEVEL=DEBUG
python -m claude_coordinator.bot
# Production with info logging
export LOG_LEVEL=INFO
python -m claude_coordinator.bot
Log Analysis
Querying JSON Logs
Use jq to query structured logs:
# Get all errors
cat bot.log | jq 'select(.level == "ERROR")'
# Get errors for specific channel
cat bot.log | jq 'select(.channel_id == "12345" and .level == "ERROR")'
# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'
# Find all timeouts
cat bot.log | jq 'select(.message | contains("timeout"))'
# Get error IDs from today
cat bot.log | jq 'select(.error_id) | {timestamp, error_id, message}'
Finding Errors by ID
# Find specific error ID in logs
cat bot.log* | jq 'select(.error_id == "abc12345")'
# Get full context for error
cat bot.log* | jq 'select(.error_id == "abc12345") | .'
Performance Analysis
# Get slowest Claude CLI calls
cat bot.log | jq 'select(.claude_duration_ms) | {duration_ms: .claude_duration_ms, channel_id, session_id}' | jq -s 'sort_by(.duration_ms) | reverse | .[0:10]'
# Cost tracking
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'
Testing
Comprehensive test suite in tests/test_logging.py:
# Run logging tests
pytest tests/test_logging.py -v
# Manual testing
python test_logging_manual.py
Test Coverage:
- JSON formatter validation
- Required fields presence
- Exception handling
- Extra fields inclusion
- Error ID generation and uniqueness
- Discord message formatting
- Privacy protection (no internal details exposed)
- Log rotation configuration
Privacy and Security
What Gets Logged
✓ Safe to log:
- Channel IDs (public Discord identifiers)
- Session IDs (UUIDs)
- Timestamps, durations, costs
- Error messages and stack traces
- Operation types and results
✗ Never log:
- User message content (only length)
- Discord tokens
- Authentication credentials
- Personal identifiable information (PII)
What Gets Shown in Discord
✓ User-friendly messages:
- Error type and description
- Error ID for support reference
- Actionable suggestions
✗ Never exposed:
- Stack traces
- File paths
- Internal error details
- Session IDs
- Database information
Monitoring Recommendations
Real-time Monitoring
# Follow logs in real-time
tail -f ~/.claude-coordinator/logs/bot.log | jq .
# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'
Alerting
Monitor for:
- High error rates (>5 errors/minute)
- Repeated timeouts (same channel)
- Session save failures
- Database errors
- Cost spikes (>$1/hour)
Log Retention
- Active log:
bot.log(up to 10MB) - Rotated logs:
bot.log.1throughbot.log.5 - Total retention: ~50MB (5 rotations × 10MB)
- Time retention: Depends on activity (typically 1-7 days)
For longer retention, implement external log aggregation (e.g., rsyslog, Loki, CloudWatch).
Troubleshooting
Logs Not Appearing
-
Check log file path:
python -c "from claude_coordinator.logging_config import get_log_directory; print(get_log_directory())" -
Verify write permissions:
mkdir -p ~/.claude-coordinator/logs touch ~/.claude-coordinator/logs/test.log -
Check log level:
export LOG_LEVEL=DEBUG
Invalid JSON in Logs
If JSON parsing fails, check for:
- Multi-line log messages (should be escaped)
- Special characters in message strings
- Corrupted log rotation
Log Rotation Not Working
Verify configuration:
setup_logging(
max_bytes=10 * 1024 * 1024, # Must be positive integer
backup_count=5 # Must be > 0 for rotation
)
Future Enhancements
Potential improvements:
- Integrate with external logging service (Loki, CloudWatch)
- Add log aggregation dashboard
- Implement log compression for rotated files
- Add performance profiling hooks
- Create alerting rules for critical errors
- Add cost tracking and budget alerts
Last Updated: 2026-02-13
Version: MED-002 Implementation