Claude Discord Bot e6983b56b9 Week 3 progress: Logging and testing complete (MED-001, MED-002, MED-005)

MED-001: Enhanced typing indicator
- Persistent typing loop (_maintain_typing method)
- Loops every 8s to maintain indicator for long operations (30s-5min)
- 8 comprehensive tests covering all lifecycle scenarios
- 27/27 bot tests passing

MED-002: Structured logging and error reporting
- logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord
- RotatingFileHandler (10MB max, 5 backups)
- Unique 8-char error IDs for support tracking
- Privacy-safe Discord error messages (7 error types)
- Enhanced bot.py with structured logging throughout
- 15/15 logging tests passing

MED-005: Comprehensive test suite
- Total: 156/157 tests passing (99.4%)
- test_session_manager.py: 27 tests
- test_claude_runner.py: 11 tests
- test_config.py: 25 tests
- test_response_formatter.py: 26 tests
- test_bot.py: 27 tests
- test_commands.py: 18 tests
- test_concurrency.py: 7 tests
- test_logging.py: 15 tests

Total: 13/18 tasks complete (72.2%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 19:12:04 +00:00

10 KiB

Raw Permalink Blame History

Logging and Error Reporting

This document describes the comprehensive logging and error reporting system implemented for Claude Discord Coordinator.

Overview

The bot uses structured JSON logging with rotation for production-grade observability and debugging. All errors are tracked with unique IDs and user-friendly messages are displayed in Discord without exposing internal implementation details.

Components

1. Structured Logging (`logging_config.py`)

JSONFormatter

Outputs log records as JSON objects with consistent fields:

{
    "timestamp": "2026-02-13T19:06:35.545176+00:00",
    "level": "INFO",
    "module": "bot",
    "function": "_handle_claude_request",
    "line": 123,
    "message": "Processing message in channel 12345 for project my-project",
    "channel_id": "12345",
    "session_id": "abc-def-ghi",
    "cost_usd": 0.05
}

Fields:

timestamp: ISO 8601 UTC timestamp
level: Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
module: Python module name
function: Function name
line: Source line number
message: Log message
exception: Full traceback (if exception occurred)
Custom fields via extra={} parameter

ErrorTracker

Generates unique error IDs for tracking and support:

from claude_coordinator.logging_config import ErrorTracker

error_id = ErrorTracker.generate_error_id()  # Returns 8-char ID like "abc12345"

Use log_error_with_id() to log with automatic error ID:

error_id = ErrorTracker.log_error_with_id(
    logger,
    logging.ERROR,
    "Failed to process request",
    exc_info=exception,
    channel_id="12345",
    session_id="abc-def"
)

setup_logging()

Configures application-wide logging with rotation:

from claude_coordinator.logging_config import setup_logging, get_log_directory

log_dir = get_log_directory()
setup_logging(
    log_level="INFO",
    log_file=str(log_dir / "bot.log"),
    json_logs=True,
    max_bytes=10 * 1024 * 1024,  # 10MB
    backup_count=5
)

Parameters:

log_level: DEBUG, INFO, WARNING, ERROR, CRITICAL
log_file: Path to log file (creates directory if needed)
json_logs: Use JSON formatter (True) or plain text (False)
max_bytes: Max file size before rotation (default: 10MB)
backup_count: Number of rotated files to keep (default: 5)

2. Log File Locations

Determined by get_log_directory():

Production: /var/log/claude-coordinator/ (requires write permissions)
Fallback: ~/.claude-coordinator/logs/

Log files:

bot.log - Current log file
bot.log.1 - First rotated backup
bot.log.2 - Second rotated backup
etc. (up to backup_count)

3. Discord Error Messages

User-friendly error messages without internal details:

from claude_coordinator.logging_config import format_error_for_discord

discord_msg = format_error_for_discord(
    error_type="timeout",
    error_id="abc12345"
)

Supported Error Types:

Error Type	Message	Use Case
`timeout`	⏱️ Request Timeout	Claude took too long
`claude_error`	❌ Claude Error	Claude command failed
`parse_error`	⚠️ Parse Error	Malformed JSON response
`config_error`	⚙️ Configuration Error	Channel not configured
`permission_error`	🔒 Permission Error	Insufficient permissions
`session_error`	💾 Session Error	Session save/load failure
`lock_error`	🔄 Processing Error	Lock acquisition failed

Example Output:

⏱️ **Request Timeout**
Claude took longer than expected. Please try again.
If this continues, try a simpler request.

_Error ID: `abc12345`_
_Please reference this ID if requesting support._

4. Combined Error Logging and Formatting

Convenience function for common pattern:

from claude_coordinator.logging_config import log_and_format_error

error_id, discord_msg = log_and_format_error(
    logger,
    error_type="timeout",
    message="Request timeout in channel",
    channel_id="12345",
    duration=300
)

# Logs to structured logs with error_id and context
# Returns tuple: (error_id, discord_friendly_message)

await channel.send(discord_msg)

Key Events Logged

Bot Lifecycle

Startup: Version, config path, loaded projects
Ready: Connected guilds, channels, bot user info
Shutdown: Graceful vs crash, cleanup status

Message Processing

Message Received: Channel, user, message length, project
Project Matched: Channel → project name mapping
Session State: New vs resumed, session_id
Claude Command: Full command for debugging (via extra fields)
Claude Subprocess: Start, completion, duration, cost
Response Sent: Chunk count, total chars, delivery status

Errors (with Context)

All errors include:

Error ID for support reference
Channel ID and session ID
Relevant context (duration, exit code, etc.)
Full exception traceback (in logs only, not Discord)

Performance Metrics

Request Duration: Total time from message to response
Claude CLI Duration: Subprocess execution time
Session Operations: Database query times
Message Queue Depth: Concurrent requests per channel

Usage Examples

Basic Logging

import logging

logger = logging.getLogger(__name__)

# Simple log
logger.info("Processing message")

# Log with context
logger.info(
    "Processing message in channel",
    extra={
        'channel_id': '12345',
        'user_id': '67890',
        'message_length': 150
    }
)

Error Handling with Tracking

try:
    result = await some_operation()
except TimeoutError as e:
    error_id, discord_msg = log_and_format_error(
        logger,
        error_type="timeout",
        message="Operation timeout",
        error=e,
        channel_id=channel_id,
        duration=300
    )
    await channel.send(discord_msg)

Performance Tracking

import time

start_time = time.time()
result = await long_operation()
duration_ms = int((time.time() - start_time) * 1000)

logger.info(
    "Operation completed",
    extra={
        'operation': 'claude_cli',
        'duration_ms': duration_ms,
        'result_size': len(result)
    }
)

Environment Variables

LOG_LEVEL: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Default: INFO

# Development with debug logging
export LOG_LEVEL=DEBUG
python -m claude_coordinator.bot

# Production with info logging
export LOG_LEVEL=INFO
python -m claude_coordinator.bot

Log Analysis

Querying JSON Logs

Use jq to query structured logs:

# Get all errors
cat bot.log | jq 'select(.level == "ERROR")'

# Get errors for specific channel
cat bot.log | jq 'select(.channel_id == "12345" and .level == "ERROR")'

# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'

# Find all timeouts
cat bot.log | jq 'select(.message | contains("timeout"))'

# Get error IDs from today
cat bot.log | jq 'select(.error_id) | {timestamp, error_id, message}'

Finding Errors by ID

# Find specific error ID in logs
cat bot.log* | jq 'select(.error_id == "abc12345")'

# Get full context for error
cat bot.log* | jq 'select(.error_id == "abc12345") | .'

Performance Analysis

# Get slowest Claude CLI calls
cat bot.log | jq 'select(.claude_duration_ms) | {duration_ms: .claude_duration_ms, channel_id, session_id}' | jq -s 'sort_by(.duration_ms) | reverse | .[0:10]'

# Cost tracking
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'

Testing

Comprehensive test suite in tests/test_logging.py:

# Run logging tests
pytest tests/test_logging.py -v

# Manual testing
python test_logging_manual.py

Test Coverage:

JSON formatter validation
Required fields presence
Exception handling
Extra fields inclusion
Error ID generation and uniqueness
Discord message formatting
Privacy protection (no internal details exposed)
Log rotation configuration

Privacy and Security

What Gets Logged

✓ Safe to log:

Channel IDs (public Discord identifiers)
Session IDs (UUIDs)
Timestamps, durations, costs
Error messages and stack traces
Operation types and results

✗ Never log:

User message content (only length)
Discord tokens
Authentication credentials
Personal identifiable information (PII)

What Gets Shown in Discord

✓ User-friendly messages:

Error type and description
Error ID for support reference
Actionable suggestions

✗ Never exposed:

Stack traces
File paths
Internal error details
Session IDs
Database information

Monitoring Recommendations

Real-time Monitoring

# Follow logs in real-time
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'

Alerting

Monitor for:

High error rates (>5 errors/minute)
Repeated timeouts (same channel)
Session save failures
Database errors
Cost spikes (>$1/hour)

Log Retention

Active log: bot.log (up to 10MB)
Rotated logs: bot.log.1 through bot.log.5
Total retention: ~50MB (5 rotations × 10MB)
Time retention: Depends on activity (typically 1-7 days)

For longer retention, implement external log aggregation (e.g., rsyslog, Loki, CloudWatch).

Troubleshooting

Logs Not Appearing

Check log file path:

python -c "from claude_coordinator.logging_config import get_log_directory; print(get_log_directory())"

Verify write permissions:

mkdir -p ~/.claude-coordinator/logs
touch ~/.claude-coordinator/logs/test.log

Check log level:
```
export LOG_LEVEL=DEBUG
```

Invalid JSON in Logs

If JSON parsing fails, check for:

Multi-line log messages (should be escaped)
Special characters in message strings
Corrupted log rotation

Log Rotation Not Working

Verify configuration:

setup_logging(
    max_bytes=10 * 1024 * 1024,  # Must be positive integer
    backup_count=5  # Must be > 0 for rotation
)

Future Enhancements

Potential improvements:

Integrate with external logging service (Loki, CloudWatch)
Add log aggregation dashboard
Implement log compression for rotated files
Add performance profiling hooks
Create alerting rules for critical errors
Add cost tracking and budget alerts

Last Updated: 2026-02-13
Version: MED-002 Implementation

10 KiB Raw Permalink Blame History Unescape Escape