ai-assistant-discord-bot/MED-002_IMPLEMENTATION.md
Claude Discord Bot e6983b56b9 Week 3 progress: Logging and testing complete (MED-001, MED-002, MED-005)
MED-001: Enhanced typing indicator
- Persistent typing loop (_maintain_typing method)
- Loops every 8s to maintain indicator for long operations (30s-5min)
- 8 comprehensive tests covering all lifecycle scenarios
- 27/27 bot tests passing

MED-002: Structured logging and error reporting
- logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord
- RotatingFileHandler (10MB max, 5 backups)
- Unique 8-char error IDs for support tracking
- Privacy-safe Discord error messages (7 error types)
- Enhanced bot.py with structured logging throughout
- 15/15 logging tests passing

MED-005: Comprehensive test suite
- Total: 156/157 tests passing (99.4%)
- test_session_manager.py: 27 tests
- test_claude_runner.py: 11 tests
- test_config.py: 25 tests
- test_response_formatter.py: 26 tests
- test_bot.py: 27 tests
- test_commands.py: 18 tests
- test_concurrency.py: 7 tests
- test_logging.py: 15 tests

Total: 13/18 tasks complete (72.2%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:12:04 +00:00


MED-002: Logging and Error Reporting Implementation

Status: COMPLETE
Date: 2026-02-13
Priority: Medium
Effort: Medium (MED-002)

Summary

Implemented comprehensive structured logging with JSON formatting, log rotation, error tracking with unique IDs, and user-friendly Discord error messages. All 156 non-integration tests pass, including 15 new logging tests.

Changes Made

1. New File: claude_coordinator/logging_config.py (371 lines)

Features:

  • JSONFormatter: Structured JSON log output with timestamp, level, module, function, line, message, exception, and custom fields
  • ErrorTracker: Unique error ID generation and tracking
  • setup_logging(): Application-wide logging configuration with rotation
  • get_log_directory(): Log directory selection (/var/log/claude-coordinator if writable, otherwise ~/.claude-coordinator/logs)
  • format_error_for_discord(): User-friendly error messages without internal details
  • log_and_format_error(): Combined logging and message formatting
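A minimal sketch of a formatter in the spirit of JSONFormatter. It assumes that custom fields are detected by comparing a record's attributes against those of a plain LogRecord. The actual logging_config.py may do this differently. The timestamp is rendered as UTC ISO 8601, which matches the sample under "Log Format".

```python
import json
import logging
from datetime import datetime, timezone

# Attribute names every LogRecord has; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", None, None))) | {"message", "asctime"}


class JSONFormatter(logging.Formatter):
    """Sketch: render each record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # Copy custom fields such as channel_id, session_id, cost_usd.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS and key not in entry:
                entry[key] = value
        # default=str keeps non-JSON values (e.g. datetimes) from raising.
        return json.dumps(entry, default=str)
```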

Error Types Supported:

  • timeout: Claude request timeout
  • claude_error: Claude CLI failure
  • parse_error: Malformed JSON response
  • config_error: Configuration issues
  • permission_error: Permission denied
  • session_error: Session management failure
  • lock_error: Lock acquisition failure

2. Updated: claude_coordinator/bot.py

Enhancements:

  • Imported and integrated logging_config module
  • Enhanced main() to use setup_logging() with rotation
  • Added structured logging throughout message handling
  • Integrated error tracking with unique error IDs
  • Enhanced error messages with log_and_format_error()
  • Added performance metrics (request duration, Claude duration, session operations)
  • Added context to all log calls (channel_id, session_id, cost, duration, etc.)
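One way to attach the same context (channel_id, session_id) to every call without repeating extra= is a logging.LoggerAdapter. ContextAdapter below is a hypothetical helper and not necessarily what bot.py does. By default, the stock LoggerAdapter.process replaces a call's own extra with the adapter's, so this sketch merges the two instead.

```python
import logging


class ContextAdapter(logging.LoggerAdapter):
    """Hypothetical adapter: merge bound context with per-call `extra` fields."""

    def process(self, msg, kwargs):
        # Per-call fields override bound context when keys collide.
        kwargs["extra"] = {**self.extra, **(kwargs.get("extra") or {})}
        return msg, kwargs


base = logging.getLogger("claude_coordinator.bot")
log = ContextAdapter(base, {"channel_id": "12345", "session_id": "abc-def"})
log.info("Claude CLI completed", extra={"duration_ms": 1840})
```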

Key Events Logged:

  • Bot lifecycle (startup, ready, shutdown)
  • Message processing (received, matched, session state)
  • Claude operations (command, subprocess, response)
  • Error handling (with error IDs and context)
  • Performance metrics (durations, costs, chunk counts)

3. New File: tests/test_logging.py (316 lines)

Test Coverage (15 tests):

  • JSON formatter produces valid JSON
  • Required fields in all log entries
  • Exception info inclusion
  • Extra fields support
  • Error ID generation and uniqueness
  • Error tracking with context
  • Log directory creation
  • Handler configuration
  • Discord error message formatting
  • Privacy protection (no internal details exposed)
  • Combined logging and formatting
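The commit notes describe error IDs as unique 8-character strings. Below is a sketch of how such IDs could be generated and tested. It assumes the ID is the first 8 hex digits of a random UUID4, and the real ErrorTracker may use another scheme. Eight hex digits give 2^32 values, so IDs are unique in practice for support lookups but not guaranteed unique.

```python
import uuid


def generate_error_id() -> str:
    """Hypothetical scheme: first 8 hex characters of a random UUID4."""
    return uuid.uuid4().hex[:8]


def test_generate_error_id_returns_8_char_hex():
    error_id = generate_error_id()
    assert len(error_id) == 8
    int(error_id, 16)  # raises ValueError if not hexadecimal


def test_generate_error_id_is_unique():
    # With 2**32 possible IDs, 100 draws collide with probability ~1e-6.
    ids = {generate_error_id() for _ in range(100)}
    assert len(ids) == 100
```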

All tests pass: 156 passed (141 existing + 15 new)

4. Updated: tests/test_bot.py

Updated test_handles_claude_failure to expect the new error message format, which shows an error ID instead of internal error details.

5. New File: docs/LOGGING.md

Comprehensive documentation including:

  • Architecture overview
  • Component descriptions
  • Usage examples
  • Log analysis with jq
  • Environment variables
  • Privacy and security guidelines
  • Monitoring recommendations
  • Troubleshooting guide

6. New File: test_logging_manual.py

Manual test script for interactive validation of:

  • Basic logging setup
  • JSON format validation
  • Error tracking
  • Discord message formatting
  • Log rotation

Log File Configuration

Locations

  1. Production: /var/log/claude-coordinator/bot.log (if writable)
  2. Fallback: ~/.claude-coordinator/logs/bot.log
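The fallback order above could be implemented along the following lines. This sketch assumes that "writable" means the directory exists or can be created, and that it passes os.access(..., os.W_OK). The candidate paths are parameters only so that the logic can be exercised without root.

```python
import os
from pathlib import Path

PRODUCTION_DIR = Path("/var/log/claude-coordinator")
FALLBACK_DIR = Path.home() / ".claude-coordinator" / "logs"


def get_log_directory(production: Path = PRODUCTION_DIR, fallback: Path = FALLBACK_DIR) -> Path:
    """Return the production directory if usable, else create and return the fallback."""
    try:
        production.mkdir(parents=True, exist_ok=True)
        if os.access(production, os.W_OK):
            return production
    except OSError:
        pass  # e.g. PermissionError when not running as root
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```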

Rotation Settings

  • Max file size: 10MB
  • Backup count: 5 rotated files
  • Total retention: up to ~60MB (the active bot.log plus 5 × 10MB backups)
  • Encoding: UTF-8
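Below is a sketch of a handler with these settings, followed by a small rollover demo. It uses the standard library's logging.handlers.RotatingFileHandler and assumes "10MB" means 10 × 1024 × 1024 bytes. The demo shrinks maxBytes so that rotation happens quickly. It shows that at most backupCount rotated files (bot.log.1 through bot.log.5) exist beside the active bot.log.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumed meaning of "10MB"
BACKUP_COUNT = 5


def make_file_handler(log_file: Path, max_bytes: int = MAX_BYTES) -> RotatingFileHandler:
    return RotatingFileHandler(log_file, maxBytes=max_bytes, backupCount=BACKUP_COUNT, encoding="utf-8")


# Demo: a 100-byte limit forces many rollovers.
with tempfile.TemporaryDirectory() as tmp:
    handler = make_file_handler(Path(tmp) / "bot.log", max_bytes=100)
    demo = logging.getLogger("rotation_demo")
    demo.propagate = False
    demo.setLevel(logging.INFO)
    demo.addHandler(handler)
    for i in range(200):
        demo.info("message %d", i)
    demo.removeHandler(handler)
    handler.close()  # release the file before the temp dir is deleted
    files = sorted(p.name for p in Path(tmp).iterdir())
    print(files)  # bot.log plus bot.log.1 through bot.log.5
```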

Log Format

File (JSON):

{
    "timestamp": "2026-02-13T19:06:35.545176+00:00",
    "level": "INFO",
    "module": "bot",
    "function": "_handle_claude_request",
    "line": 273,
    "message": "Processing message in channel 12345 for project test-project",
    "channel_id": "12345",
    "project": "test-project",
    "message_length": 150
}

Console (Plain Text):

2026-02-13 19:06:35,545 - claude_coordinator.bot - INFO - Processing message in channel 12345 for project test-project

Discord Error Messages

Before (exposed internal details):

❌ Error running Claude:

Command failed: invalid syntax

After (user-friendly with error ID):

❌ **Claude Error**
Something went wrong processing your request.
Please try again or rephrase your message.

_Error ID: `abc12345`_
_Please reference this ID if requesting support._
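The following sketch shows how format_error_for_discord could assemble the message above from an error type and an ID. Only the claude_error wording comes from the example. The titles and hints for the other six types are placeholders. The fallback for unknown types is an assumption that mirrors test_format_error_handles_unknown_types. Because the function never receives the exception, it has no way to leak internal details.

```python
ERROR_MESSAGES = {
    # error_type: (title, hint). Only "claude_error" text is from the example;
    # the rest is placeholder wording.
    "timeout": ("Request Timed Out", "Claude took too long to respond. Please try again."),
    "claude_error": ("Claude Error", "Something went wrong processing your request.\nPlease try again or rephrase your message."),
    "parse_error": ("Response Error", "Claude's response could not be read. Please try again."),
    "config_error": ("Configuration Error", "This channel is not configured correctly. Please contact an administrator."),
    "permission_error": ("Permission Denied", "The bot lacks permission for this action."),
    "session_error": ("Session Error", "Your conversation session could not be loaded or saved."),
    "lock_error": ("Channel Busy", "Another request is still running here. Please wait and try again."),
}
DEFAULT_MESSAGE = ("Unexpected Error", "Something went wrong. Please try again later.")


def format_error_for_discord(error_type: str, error_id: str) -> str:
    """Build a user-facing message from the error type and ID only."""
    title, hint = ERROR_MESSAGES.get(error_type, DEFAULT_MESSAGE)
    return (
        f"❌ **{title}**\n{hint}\n\n"
        f"_Error ID: `{error_id}`_\n"
        "_Please reference this ID if requesting support._"
    )
```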

Testing Results

$ pytest tests/test_logging.py -v
============================= test session starts ==============================
collected 15 items

tests/test_logging.py::TestJSONFormatter::test_json_formatter_produces_valid_json PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_required_fields PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_exception_info PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_extra_fields PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_returns_string PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_is_unique PASSED
tests/test_logging.py::TestErrorTracker::test_log_error_with_id_includes_error_id PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_creates_log_directory PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_configures_handlers PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_includes_error_id PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_known_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_unknown_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_does_not_expose_internal_details PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_returns_tuple PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_logs_with_context PASSED

============================== 15 passed in 0.12s
$ pytest -v -m 'not integration'
============================== 156 passed, 1 deselected, 2 warnings in 48.49s ==============================

Environment Variables

  • LOG_LEVEL: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
    • Default: INFO
    • Development: DEBUG
    • Production: INFO
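Below is a sketch of how LOG_LEVEL could be read. It assumes that unknown values fall back to INFO instead of raising. resolve_log_level is a hypothetical name. The console format string is inferred from the sample console line under "Log Format".

```python
import logging
import os

_VALID_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(environ=os.environ, default: int = logging.INFO) -> int:
    """Map LOG_LEVEL (case-insensitive) to a logging constant; else use default."""
    name = environ.get("LOG_LEVEL", "").strip().upper()
    return _VALID_LEVELS.get(name, default)


# Console handler producing "<asctime> - <logger name> - <LEVEL> - <message>".
console = logging.StreamHandler()
console.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
console.setLevel(resolve_log_level())
```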

Usage Examples

Basic Logging

import logging

logger = logging.getLogger(__name__)

logger.info(
    "Processing message",
    extra={
        'channel_id': '12345',
        'session_id': 'abc-def',
        'cost_usd': 0.05
    }
)
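One pitfall with extra=: the standard library rejects keys that collide with LogRecord attributes. Logger.makeRecord raises KeyError for names such as message, asctime, module, or lineno. As a result, a custom field named "module" fails at the call site instead of silently overwriting the record. The error appears only when the level is enabled and a record is actually created. A quick check:

```python
import logging

logger = logging.getLogger("claude_coordinator.bot")
logger.setLevel(logging.INFO)
logger.addHandler(logging.NullHandler())

# Safe: custom names that do not clash with LogRecord attributes.
logger.info("Processing message", extra={"channel_id": "12345", "cost_usd": 0.05})

# Unsafe: "module" is already a LogRecord attribute.
collision_error = None
try:
    logger.info("Processing message", extra={"module": "bot"})
except KeyError as exc:
    collision_error = exc
    print(exc)
```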

Error Handling

from claude_coordinator.logging_config import log_and_format_error

try:
    result = await operation()
except Exception as e:
    error_id, discord_msg = log_and_format_error(
        logger,
        error_type="claude_error",
        message="Operation failed",
        error=e,
        channel_id=channel_id
    )
    await channel.send(discord_msg)
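Below is a sketch of what log_and_format_error might do, inferred from the call above and from the test names (returns_tuple, logs_with_context). It generates an ID, logs the full exception with that ID and the context fields, and returns the ID together with a sanitized message. The one-entry message table, the shortened message shape, and the UUID-based ID are all assumptions.

```python
import logging
import uuid

# Placeholder user-facing text; a real table would cover every error type.
_USER_TEXT = {"claude_error": "Something went wrong processing your request."}


def log_and_format_error(logger, error_type, message, error=None, **context):
    """Hypothetical: log details internally, return (error_id, Discord-safe text)."""
    error_id = uuid.uuid4().hex[:8]
    logger.error(
        message,
        exc_info=error,  # an exception instance logs its traceback; None logs none
        extra={"error_id": error_id, "error_type": error_type, **context},
    )
    user_text = _USER_TEXT.get(error_type, "Something went wrong.")
    return error_id, f"❌ {user_text}\n_Error ID: `{error_id}`_"
```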

Performance Tracking

import time

start_time = time.perf_counter()  # unaffected by system clock changes, unlike time.time()
response = await claude_runner.run(...)
duration_ms = int((time.perf_counter() - start_time) * 1000)

logger.info(
    "Claude CLI completed",
    extra={
        'duration_ms': duration_ms,
        'cost_usd': response.cost,
        'session_id': response.session_id
    }
)
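The start/stop timing above can be wrapped in a small context manager so that every timed block logs the same way. This is a sketch. log_duration is a hypothetical helper, and it measures with time.perf_counter().

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_duration(logger, message, **context):
    """Hypothetical helper: log `message` with duration_ms when the block exits."""
    start = time.perf_counter()
    try:
        yield context  # the block may add fields, e.g. fields["cost_usd"] = ...
    finally:
        duration_ms = int((time.perf_counter() - start) * 1000)
        logger.info(message, extra={"duration_ms": duration_ms, **context})
```

Because the log call sits in finally, the duration is recorded even when the wrapped call raises.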

Log Analysis

# View real-time logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Find all errors
cat bot.log | jq 'select(.level == "ERROR")'

# Find specific error by ID
cat bot.log* | jq 'select(.error_id == "abc12345")'

# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {if (count) print sum/count}'

# Track costs
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'
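Where jq is not available, the same questions can be answered in Python. This sketch assumes one JSON object per line, which is the JSON file format shown above. It reads the active file and its rotated backups, skips lines that fail to parse, and covers the error lookup, average-duration, and cost queries above.

```python
import json
from pathlib import Path


def load_entries(log_dir: Path):
    """Yield parsed entries from bot.log and its rotated files (bot.log.1, etc.)."""
    for path in sorted(log_dir.glob("bot.log*")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate blank or partially written lines
                if isinstance(entry, dict):
                    yield entry


def summarize(entries):
    entries = list(entries)
    durations = [e["claude_duration_ms"] for e in entries if "claude_duration_ms" in e]
    return {
        "errors": [e for e in entries if e.get("level") == "ERROR"],
        "avg_claude_duration_ms": sum(durations) / len(durations) if durations else None,
        "total_cost_usd": sum(e.get("cost_usd", 0) for e in entries),
    }
```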

Privacy Protection

Logged (Safe)

  • Channel IDs
  • Session IDs (UUIDs)
  • Timestamps, durations, costs
  • Error messages and stack traces
  • Message lengths (not content)

Never Logged

  • User message content
  • Discord tokens
  • Authentication credentials
  • Personally identifiable information (PII)

Exposed in Discord

  • Error type and description
  • Error ID for support
  • Actionable suggestions

Never Exposed in Discord

  • Stack traces
  • File paths
  • Internal error details
  • Session IDs
  • Database information
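Keeping tokens out of the logs mainly means never passing them to a log call. A logging.Filter attached to a handler can still act as a safety net. This sketch assumes the secret values are known at startup, for example the bot token read from configuration. RedactSecretsFilter is a hypothetical name. It only masks exact occurrences in the formatted message, not in tracebacks or extra fields.

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Hypothetical safety net: mask known secret strings in log messages."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = [s for s in secrets if s]  # ignore empty values

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        # Store the redacted text and clear args so it is not re-interpolated.
        record.msg, record.args = message, None
        return True  # never drop the record, only rewrite it
```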

Monitoring Recommendations

Real-time Monitoring:

# Follow all logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'

Alerting Criteria:

  • High error rate (>5 errors/minute)
  • Repeated timeouts (same channel)
  • Session save failures
  • Database errors
  • Cost spikes (>$1/hour)
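The first criterion (more than 5 errors per minute) is a sliding-window count. Below is a sketch of that check. It assumes error timestamps in seconds are fed in, in non-decreasing order, as errors are logged, for example by a custom handler or a log tailer. ErrorRateMonitor is a hypothetical name.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window counter for the '>5 errors/minute' alert."""

    def __init__(self, threshold: int = 5, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._times = deque()

    def record(self, timestamp: float) -> bool:
        """Add one error; return True if the rate now exceeds the threshold."""
        self._times.append(timestamp)
        # Drop errors that have aged out of the window.
        while self._times and self._times[0] <= timestamp - self.window:
            self._times.popleft()
        return len(self._times) > self.threshold
```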

Deployment Notes

  1. Log directory created automatically on first run
  2. No manual rotation needed - handled by RotatingFileHandler
  3. Environment variable LOG_LEVEL controls verbosity
  4. Backwards compatible - no breaking changes to existing functionality

Validation

  • All 156 tests pass
  • JSON logs parse correctly
  • Error IDs are unique
  • Discord messages are user-friendly
  • Internal details not exposed
  • Log rotation works correctly
  • Performance metrics captured
  • Privacy protection validated

Benefits

  1. Debugging: Structured logs with full context make troubleshooting easier
  2. Monitoring: JSON format enables automated log analysis and alerting
  3. Support: Error IDs allow users to reference specific issues
  4. Privacy: Internal details kept out of Discord messages
  5. Performance: Duration and cost tracking for optimization
  6. Scalability: Log rotation prevents disk space issues
  7. Observability: Comprehensive visibility into bot operations

Next Steps

MED-002 is complete and ready for production deployment.

Recommended follow-ups:

  • Monitor logs in production for 1-2 weeks
  • Set up automated alerting for critical errors
  • Create log analysis dashboard
  • Implement external log aggregation (Loki, CloudWatch)
  • Add cost tracking alerts

Files Changed:

  • claude_coordinator/logging_config.py (new, 371 lines)
  • claude_coordinator/bot.py (enhanced logging)
  • tests/test_logging.py (new, 316 lines, 15 tests)
  • tests/test_bot.py (1 test updated)
  • docs/LOGGING.md (new, comprehensive docs)
  • test_logging_manual.py (new, manual validation)

Test Results: 156/156 non-integration tests passed (1 integration test deselected; 15 new tests added)
Status: Ready for deployment