Claude Discord Bot e6983b56b9 Week 3 progress: Logging and testing complete (MED-001, MED-002, MED-005)

MED-001: Enhanced typing indicator
- Persistent typing loop (_maintain_typing method)
- Loops every 8s to maintain indicator for long operations (30s-5min)
- 8 comprehensive tests covering all lifecycle scenarios
- 27/27 bot tests passing

MED-002: Structured logging and error reporting
- logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord
- RotatingFileHandler (10MB max, 5 backups)
- Unique 8-char error IDs for support tracking
- Privacy-safe Discord error messages (7 error types)
- Enhanced bot.py with structured logging throughout
- 15/15 logging tests passing

MED-005: Comprehensive test suite
- Total: 156/157 tests passing (99.4%)
- test_session_manager.py: 27 tests
- test_claude_runner.py: 11 tests
- test_config.py: 25 tests
- test_response_formatter.py: 26 tests
- test_bot.py: 27 tests
- test_commands.py: 18 tests
- test_concurrency.py: 7 tests
- test_logging.py: 15 tests

Total: 13/18 tasks complete (72.2%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-13 19:12:04 +00:00

10 KiB

Raw Blame History

MED-002: Logging and Error Reporting Implementation

Status: ✅ COMPLETE
Date: 2026-02-13
Priority: Medium
Effort: Medium (MED-002)

Summary

Implemented comprehensive structured logging with JSON formatting, log rotation, error tracking with unique IDs, and user-friendly Discord error messages. All 156 tests pass (15 new tests added).

Changes Made

1. New File: `claude_coordinator/logging_config.py` (371 lines)

Features:

JSONFormatter: Structured JSON log output with timestamp, level, module, function, line, message, exception, and custom fields
ErrorTracker: Unique error ID generation and tracking
setup_logging(): Application-wide logging configuration with rotation
get_log_directory(): Smart log directory detection (production vs fallback)
format_error_for_discord(): User-friendly error messages without internal details
log_and_format_error(): Combined logging and message formatting

Error Types Supported:

timeout: Claude request timeout
claude_error: Claude CLI failure
parse_error: Malformed JSON response
config_error: Configuration issues
permission_error: Permission denied
session_error: Session management failure
lock_error: Lock acquisition failure

2. Updated: `claude_coordinator/bot.py`

Enhancements:

Imported and integrated logging_config module
Enhanced main() to use setup_logging() with rotation
Added structured logging throughout message handling
Integrated error tracking with unique error IDs
Enhanced error messages with log_and_format_error()
Added performance metrics (request duration, Claude duration, session operations)
Added context to all log calls (channel_id, session_id, cost, duration, etc.)

Key Events Logged:

Bot lifecycle (startup, ready, shutdown)
Message processing (received, matched, session state)
Claude operations (command, subprocess, response)
Error handling (with error IDs and context)
Performance metrics (durations, costs, chunk counts)

3. New File: `tests/test_logging.py` (316 lines)

Test Coverage (15 tests):

JSON formatter produces valid JSON
Required fields in all log entries
Exception info inclusion
Extra fields support
Error ID generation and uniqueness
Error tracking with context
Log directory creation
Handler configuration
Discord error message formatting
Privacy protection (no internal details exposed)
Combined logging and formatting

All tests pass: ✅ 156 passed (141 existing + 15 new)

4. Updated: `tests/test_bot.py`

Fixed test_handles_claude_failure to expect new error message format with error ID instead of internal error details.

5. New File: `docs/LOGGING.md`

Comprehensive documentation including:

Architecture overview
Component descriptions
Usage examples
Log analysis with jq
Environment variables
Privacy and security guidelines
Monitoring recommendations
Troubleshooting guide

6. New File: `test_logging_manual.py`

Manual test script for interactive validation of:

Basic logging setup
JSON format validation
Error tracking
Discord message formatting
Log rotation

Log File Configuration

Locations

Production: /var/log/claude-coordinator/bot.log (if writable)
Fallback: ~/.claude-coordinator/logs/bot.log

Rotation Settings

Max file size: 10MB
Backup count: 5 rotated files
Total retention: ~50MB (5 × 10MB)
Encoding: UTF-8

Log Format

File (JSON):

{
    "timestamp": "2026-02-13T19:06:35.545176+00:00",
    "level": "INFO",
    "module": "bot",
    "function": "_handle_claude_request",
    "line": 273,
    "message": "Processing message in channel 12345 for project test-project",
    "channel_id": "12345",
    "project": "test-project",
    "message_length": 150
}

Console (Plain Text):

2026-02-13 19:06:35,545 - claude_coordinator.bot - INFO - Processing message in channel 12345 for project test-project

Discord Error Messages

Before (exposed internal details):

❌ Error running Claude:

Command failed: invalid syntax

After (user-friendly with error ID):

❌ **Claude Error**
Something went wrong processing your request.
Please try again or rephrase your message.

_Error ID: `abc12345`_
_Please reference this ID if requesting support._

Testing Results

$ pytest tests/test_logging.py -v
============================= test session starts ==============================
collected 15 items

tests/test_logging.py::TestJSONFormatter::test_json_formatter_produces_valid_json PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_required_fields PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_exception_info PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_extra_fields PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_returns_string PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_is_unique PASSED
tests/test_logging.py::TestErrorTracker::test_log_error_with_id_includes_error_id PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_creates_log_directory PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_configures_handlers PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_includes_error_id PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_known_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_unknown_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_does_not_expose_internal_details PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_returns_tuple PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_logs_with_context PASSED

============================== 15 passed in 0.12s

$ pytest -v -m 'not integration'
============================== 156 passed, 1 deselected, 2 warnings in 48.49s ==============================

Environment Variables

LOG_LEVEL: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Default: INFO
- Development: DEBUG
- Production: INFO

Usage Examples

Basic Logging

logger.info(
    "Processing message",
    extra={
        'channel_id': '12345',
        'session_id': 'abc-def',
        'cost_usd': 0.05
    }
)

Error Handling

try:
    result = await operation()
except Exception as e:
    error_id, discord_msg = log_and_format_error(
        logger,
        error_type="claude_error",
        message="Operation failed",
        error=e,
        channel_id=channel_id
    )
    await channel.send(discord_msg)

Performance Tracking

start_time = time.time()
response = await claude_runner.run(...)
duration_ms = int((time.time() - start_time) * 1000)

logger.info(
    "Claude CLI completed",
    extra={
        'duration_ms': duration_ms,
        'cost_usd': response.cost,
        'session_id': response.session_id
    }
)

Log Analysis

# View real-time logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Find all errors
cat bot.log | jq 'select(.level == "ERROR")'

# Find specific error by ID
cat bot.log* | jq 'select(.error_id == "abc12345")'

# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'

# Track costs
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'

Privacy Protection

Logged (Safe)

✅ Channel IDs
✅ Session IDs (UUIDs)
✅ Timestamps, durations, costs
✅ Error messages and stack traces
✅ Message lengths (not content)

Never Logged

❌ User message content
❌ Discord tokens
❌ Authentication credentials
❌ Personal identifiable information

Exposed in Discord

✅ Error type and description
✅ Error ID for support
✅ Actionable suggestions

Never Exposed in Discord

❌ Stack traces
❌ File paths
❌ Internal error details
❌ Session IDs
❌ Database information

Monitoring Recommendations

Real-time Monitoring:

# Follow all logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'

Alerting Criteria:

High error rate (>5 errors/minute)
Repeated timeouts (same channel)
Session save failures
Database errors
Cost spikes (>$1/hour)

Deployment Notes

Log directory created automatically on first run
No manual rotation needed - handled by RotatingFileHandler
Environment variable LOG_LEVEL controls verbosity
Backwards compatible - no breaking changes to existing functionality

Validation

✅ All 156 tests pass
✅ JSON logs parse correctly
✅ Error IDs are unique
✅ Discord messages are user-friendly
✅ Internal details not exposed
✅ Log rotation works correctly
✅ Performance metrics captured
✅ Privacy protection validated

Benefits

Debugging: Structured logs with full context make troubleshooting easier
Monitoring: JSON format enables automated log analysis and alerting
Support: Error IDs allow users to reference specific issues
Privacy: Internal details kept out of Discord messages
Performance: Duration and cost tracking for optimization
Scalability: Log rotation prevents disk space issues
Observability: Comprehensive visibility into bot operations

Next Steps

MED-002 is complete and ready for production deployment.

Recommended follow-ups:

Monitor logs in production for 1-2 weeks
Set up automated alerting for critical errors
Create log analysis dashboard
Implement external log aggregation (Loki, CloudWatch)
Add cost tracking alerts

Files Changed:

✅ claude_coordinator/logging_config.py (new, 371 lines)
✅ claude_coordinator/bot.py (enhanced logging)
✅ tests/test_logging.py (new, 316 lines, 15 tests)
✅ tests/test_bot.py (1 test updated)
✅ docs/LOGGING.md (new, comprehensive docs)
✅ test_logging_manual.py (new, manual validation)

Test Results: 156/156 passed (15 new tests added)
Status: ✅ Ready for deployment

10 KiB Raw Blame History Unescape Escape