# MED-002: Logging and Error Reporting Implementation

**Status**: ✅ COMPLETE
**Date**: 2026-02-13
**Priority**: Medium
**Effort**: Medium (MED-002)

## Summary

Implemented comprehensive structured logging with JSON formatting, log rotation, error tracking with unique IDs, and user-friendly Discord error messages. All 156 tests pass (15 new tests added).

## Changes Made

### 1. New File: `claude_coordinator/logging_config.py` (371 lines)

**Features:**
- `JSONFormatter`: Structured JSON log output with timestamp, level, module, function, line, message, exception, and custom fields
- `ErrorTracker`: Unique error ID generation and tracking
- `setup_logging()`: Application-wide logging configuration with rotation
- `get_log_directory()`: Smart log directory detection (production vs fallback)
- `format_error_for_discord()`: User-friendly error messages without internal details
- `log_and_format_error()`: Combined logging and message formatting

**Error Types Supported:**
- `timeout`: Claude request timeout
- `claude_error`: Claude CLI failure
- `parse_error`: Malformed JSON response
- `config_error`: Configuration issues
- `permission_error`: Permission denied
- `session_error`: Session management failure
- `lock_error`: Lock acquisition failure

### 2. Updated: `claude_coordinator/bot.py`

**Enhancements:**
- Imported and integrated `logging_config` module
- Enhanced `main()` to use `setup_logging()` with rotation
- Added structured logging throughout message handling
- Integrated error tracking with unique error IDs
- Enhanced error messages with `log_and_format_error()`
- Added performance metrics (request duration, Claude duration, session operations)
- Added context to all log calls (channel_id, session_id, cost, duration, etc.)

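
The structured-logging approach described above (one JSON object per record, with per-call context passed through `extra=` and a short error ID attached to failures) can be sketched roughly as follows. This is an illustrative sketch only, not the contents of `logging_config.py`: the names `SketchJSONFormatter` and `generate_error_id` are hypothetical, and the 8-character hex ID format is an assumption.

```python
import io
import json
import logging
import uuid
from datetime import datetime, timezone

# Attribute names every LogRecord carries; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class SketchJSONFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "module": record.module,
            "function": record.funcName,
            "line": record.lineno,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # Copy custom fields supplied through `extra=` into the JSON object.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                entry[key] = value
        # default=str keeps non-serializable values from raising.
        return json.dumps(entry, default=str)


def generate_error_id() -> str:
    """Hypothetical helper: short random ID (assumed 8 hex characters)."""
    return uuid.uuid4().hex[:8]


# Usage: log to an in-memory stream and parse the result back.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(SketchJSONFormatter())
demo_logger = logging.getLogger("logging_sketch")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False
demo_logger.addHandler(handler)

error_id = generate_error_id()
demo_logger.error(
    "Claude CLI failed", extra={"error_id": error_id, "channel_id": "12345"}
)
entry = json.loads(stream.getvalue())
print(entry["level"], entry["error_id"], entry["channel_id"])
```

Filtering against the attribute names of a throwaway `LogRecord` avoids hard-coding the standard field list, which varies slightly between Python versions.
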

**Key Events Logged:**
- Bot lifecycle (startup, ready, shutdown)
- Message processing (received, matched, session state)
- Claude operations (command, subprocess, response)
- Error handling (with error IDs and context)
- Performance metrics (durations, costs, chunk counts)

### 3. New File: `tests/test_logging.py` (316 lines)

**Test Coverage (15 tests):**
- JSON formatter produces valid JSON
- Required fields in all log entries
- Exception info inclusion
- Extra fields support
- Error ID generation and uniqueness
- Error tracking with context
- Log directory creation
- Handler configuration
- Discord error message formatting
- Privacy protection (no internal details exposed)
- Combined logging and formatting

**All tests pass**: ✅ 156 passed (141 existing + 15 new)

### 4. Updated: `tests/test_bot.py`

Fixed `test_handles_claude_failure` to expect the new error message format, which carries an error ID instead of internal error details.

### 5. New File: `docs/LOGGING.md`

Comprehensive documentation including:
- Architecture overview
- Component descriptions
- Usage examples
- Log analysis with jq
- Environment variables
- Privacy and security guidelines
- Monitoring recommendations
- Troubleshooting guide

### 6. New File: `test_logging_manual.py`

Manual test script for interactive validation of:
- Basic logging setup
- JSON format validation
- Error tracking
- Discord message formatting
- Log rotation

## Log File Configuration

### Locations

1. **Production**: `/var/log/claude-coordinator/bot.log` (if writable)
2. **Fallback**: `~/.claude-coordinator/logs/bot.log`

### Rotation Settings

- **Max file size**: 10MB
- **Backup count**: 5 rotated files
- **Total retention**: ~50MB of rotated backups (5 × 10MB), plus the active file (up to 10MB)
- **Encoding**: UTF-8

### Log Format

**File (JSON)**:
```json
{
  "timestamp": "2026-02-13T19:06:35.545176+00:00",
  "level": "INFO",
  "module": "bot",
  "function": "_handle_claude_request",
  "line": 273,
  "message": "Processing message in channel 12345 for project test-project",
  "channel_id": "12345",
  "project": "test-project",
  "message_length": 150
}
```

**Console (Plain Text)**:
```
2026-02-13 19:06:35,545 - claude_coordinator.bot - INFO - Processing message in channel 12345 for project test-project
```

## Discord Error Messages

Before (exposed internal details):
````
❌ Error running Claude:
```
Command failed: invalid syntax
```
````

After (user-friendly with error ID):
```
❌ **Claude Error**

Something went wrong processing your request. Please try again or rephrase your message.

_Error ID: `abc12345`_
_Please reference this ID if requesting support._
```

## Testing Results

```bash
$ pytest tests/test_logging.py -v
============================= test session starts ==============================
collected 15 items

tests/test_logging.py::TestJSONFormatter::test_json_formatter_produces_valid_json PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_required_fields PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_exception_info PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_extra_fields PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_returns_string PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_is_unique PASSED
tests/test_logging.py::TestErrorTracker::test_log_error_with_id_includes_error_id PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_creates_log_directory PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_configures_handlers PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_includes_error_id PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_known_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_unknown_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_does_not_expose_internal_details PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_returns_tuple PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_logs_with_context PASSED

============================== 15 passed in 0.12s
```

```bash
$ pytest -v -m 'not integration'
============================== 156 passed, 1 deselected, 2 warnings in 48.49s ==============================
```

## Environment Variables

- `LOG_LEVEL`: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  - Default: INFO
  - Development: DEBUG
  - Production: INFO

## Usage Examples

### Basic Logging

```python
logger.info(
    "Processing message",
    extra={
        'channel_id': '12345',
        'session_id': 'abc-def',
        'cost_usd': 0.05
    }
)
```

### Error Handling

```python
try:
    result = await operation()
except Exception as e:
    error_id, discord_msg = log_and_format_error(
        logger,
        error_type="claude_error",
        message="Operation failed",
        error=e,
        channel_id=channel_id
    )
    await channel.send(discord_msg)
```

### Performance Tracking

```python
start_time = time.time()
response = await claude_runner.run(...)
duration_ms = int((time.time() - start_time) * 1000)

logger.info(
    "Claude CLI completed",
    extra={
        'duration_ms': duration_ms,
        'cost_usd': response.cost,
        'session_id': response.session_id
    }
)
```

## Log Analysis

```bash
# View real-time logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Find all errors
cat bot.log | jq 'select(.level == "ERROR")'

# Find specific error by ID (including rotated files)
cat bot.log* | jq 'select(.error_id == "abc12345")'

# Calculate average Claude CLI duration (prints nothing if no entries match)
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {if (count) print sum/count}'

# Track costs
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'
```

## Privacy Protection

### Logged (Safe)

- ✅ Channel IDs
- ✅ Session IDs (UUIDs)
- ✅ Timestamps, durations, costs
- ✅ Error messages and stack traces
- ✅ Message lengths (not content)

### Never Logged

- ❌ User message content
- ❌ Discord tokens
- ❌ Authentication credentials
- ❌ Personally identifiable information

### Exposed in Discord

- ✅ Error type and description
- ✅ Error ID for support
- ✅ Actionable suggestions

### Never Exposed in Discord

- ❌ Stack traces
- ❌ File paths
- ❌ Internal error details
- ❌ Session IDs
- ❌ Database information

## Monitoring Recommendations

**Real-time Monitoring:**
```bash
# Follow all logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .

# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'
```

**Alerting Criteria:**
- High error rate (>5 errors/minute)
- Repeated timeouts (same channel)
- Session save failures
- Database errors
- Cost spikes (>$1/hour)

## Deployment Notes

1. **Log directory created automatically** on first run
2. **No manual rotation needed** - handled by RotatingFileHandler
3. **Environment variable** `LOG_LEVEL` controls verbosity
4. **Backwards compatible** - no breaking changes to existing functionality

## Validation

- ✅ All 156 tests pass
- ✅ JSON logs parse correctly
- ✅ Error IDs are unique
- ✅ Discord messages are user-friendly
- ✅ Internal details not exposed
- ✅ Log rotation works correctly
- ✅ Performance metrics captured
- ✅ Privacy protection validated

## Benefits

1. **Debugging**: Structured logs with full context make troubleshooting easier
2. **Monitoring**: JSON format enables automated log analysis and alerting
3. **Support**: Error IDs allow users to reference specific issues
4. **Privacy**: Internal details kept out of Discord messages
5. **Performance**: Duration and cost tracking for optimization
6. **Scalability**: Log rotation prevents disk space issues
7. **Observability**: Comprehensive visibility into bot operations

## Next Steps

MED-002 is complete and ready for production deployment.

**Recommended follow-ups:**
- [ ] Monitor logs in production for 1-2 weeks
- [ ] Set up automated alerting for critical errors
- [ ] Create log analysis dashboard
- [ ] Implement external log aggregation (Loki, CloudWatch)
- [ ] Add cost tracking alerts

---

**Files Changed:**
- ✅ `claude_coordinator/logging_config.py` (new, 371 lines)
- ✅ `claude_coordinator/bot.py` (enhanced logging)
- ✅ `tests/test_logging.py` (new, 316 lines, 15 tests)
- ✅ `tests/test_bot.py` (1 test updated)
- ✅ `docs/LOGGING.md` (new, comprehensive docs)
- ✅ `test_logging_manual.py` (new, manual validation)

**Test Results**: 156/156 passed (15 new tests added)
**Status**: ✅ Ready for deployment
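
As a possible starting point for the automated-alerting follow-up, the "High error rate (>5 errors/minute)" criterion can be checked directly against the JSON log lines. The sketch below is an assumption-laden illustration, not part of the delivered code: the function names are hypothetical, and it assumes one JSON object per line with the `level` and ISO-8601 `timestamp` fields shown in the log format section.

```python
import json
from collections import Counter
from typing import Iterable, List

ERRORS_PER_MINUTE_THRESHOLD = 5  # taken from the alerting criteria


def error_counts_per_minute(lines: Iterable[str]) -> Counter:
    """Count ERROR entries per minute, keyed by 'YYYY-MM-DDTHH:MM'."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(entry, dict):
            continue
        if entry.get("level") == "ERROR":
            # The first 16 characters of an ISO-8601 timestamp identify the minute.
            counts[str(entry.get("timestamp", ""))[:16]] += 1
    return counts


def minutes_over_threshold(
    lines: Iterable[str], threshold: int = ERRORS_PER_MINUTE_THRESHOLD
) -> List[str]:
    """Return the minutes whose ERROR count exceeds the threshold, sorted."""
    counts = error_counts_per_minute(lines)
    return sorted(minute for minute, n in counts.items() if n > threshold)


# Usage with synthetic log lines: six errors in one minute, two in the next.
sample = [
    json.dumps({"timestamp": f"2026-02-13T19:06:{s:02d}+00:00", "level": "ERROR"})
    for s in range(6)
]
sample += [
    json.dumps({"timestamp": f"2026-02-13T19:07:{s:02d}+00:00", "level": "ERROR"})
    for s in range(2)
]
sample.append(json.dumps({"timestamp": "2026-02-13T19:06:59+00:00", "level": "INFO"}))
sample.append("not json")

print(minutes_over_threshold(sample))
```

Bucketing by calendar minute is the simplest reading of the criterion; a sliding window would catch bursts that straddle a minute boundary, at the cost of parsing timestamps properly.
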