MED-001: Enhanced typing indicator - Persistent typing loop (_maintain_typing method) - Loops every 8s to maintain indicator for long operations (30s-5min) - 8 comprehensive tests covering all lifecycle scenarios - 27/27 bot tests passing MED-002: Structured logging and error reporting - logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord - RotatingFileHandler (10MB max, 5 backups) - Unique 8-char error IDs for support tracking - Privacy-safe Discord error messages (7 error types) - Enhanced bot.py with structured logging throughout - 15/15 logging tests passing MED-005: Comprehensive test suite - Total: 156/157 tests passing (99.4%) - test_session_manager.py: 27 tests - test_claude_runner.py: 11 tests - test_config.py: 25 tests - test_response_formatter.py: 26 tests - test_bot.py: 27 tests - test_commands.py: 18 tests - test_concurrency.py: 7 tests - test_logging.py: 15 tests Total: 13/18 tasks complete (72.2%) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
10 KiB
MED-002: Logging and Error Reporting Implementation
Status: ✅ COMPLETE
Date: 2026-02-13
Priority: Medium
Effort: Medium (MED-002)
Summary
Implemented comprehensive structured logging with JSON formatting, log rotation, error tracking with unique IDs, and user-friendly Discord error messages. All 156 tests pass (15 new tests added).
Changes Made
1. New File: claude_coordinator/logging_config.py (371 lines)
Features:
JSONFormatter: Structured JSON log output with timestamp, level, module, function, line, message, exception, and custom fieldsErrorTracker: Unique error ID generation and trackingsetup_logging(): Application-wide logging configuration with rotationget_log_directory(): Smart log directory detection (production vs fallback)format_error_for_discord(): User-friendly error messages without internal detailslog_and_format_error(): Combined logging and message formatting
Error Types Supported:
timeout: Claude request timeoutclaude_error: Claude CLI failureparse_error: Malformed JSON responseconfig_error: Configuration issuespermission_error: Permission deniedsession_error: Session management failurelock_error: Lock acquisition failure
2. Updated: claude_coordinator/bot.py
Enhancements:
- Imported and integrated logging_config module
- Enhanced
main()to usesetup_logging()with rotation - Added structured logging throughout message handling
- Integrated error tracking with unique error IDs
- Enhanced error messages with
log_and_format_error() - Added performance metrics (request duration, Claude duration, session operations)
- Added context to all log calls (channel_id, session_id, cost, duration, etc.)
Key Events Logged:
- Bot lifecycle (startup, ready, shutdown)
- Message processing (received, matched, session state)
- Claude operations (command, subprocess, response)
- Error handling (with error IDs and context)
- Performance metrics (durations, costs, chunk counts)
3. New File: tests/test_logging.py (316 lines)
Test Coverage (15 tests):
- JSON formatter produces valid JSON
- Required fields in all log entries
- Exception info inclusion
- Extra fields support
- Error ID generation and uniqueness
- Error tracking with context
- Log directory creation
- Handler configuration
- Discord error message formatting
- Privacy protection (no internal details exposed)
- Combined logging and formatting
All tests pass: ✅ 156 passed (141 existing + 15 new)
4. Updated: tests/test_bot.py
Fixed test_handles_claude_failure to expect new error message format with error ID instead of internal error details.
5. New File: docs/LOGGING.md
Comprehensive documentation including:
- Architecture overview
- Component descriptions
- Usage examples
- Log analysis with jq
- Environment variables
- Privacy and security guidelines
- Monitoring recommendations
- Troubleshooting guide
6. New File: test_logging_manual.py
Manual test script for interactive validation of:
- Basic logging setup
- JSON format validation
- Error tracking
- Discord message formatting
- Log rotation
Log File Configuration
Locations
- Production:
/var/log/claude-coordinator/bot.log(if writable) - Fallback:
~/.claude-coordinator/logs/bot.log
Rotation Settings
- Max file size: 10MB
- Backup count: 5 rotated files
- Total retention: ~50MB (5 × 10MB)
- Encoding: UTF-8
Log Format
File (JSON):
{
"timestamp": "2026-02-13T19:06:35.545176+00:00",
"level": "INFO",
"module": "bot",
"function": "_handle_claude_request",
"line": 273,
"message": "Processing message in channel 12345 for project test-project",
"channel_id": "12345",
"project": "test-project",
"message_length": 150
}
Console (Plain Text):
2026-02-13 19:06:35,545 - claude_coordinator.bot - INFO - Processing message in channel 12345 for project test-project
Discord Error Messages
Before (exposed internal details):
❌ Error running Claude:
Command failed: invalid syntax
After (user-friendly with error ID):
❌ **Claude Error**
Something went wrong processing your request.
Please try again or rephrase your message.
_Error ID: `abc12345`_
_Please reference this ID if requesting support._
Testing Results
$ pytest tests/test_logging.py -v
============================= test session starts ==============================
collected 15 items
tests/test_logging.py::TestJSONFormatter::test_json_formatter_produces_valid_json PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_required_fields PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_exception_info PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_extra_fields PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_returns_string PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_is_unique PASSED
tests/test_logging.py::TestErrorTracker::test_log_error_with_id_includes_error_id PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_creates_log_directory PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_configures_handlers PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_includes_error_id PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_known_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_unknown_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_does_not_expose_internal_details PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_returns_tuple PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_logs_with_context PASSED
============================== 15 passed in 0.12s
$ pytest -v -m 'not integration'
============================== 156 passed, 1 deselected, 2 warnings in 48.49s ==============================
Environment Variables
LOG_LEVEL: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)- Default: INFO
- Development: DEBUG
- Production: INFO
Usage Examples
Basic Logging
logger.info(
"Processing message",
extra={
'channel_id': '12345',
'session_id': 'abc-def',
'cost_usd': 0.05
}
)
Error Handling
try:
result = await operation()
except Exception as e:
error_id, discord_msg = log_and_format_error(
logger,
error_type="claude_error",
message="Operation failed",
error=e,
channel_id=channel_id
)
await channel.send(discord_msg)
Performance Tracking
start_time = time.time()
response = await claude_runner.run(...)
duration_ms = int((time.time() - start_time) * 1000)
logger.info(
"Claude CLI completed",
extra={
'duration_ms': duration_ms,
'cost_usd': response.cost,
'session_id': response.session_id
}
)
Log Analysis
# View real-time logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .
# Find all errors
cat bot.log | jq 'select(.level == "ERROR")'
# Find specific error by ID
cat bot.log* | jq 'select(.error_id == "abc12345")'
# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'
# Track costs
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'
Privacy Protection
Logged (Safe)
✅ Channel IDs
✅ Session IDs (UUIDs)
✅ Timestamps, durations, costs
✅ Error messages and stack traces
✅ Message lengths (not content)
Never Logged
❌ User message content
❌ Discord tokens
❌ Authentication credentials
❌ Personal identifiable information
Exposed in Discord
✅ Error type and description
✅ Error ID for support
✅ Actionable suggestions
Never Exposed in Discord
❌ Stack traces
❌ File paths
❌ Internal error details
❌ Session IDs
❌ Database information
Monitoring Recommendations
Real-time Monitoring:
# Follow all logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .
# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'
Alerting Criteria:
- High error rate (>5 errors/minute)
- Repeated timeouts (same channel)
- Session save failures
- Database errors
- Cost spikes (>$1/hour)
Deployment Notes
- Log directory created automatically on first run
- No manual rotation needed - handled by RotatingFileHandler
- Environment variable
LOG_LEVELcontrols verbosity - Backwards compatible - no breaking changes to existing functionality
Validation
✅ All 156 tests pass
✅ JSON logs parse correctly
✅ Error IDs are unique
✅ Discord messages are user-friendly
✅ Internal details not exposed
✅ Log rotation works correctly
✅ Performance metrics captured
✅ Privacy protection validated
Benefits
- Debugging: Structured logs with full context make troubleshooting easier
- Monitoring: JSON format enables automated log analysis and alerting
- Support: Error IDs allow users to reference specific issues
- Privacy: Internal details kept out of Discord messages
- Performance: Duration and cost tracking for optimization
- Scalability: Log rotation prevents disk space issues
- Observability: Comprehensive visibility into bot operations
Next Steps
MED-002 is complete and ready for production deployment.
Recommended follow-ups:
- Monitor logs in production for 1-2 weeks
- Set up automated alerting for critical errors
- Create log analysis dashboard
- Implement external log aggregation (Loki, CloudWatch)
- Add cost tracking alerts
Files Changed:
- ✅
claude_coordinator/logging_config.py(new, 371 lines) - ✅
claude_coordinator/bot.py(enhanced logging) - ✅
tests/test_logging.py(new, 316 lines, 15 tests) - ✅
tests/test_bot.py(1 test updated) - ✅
docs/LOGGING.md(new, comprehensive docs) - ✅
test_logging_manual.py(new, manual validation)
Test Results: 156/156 passed (15 new tests added)
Status: ✅ Ready for deployment