ai-assistant-discord-bot/MED-002_IMPLEMENTATION.md
Claude Discord Bot e6983b56b9 Week 3 progress: Logging and testing complete (MED-001, MED-002, MED-005)
MED-001: Enhanced typing indicator
- Persistent typing loop (_maintain_typing method)
- Loops every 8s to maintain indicator for long operations (30s-5min)
- 8 comprehensive tests covering all lifecycle scenarios
- 27/27 bot tests passing

MED-002: Structured logging and error reporting
- logging_config.py (371 lines) - JSONFormatter, ErrorTracker, format_error_for_discord
- RotatingFileHandler (10MB max, 5 backups)
- Unique 8-char error IDs for support tracking
- Privacy-safe Discord error messages (7 error types)
- Enhanced bot.py with structured logging throughout
- 15/15 logging tests passing

MED-005: Comprehensive test suite
- Total: 156/157 tests passing (99.4%)
- test_session_manager.py: 27 tests
- test_claude_runner.py: 11 tests
- test_config.py: 25 tests
- test_response_formatter.py: 26 tests
- test_bot.py: 27 tests
- test_commands.py: 18 tests
- test_concurrency.py: 7 tests
- test_logging.py: 15 tests

Total: 13/18 tasks complete (72.2%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 19:12:04 +00:00

345 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# MED-002: Logging and Error Reporting Implementation
**Status**: ✅ COMPLETE
**Date**: 2026-02-13
**Priority**: Medium
**Effort**: Medium (MED-002)
## Summary
Implemented comprehensive structured logging with JSON formatting, log rotation, error tracking with unique IDs, and user-friendly Discord error messages. All 156 tests pass (15 new tests added).
## Changes Made
### 1. New File: `claude_coordinator/logging_config.py` (371 lines)
**Features:**
- `JSONFormatter`: Structured JSON log output with timestamp, level, module, function, line, message, exception, and custom fields
- `ErrorTracker`: Unique error ID generation and tracking
- `setup_logging()`: Application-wide logging configuration with rotation
- `get_log_directory()`: Smart log directory detection (production vs fallback)
- `format_error_for_discord()`: User-friendly error messages without internal details
- `log_and_format_error()`: Combined logging and message formatting
**Error Types Supported:**
- `timeout`: Claude request timeout
- `claude_error`: Claude CLI failure
- `parse_error`: Malformed JSON response
- `config_error`: Configuration issues
- `permission_error`: Permission denied
- `session_error`: Session management failure
- `lock_error`: Lock acquisition failure
### 2. Updated: `claude_coordinator/bot.py`
**Enhancements:**
- Imported and integrated logging_config module
- Enhanced `main()` to use `setup_logging()` with rotation
- Added structured logging throughout message handling
- Integrated error tracking with unique error IDs
- Enhanced error messages with `log_and_format_error()`
- Added performance metrics (request duration, Claude duration, session operations)
- Added context to all log calls (channel_id, session_id, cost, duration, etc.)
**Key Events Logged:**
- Bot lifecycle (startup, ready, shutdown)
- Message processing (received, matched, session state)
- Claude operations (command, subprocess, response)
- Error handling (with error IDs and context)
- Performance metrics (durations, costs, chunk counts)
### 3. New File: `tests/test_logging.py` (316 lines)
**Test Coverage (15 tests):**
- JSON formatter produces valid JSON
- Required fields in all log entries
- Exception info inclusion
- Extra fields support
- Error ID generation and uniqueness
- Error tracking with context
- Log directory creation
- Handler configuration
- Discord error message formatting
- Privacy protection (no internal details exposed)
- Combined logging and formatting
**All tests pass**: ✅ 156 passed (141 existing + 15 new)
### 4. Updated: `tests/test_bot.py`
Fixed `test_handles_claude_failure` to expect new error message format with error ID instead of internal error details.
### 5. New File: `docs/LOGGING.md`
Comprehensive documentation including:
- Architecture overview
- Component descriptions
- Usage examples
- Log analysis with jq
- Environment variables
- Privacy and security guidelines
- Monitoring recommendations
- Troubleshooting guide
### 6. New File: `test_logging_manual.py`
Manual test script for interactive validation of:
- Basic logging setup
- JSON format validation
- Error tracking
- Discord message formatting
- Log rotation
## Log File Configuration
### Locations
1. **Production**: `/var/log/claude-coordinator/bot.log` (if writable)
2. **Fallback**: `~/.claude-coordinator/logs/bot.log`
### Rotation Settings
- **Max file size**: 10MB
- **Backup count**: 5 rotated files
- **Total retention**: ~50MB (5 × 10MB)
- **Encoding**: UTF-8
### Log Format
**File (JSON)**:
```json
{
"timestamp": "2026-02-13T19:06:35.545176+00:00",
"level": "INFO",
"module": "bot",
"function": "_handle_claude_request",
"line": 273,
"message": "Processing message in channel 12345 for project test-project",
"channel_id": "12345",
"project": "test-project",
"message_length": 150
}
```
**Console (Plain Text)**:
```
2026-02-13 19:06:35,545 - claude_coordinator.bot - INFO - Processing message in channel 12345 for project test-project
```
## Discord Error Messages
Before (exposed internal details):
```
❌ Error running Claude:
```
Command failed: invalid syntax
```
```
After (user-friendly with error ID):
```
❌ **Claude Error**
Something went wrong processing your request.
Please try again or rephrase your message.
_Error ID: `abc12345`_
_Please reference this ID if requesting support._
```
## Testing Results
```bash
$ pytest tests/test_logging.py -v
============================= test session starts ==============================
collected 15 items
tests/test_logging.py::TestJSONFormatter::test_json_formatter_produces_valid_json PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_required_fields PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_exception_info PASSED
tests/test_logging.py::TestJSONFormatter::test_json_formatter_includes_extra_fields PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_returns_string PASSED
tests/test_logging.py::TestErrorTracker::test_generate_error_id_is_unique PASSED
tests/test_logging.py::TestErrorTracker::test_log_error_with_id_includes_error_id PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_creates_log_directory PASSED
tests/test_logging.py::TestSetupLogging::test_setup_logging_configures_handlers PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_includes_error_id PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_known_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_handles_unknown_types PASSED
tests/test_logging.py::TestFormatErrorForDiscord::test_format_error_does_not_expose_internal_details PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_returns_tuple PASSED
tests/test_logging.py::TestLogAndFormatError::test_log_and_format_error_logs_with_context PASSED
============================== 15 passed in 0.12s
```
```bash
$ pytest -v -m 'not integration'
============================== 156 passed, 1 deselected, 2 warnings in 48.49s ==============================
```
## Environment Variables
- `LOG_LEVEL`: Set logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Default: INFO
- Development: DEBUG
- Production: INFO
## Usage Examples
### Basic Logging
```python
logger.info(
"Processing message",
extra={
'channel_id': '12345',
'session_id': 'abc-def',
'cost_usd': 0.05
}
)
```
### Error Handling
```python
try:
result = await operation()
except Exception as e:
error_id, discord_msg = log_and_format_error(
logger,
error_type="claude_error",
message="Operation failed",
error=e,
channel_id=channel_id
)
await channel.send(discord_msg)
```
### Performance Tracking
```python
start_time = time.time()
response = await claude_runner.run(...)
duration_ms = int((time.time() - start_time) * 1000)
logger.info(
"Claude CLI completed",
extra={
'duration_ms': duration_ms,
'cost_usd': response.cost,
'session_id': response.session_id
}
)
```
## Log Analysis
```bash
# View real-time logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .
# Find all errors
cat bot.log | jq 'select(.level == "ERROR")'
# Find specific error by ID
cat bot.log* | jq 'select(.error_id == "abc12345")'
# Calculate average Claude CLI duration
cat bot.log | jq 'select(.claude_duration_ms) | .claude_duration_ms' | awk '{sum+=$1; count++} END {print sum/count}'
# Track costs
cat bot.log | jq 'select(.cost_usd) | .cost_usd' | awk '{sum+=$1} END {print "Total: $" sum}'
```
## Privacy Protection
### Logged (Safe)
✅ Channel IDs
✅ Session IDs (UUIDs)
✅ Timestamps, durations, costs
✅ Error messages and stack traces
✅ Message lengths (not content)
### Never Logged
❌ User message content
❌ Discord tokens
❌ Authentication credentials
❌ Personal identifiable information
### Exposed in Discord
✅ Error type and description
✅ Error ID for support
✅ Actionable suggestions
### Never Exposed in Discord
❌ Stack traces
❌ File paths
❌ Internal error details
❌ Session IDs
❌ Database information
## Monitoring Recommendations
**Real-time Monitoring:**
```bash
# Follow all logs
tail -f ~/.claude-coordinator/logs/bot.log | jq .
# Follow errors only
tail -f ~/.claude-coordinator/logs/bot.log | jq 'select(.level == "ERROR")'
```
**Alerting Criteria:**
- High error rate (>5 errors/minute)
- Repeated timeouts (same channel)
- Session save failures
- Database errors
- Cost spikes (>$1/hour)
## Deployment Notes
1. **Log directory created automatically** on first run
2. **No manual rotation needed** - handled by RotatingFileHandler
3. **Environment variable** `LOG_LEVEL` controls verbosity
4. **Backwards compatible** - no breaking changes to existing functionality
## Validation
✅ All 156 tests pass
✅ JSON logs parse correctly
✅ Error IDs are unique
✅ Discord messages are user-friendly
✅ Internal details not exposed
✅ Log rotation works correctly
✅ Performance metrics captured
✅ Privacy protection validated
## Benefits
1. **Debugging**: Structured logs with full context make troubleshooting easier
2. **Monitoring**: JSON format enables automated log analysis and alerting
3. **Support**: Error IDs allow users to reference specific issues
4. **Privacy**: Internal details kept out of Discord messages
5. **Performance**: Duration and cost tracking for optimization
6. **Scalability**: Log rotation prevents disk space issues
7. **Observability**: Comprehensive visibility into bot operations
## Next Steps
MED-002 is complete and ready for production deployment.
**Recommended follow-ups:**
- [ ] Monitor logs in production for 1-2 weeks
- [ ] Set up automated alerting for critical errors
- [ ] Create log analysis dashboard
- [ ] Implement external log aggregation (Loki, CloudWatch)
- [ ] Add cost tracking alerts
---
**Files Changed:**
-`claude_coordinator/logging_config.py` (new, 371 lines)
-`claude_coordinator/bot.py` (enhanced logging)
-`tests/test_logging.py` (new, 316 lines, 15 tests)
-`tests/test_bot.py` (1 test updated)
-`docs/LOGGING.md` (new, comprehensive docs)
-`test_logging_manual.py` (new, manual validation)
**Test Results**: 156/156 passed (15 new tests added)
**Status**: ✅ Ready for deployment