Strat-O-Matic rules Q&A chatbot with Discord integration


Cal Corum 1f1048ee08 refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00

Discord bot inbound adapter (adapters/inbound/discord_bot.py):
- ChatService injected directly: no HTTP roundtrip to FastAPI API
- No module-level singleton: create_bot() factory for construction
- Pure functions extracted for testing: build_answer_embed, build_error_embed, parse_conversation_id
- Uses message.reference.resolved cache before fetch_message
- Error embeds never leak exception details
- 19 new tests covering embed building, footer parsing, error safety

Removed old app/ directory (9 files):
- All functionality preserved in hexagonal domain/, adapters/, config/
- Old test_basic.py removed (superseded by 120 adapter/domain tests)

Other changes:
- docker-compose: api uses main:app, discord-bot uses run_discord.py with direct ChatService injection (no API dependency)
- Removed unused openai dependency from pyproject.toml
- Removed app/ from hatch build targets

Test suite: 120 passed, 1 skipped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
adapters	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00
config	fix: resolve MEDIUM-severity issues from code review	2026-03-08 16:04:25 -05:00
data/rules	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
domain	fix: resolve HIGH-severity issues from code review	2026-03-08 16:00:26 -05:00
scripts	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
tests	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00
.env.example	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
.gitignore	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
docker-compose.yml	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00
Dockerfile	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
main.py	refactor: hexagonal architecture with ports & adapters, DI, and test-first development	2026-03-08 15:51:16 -05:00
pyproject.toml	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00
pyrightconfig.json	refactor: hexagonal architecture with ports & adapters, DI, and test-first development	2026-03-08 15:51:16 -05:00
README.md	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
run_discord.py	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00
setup.sh	feat: initial chatbot implementation with FastAPI, ChromaDB, Discord bot, and Gitea integration	2026-03-08 15:19:26 -05:00
uv.lock	refactor: migrate Discord bot to hexagonal adapter, remove old app/ directory	2026-03-08 16:07:36 -05:00

README.md

Strat-Chatbot

AI-powered Q&A chatbot for Strat-O-Matic baseball league rules.

Features

Natural language Q&A: Ask questions about league rules in plain English
Semantic search: Uses ChromaDB vector embeddings to find relevant rules
Rule citations: Always cites specific rule IDs (e.g., "Rule 5.2.1(b)")
Conversation threading: Maintains conversation context for follow-up questions
Gitea integration: Automatically creates issues for unanswered questions
Discord integration: Slash command /ask with reply-based follow-ups

Architecture

┌─────────┐    ┌──────────────┐    ┌─────────────┐
│ Discord │────│ FastAPI      │────│ ChromaDB    │
│ Bot     │    │ (port 8000)  │    │ (vectors)   │
└─────────┘    └──────────────┘    └─────────────┘
                                        │
                                  ┌───────▼──────┐
                                  │ Markdown     │
                                  │ Rule Files   │
                                  └──────────────┘
                                        │
                                  ┌───────▼──────┐
                                  │ OpenRouter   │
                                  │ (LLM API)    │
                                  └──────────────┘
                                        │
                                  ┌───────▼──────┐
                                  │ Gitea        │
                                  │ Issues       │
                                  └──────────────┘
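
The flow in the diagram can be sketched as one service with its dependencies injected. This is an illustration only, not the project's code: `RuleStore`, `LLM`, `IssueTracker`, and the `ChatService` signature shown here are hypothetical names, and the sketch is synchronous for brevity. The 0.4 confidence threshold and the top-k default of 10 are the values quoted elsewhere in this README.

```python
from dataclasses import dataclass
from typing import Protocol


class RuleStore(Protocol):  # hypothetical port in front of ChromaDB
    def search(self, query: str, top_k: int) -> list[str]: ...


class LLM(Protocol):  # hypothetical port in front of OpenRouter
    def answer(self, question: str, rules: list[str]) -> tuple[str, float]: ...


class IssueTracker(Protocol):  # hypothetical port in front of Gitea
    def report(self, question: str) -> None: ...


@dataclass
class ChatService:
    rules: RuleStore
    llm: LLM
    issues: IssueTracker
    min_confidence: float = 0.4  # threshold quoted under "Gitea Integration"
    top_k: int = 10              # default quoted under "Performance Optimizations"

    def ask(self, question: str) -> str:
        # 1. Retrieve candidate rules, 2. ask the model, 3. escalate if unsure.
        context = self.rules.search(question, top_k=self.top_k)
        text, confidence = self.llm.answer(question, context)
        if confidence < self.min_confidence:
            self.issues.report(question)
        return text
```

Because the three collaborators are plain protocols, a test can pass in small fake objects instead of ChromaDB, OpenRouter, or Gitea.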

Quick Start

Prerequisites

Docker & Docker Compose
OpenRouter API key
Discord bot token
Gitea token (optional, for issue creation)

Setup

Clone and configure

cd strat-chatbot
cp .env.example .env
# Edit .env with your API keys and tokens

Prepare rules

Place your rule documents in data/rules/ as Markdown files with YAML frontmatter:

---
rule_id: "5.2.1(b)"
title: "Stolen Base Attempts"
section: "Baserunning"
parent_rule: "5.2"
page_ref: "32"
---

When a runner attempts to steal...
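
To check a rule file before ingesting it, the frontmatter can be split from the body with a few lines of Python. This is a minimal sketch, not the logic in scripts/ingest_rules.py: `parse_rule_file` is a hypothetical helper, and it only understands flat `key: "value"` pairs like the ones in the example above. A real pipeline would more likely use a YAML library.

```python
def parse_rule_file(text: str) -> tuple[dict[str, str], str]:
    """Split a rule document into (metadata, body).

    Assumption: frontmatter holds only flat `key: "value"` lines.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        # Index of the closing delimiter, searching after the opening one.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        metadata[key.strip()] = value.strip().strip('"')

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body
```

Calling `parse_rule_file(open(path).read())` on the example above would yield a metadata dict with the keys `rule_id`, `title`, `section`, `parent_rule`, and `page_ref`, plus the rule text as the body.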

Ingest rules

# With Docker Compose (recommended)
docker compose up -d
docker compose exec api python scripts/ingest_rules.py

# Or locally
uv sync
uv run scripts/ingest_rules.py

Start services

docker compose up -d

The API will be available at http://localhost:8000

The Discord bot will connect and sync slash commands.

Runtime Configuration

Environment Variable	Required?	Description
`OPENROUTER_API_KEY`	Yes	OpenRouter API key
`OPENROUTER_MODEL`	No	Model ID (default: `stepfun/step-3.5-flash:free`)
`DISCORD_BOT_TOKEN`	No	Discord bot token (omit to run API only)
`DISCORD_GUILD_ID`	No	Guild ID for slash command sync (faster than global)
`GITEA_TOKEN`	No	Gitea API token (for issue creation)
`GITEA_OWNER`	No	Gitea username (default: `cal`)
`GITEA_REPO`	No	Repository name (default: `strat-chatbot`)
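
The table above maps onto a small settings loader. The sketch below is an assumption about shape, not the code in config/: the `Settings` class and `load_settings` function are hypothetical, while the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openrouter_api_key: str
    openrouter_model: str
    discord_bot_token: Optional[str]
    discord_guild_id: Optional[str]
    gitea_token: Optional[str]
    gitea_owner: str
    gitea_repo: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        # The only variable the table marks as required.
        raise RuntimeError("OPENROUTER_API_KEY is required")
    return Settings(
        openrouter_api_key=api_key,
        openrouter_model=env.get("OPENROUTER_MODEL", "stepfun/step-3.5-flash:free"),
        discord_bot_token=env.get("DISCORD_BOT_TOKEN") or None,  # None: API only
        discord_guild_id=env.get("DISCORD_GUILD_ID") or None,
        gitea_token=env.get("GITEA_TOKEN") or None,  # None: no issue creation
        gitea_owner=env.get("GITEA_OWNER", "cal"),
        gitea_repo=env.get("GITEA_REPO", "strat-chatbot"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests.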

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check with stats
`/chat`	POST	Send a question and get a response
`/stats`	GET	Knowledge base and system statistics

Chat Request

{
  "message": "Can a runner steal on a 2-2 count?",
  "user_id": "123456789",
  "channel_id": "987654321",
  "conversation_id": "optional-uuid",
  "parent_message_id": "optional-parent-uuid"
}

Chat Response

{
  "response": "Yes, according to Rule 5.2.1(b)...",
  "conversation_id": "conv-uuid",
  "message_id": "msg-uuid",
  "cited_rules": ["5.2.1(b)", "5.3"],
  "confidence": 0.85,
  "needs_human": false
}
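
A small standard-library client for the request and response shapes above might look like the following. `build_chat_request` and `ask` are hypothetical helper names, and the sketch assumes the two optional fields can simply be omitted, as the curl example under "Testing the API" does.

```python
import json
import urllib.request
from typing import Optional

API_URL = "http://localhost:8000/chat"  # address given under "Start services"


def build_chat_request(
    message: str,
    user_id: str,
    channel_id: str,
    conversation_id: Optional[str] = None,
    parent_message_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST /chat request, leaving out unset optional fields."""
    payload = {"message": message, "user_id": user_id, "channel_id": channel_id}
    if conversation_id is not None:
        payload["conversation_id"] = conversation_id
    if parent_message_id is not None:
        payload["parent_message_id"] = parent_message_id
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str, user_id: str, channel_id: str, **follow_up: str) -> dict:
    """Send a question and return the decoded JSON response."""
    request = build_chat_request(message, user_id, channel_id, **follow_up)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For a follow-up question, pass the `conversation_id` (and, if threading by message, the `message_id` as `parent_message_id`) from the previous response back into `ask`.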

Development

Local Development (without Docker)

# Install dependencies
uv sync

# Ingest rules
uv run scripts/ingest_rules.py

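# Run API server (app/ was removed; the app object is main:app, and this assumes uvicorn is installed)
uv run uvicorn main:app --port 8000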

# In another terminal, run Discord bot
uv run run_discord.py

Project Structure

strat-chatbot/
├── adapters/              # Adapters, incl. inbound/discord_bot.py
├── config/                # Configuration management
├── data/
│   ├── chroma/            # Vector DB (auto-created)
│   └── rules/             # Your markdown rule files
├── domain/                # Core domain logic
├── scripts/
│   └── ingest_rules.py    # Ingestion pipeline
├── tests/                 # Test files
├── .env.example
├── Dockerfile
├── docker-compose.yml
├── main.py                # FastAPI app (main:app)
├── pyproject.toml
└── run_discord.py         # Discord bot entry point

Performance Optimizations

Embedding cache: ChromaDB persists embeddings on disk
Rule chunking: Each rule is a separate document, no context fragmentation
Top-k search: Configurable number of rules to retrieve (default: 10)
Conversation TTL: 30 minutes to limit database size
Async operations: All I/O is non-blocking
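
To make the top-k step concrete, here is a brute-force version of what a vector search returns. It is a teaching sketch only: ChromaDB performs its own indexed nearest-neighbour search, the use of cosine similarity is an assumption, and `cosine` and `top_k_rules` are hypothetical names.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_rules(
    query_vec: list[float],
    rule_vecs: dict[str, list[float]],
    k: int = 10,  # default quoted in the list above
) -> list[tuple[str, float]]:
    """Return the k rule IDs most similar to the query, best first."""
    scored = ((rule_id, cosine(query_vec, vec)) for rule_id, vec in rule_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Since each rule is stored as one document, every returned ID corresponds to a whole rule rather than a fragment of one.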

Testing the API

curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What happens if the pitcher balks?",
    "user_id": "test123",
    "channel_id": "general"
  }'

Gitea Integration

When the bot encounters a question it can't answer confidently (confidence < 0.4), it will automatically:

Log the question to console
Create an issue in your configured Gitea repo
Include: user ID, channel, question, attempted rules, conversation link

Issues are labeled with:

rules-gap - needs a rule addition or clarification
ai-generated - created by AI bot
needs-review - requires human administrator attention
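
The escalation rule described above can be sketched as a threshold check plus an issue body builder. `needs_issue` and `build_issue` are hypothetical names and the title and body layout are invented for illustration; only the 0.4 threshold, the included fields, and the three label names come from this section. The sketch stops short of calling Gitea, whose API may expect label IDs rather than names.

```python
CONFIDENCE_THRESHOLD = 0.4  # from "confidence < 0.4" above
ISSUE_LABELS = ["rules-gap", "ai-generated", "needs-review"]


def needs_issue(confidence: float) -> bool:
    """True when the answer is too uncertain to stand without human review."""
    return confidence < CONFIDENCE_THRESHOLD


def build_issue(
    question: str,
    user_id: str,
    channel_id: str,
    attempted_rules: list[str],
    conversation_id: str,
) -> dict:
    """Assemble the issue fields; sending them to Gitea is left to the caller."""
    body = "\n".join(
        [
            f"User ID: {user_id}",
            f"Channel: {channel_id}",
            f"Question: {question}",
            f"Attempted rules: {', '.join(attempted_rules) or 'none'}",
            f"Conversation: {conversation_id}",
        ]
    )
    return {
        "title": f"Unanswered rules question: {question[:60]}",  # illustrative format
        "body": body,
        "label_names": list(ISSUE_LABELS),
    }
```

A response with `"confidence": 0.85`, like the one in the Chat Response example, would not trigger an issue under this check.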

To-Do

Build OpenRouter Docker client with proper torch dependencies
Add PDF ingestion support (convert PDF → Markdown)
Implement rule change detection and incremental updates
Add rate limiting per Discord user
Create admin endpoints for rule management
Add Prometheus metrics for monitoring
Build unit and integration tests

License

TBD