docs: add kb-rag system documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 07:40:38 -05:00 · 2026-03-12 07:40:38 -05:00 · 747e4c2cce
commit 747e4c2cce
parent 2897d1f037
1 changed files with 178 additions and 0 deletions
--- a/development/kb-rag-system.md
+++ b/development/kb-rag-system.md
@@ -0,0 +1,178 @@
+# Knowledge Base RAG System (md-kb-rag)
+
+## Overview
+Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.
+
+- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
+- **Host**: `manticore` (10.10.0.226)
+- **Stack location**: `~/docker/md-kb-rag/` on manticore
+- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
+- **Current state** (as of 2026-03-12): 132 files indexed, ~1186 vector points
+
+## Architecture
+
+Three containers in a single Docker Compose stack:
+
+| Container | Image | Role | Port |
+|-----------|-------|------|------|
+| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
+| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333 (HTTP), 6334 (gRPC); localhost |
+| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |
+
+### Embedding Model
+- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
+- **Vector size**: 768 dimensions
+- **Context size**: 8192 tokens
+- **GPU accelerated**: NVIDIA CUDA with flash attention
+
+### Data Flow
+```
+claude-home repo files → rsync to manticore → md-kb-rag index
+  → chunks markdown → nomic-embed generates vectors → stored in Qdrant
+  → MCP search tool queries Qdrant → returns ranked chunks to Claude
+```
+
+## MCP Integration
+
+### Claude Code Config
+Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:
+
+```json
+{
+  "kb-search": {
+    "type": "http",
+    "url": "http://10.10.0.226:8001/mcp",
+    "headers": {
+      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
+    }
+  }
+}
+```
+
+### Available MCP Tools
+
+#### `search`
+Semantic search across all indexed documents. Returns ranked chunks with scores.
+```
+query: "natural language search query"
+limit: 10          # max results (default 10, max 50)
+domain: null       # optional filter
+type: null         # optional filter
+tags: []           # optional tag filter
+```
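+
+For reference, an MCP tool invocation travels as a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the request body for `search` from the parameters listed above. The helper name `build_search_request` is hypothetical, the optional filters are omitted, and any session setup this server expects before a call is not covered here.
+
+```bash
+# build_search_request QUERY [LIMIT]
+# Prints a JSON-RPC 2.0 "tools/call" body for the `search` tool.
+# Hypothetical helper: LIMIT defaults to 10 and is clamped to the documented max of 50.
+# Assumes QUERY contains no newlines or other control characters.
+build_search_request() {
+  local query="$1"
+  local limit="${2:-10}"
+  if [ "$limit" -gt 50 ]; then limit=50; fi
+  # Escape backslashes and double quotes so the query stays valid JSON.
+  query="$(printf '%s' "$query" | sed 's/\\/\\\\/g; s/"/\\"/g')"
+  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search","arguments":{"query":"%s","limit":%d}}}\n' "$query" "$limit"
+}
+```
+
+Calling `build_search_request "qdrant port" 5` prints a single-line JSON body that could be sent to the MCP endpoint with the bearer token header shown in the config above.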
+
+#### `get_document`
+Retrieve full raw content of a document by file path (as returned by search results).
+```
+path: "/data/productivity/google-workspace-cli.md"
+```
+
+## Data Sync
+
+The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. It is **not** a functional git clone: the directory holds the repo files directly, alongside a broken `.git`, so files must be synced manually.
+
+### Syncing New/Updated Files
+```bash
+# Sync a single file
+rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
+  manticore:~/docker/md-kb-rag/data/repo/path/to/
+
+# Sync an entire directory
+rsync -av /mnt/NV2/Development/claude-home/dirname/ \
+  manticore:~/docker/md-kb-rag/data/repo/dirname/
+
+# Sync everything (the working tree also holds .git, .claude/, and tmp/, so exclude them)
+rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
+  /mnt/NV2/Development/claude-home/ \
+  manticore:~/docker/md-kb-rag/data/repo/
+```
+
+### Re-indexing After Sync
+```bash
+# Incremental (only changed/new files — fast, use this normally)
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
+
+# Full re-index (clears state DB, re-embeds everything — slow)
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
+```
+
+The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.
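+
+The hash comparison described above can be illustrated with a small sketch. It is not md-kb-rag's implementation: the hash algorithm, the `state.db` schema, and the helper name `changed_files` are all assumptions, and a plain `sha256sum` manifest stands in for the SQLite state DB.
+
+```bash
+# changed_files MANIFEST FILE...
+# MANIFEST holds "hash  path" lines saved from an earlier `sha256sum` run.
+# Prints every FILE whose current hash line is missing from MANIFEST,
+# i.e. files that are new or whose content has changed.
+changed_files() {
+  local manifest="$1"
+  shift
+  local f line
+  for f in "$@"; do
+    line="$(sha256sum "$f")"
+    # -x: match the whole line, -F: treat it as a fixed string, -q: no output
+    grep -qxF -- "$line" "$manifest" 2>/dev/null || printf '%s\n' "$f"
+  done
+}
+```
+
+Only the files printed by such a check would need re-embedding, which is why the incremental mode is the fast path.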
+
+## Operations
+
+### Health Check
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
+```
+
+### Status (file list + Qdrant point count)
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
+```
+
+### Validate Markdown (without indexing)
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
+```
+
+### View Logs
+```bash
+ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
+```
+
+### Restart Stack
+```bash
+ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
+```
+
+## Adding New Documentation
+
+The standard workflow for adding docs to the KB:
+
+1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
+2. **Update the relevant CONTEXT.md** with a summary and link
+3. **Rsync to manticore**:
+   ```bash
+   rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
+     manticore:~/docker/md-kb-rag/data/repo/path/to/
+   rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
+     manticore:~/docker/md-kb-rag/data/repo/path/to/
+   ```
+4. **Re-index**:
+   ```bash
+   ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
+   ```
+5. **Verify** with a search query to confirm the new content is findable
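+
+Steps 3 and 4 can be wrapped in one helper. The following is a minimal sketch, not part of the existing tooling: the names `kb_publish`, `run`, and the `DRY_RUN` switch are hypothetical, and the paths are copied from the commands above.
+
+```bash
+#!/usr/bin/env bash
+REPO_ROOT="/mnt/NV2/Development/claude-home"
+REMOTE="manticore"
+# Quoted on purpose: the tilde must be expanded on manticore, not locally.
+REMOTE_REPO="~/docker/md-kb-rag/data/repo"
+
+# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
+run() {
+  if [ "${DRY_RUN:-0}" = "1" ]; then
+    printf '%s\n' "$*"
+  else
+    "$@"
+  fi
+}
+
+# kb_publish FILE...: rsync each file into the mirrored remote directory,
+# then trigger one incremental index run.
+kb_publish() {
+  local file rel dir
+  for file in "$@"; do
+    rel="${file#"$REPO_ROOT"/}"
+    if [ "$rel" = "$file" ]; then
+      echo "skipping (not under $REPO_ROOT): $file" >&2
+      continue
+    fi
+    dir="$(dirname "$rel")"
+    run rsync -av "$file" "$REMOTE:$REMOTE_REPO/$dir/" || return 1
+  done
+  run ssh "$REMOTE" "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
+}
+```
+
+Running `DRY_RUN=1 kb_publish /mnt/NV2/Development/claude-home/path/to/newfile.md` prints the rsync and index commands without executing them, which is a cheap way to check the destination path before a real sync.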
+
+## Environment Variables (on manticore)
+
+File: `~/docker/md-kb-rag/.env`
+
+| Variable | Purpose |
+|----------|---------|
+| `MODEL_PATH` | Path to embedding model directory |
+| `MODEL_FILE` | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
+| `KB_PATH` | Path to knowledge base repo (./data/repo) |
+| `MCP_PORT` | MCP server port (8001) |
+| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
+| `RUST_LOG` | Log level (info) |
+
+## Troubleshooting
+
+### Search returns no results
+- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
+- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
+- Re-index: may need `--full` if state DB is out of sync
+
+### MCP connection refused
+- Check container is running: `ssh manticore "docker ps | grep kb-rag"`
+- Check port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
+- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`
+
+### Embedding server OOM / crash
+- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
+- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`
+
+### Stale index after deleting files
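+- Incremental indexing doesn't remove orphaned vectors for deleted files
+- The rsync commands above don't pass `--delete`, so a file deleted locally also remains in `data/repo/` on manticore until it is removed there
+- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`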