From 747e4c2cce1127f87247250181d77456c7ed47ca Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Thu, 12 Mar 2026 07:40:38 -0500
Subject: [PATCH] docs: add kb-rag system documentation

Co-Authored-By: Claude Opus 4.6
---
 development/kb-rag-system.md | 178 +++++++++++++++++++++++++++++++++++
 1 file changed, 178 insertions(+)
 create mode 100644 development/kb-rag-system.md

diff --git a/development/kb-rag-system.md b/development/kb-rag-system.md
new file mode 100644
index 0000000..f15e23b
--- /dev/null
+++ b/development/kb-rag-system.md
@@ -0,0 +1,178 @@
+# Knowledge Base RAG System (md-kb-rag)
+
+## Overview
+Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.
+
+- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
+- **Host**: `manticore` (10.10.0.226)
+- **Stack location**: `~/docker/md-kb-rag/` on manticore
+- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
+- **Current state**: 132 files indexed, ~1186 vector points
+
+## Architecture
+
+Three containers in a single Docker Compose stack:
+
+| Container | Image | Role | Port |
+|-----------|-------|------|------|
+| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
+| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333/6334 (localhost) |
+| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |
+
+### Embedding Model
+- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
+- **Vector size**: 768 dimensions
+- **Context size**: 8192 tokens
+- **GPU accelerated**: NVIDIA CUDA with flash attention
+
+### Data Flow
+```
+claude-home repo files → rsync to manticore → md-kb-rag index
+  → chunks markdown → nomic-embed generates vectors → stored in Qdrant
+  → MCP search tool queries Qdrant → returns ranked chunks to Claude
+```
+
+## MCP Integration
+
+### Claude Code Config
+Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:
+
+```json
+{
+  "kb-search": {
+    "type": "http",
+    "url": "http://10.10.0.226:8001/mcp",
+    "headers": {
+      "Authorization": "Bearer "
+    }
+  }
+}
+```
+
+### Available MCP Tools
+
+#### `search`
+Semantic search across all indexed documents. Returns ranked chunks with scores.
+```
+query: "natural language search query"
+limit: 10      # max results (default 10, max 50)
+domain: null   # optional filter
+type: null     # optional filter
+tags: []       # optional tag filter
+```
+
+#### `get_document`
+Retrieve the full raw content of a document by file path (as returned in search results).
+```
+path: "/data/productivity/google-workspace-cli.md"
+```
+
+## Data Sync
+
+The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. This is **not** a functional git clone; it is a plain directory of repo files with a broken `.git`, so files must be synced manually.
+
+### Syncing New/Updated Files
+```bash
+# Sync a single file
+rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
+  manticore:~/docker/md-kb-rag/data/repo/path/to/
+
+# Sync an entire directory
+rsync -av /mnt/NV2/Development/claude-home/dirname/ \
+  manticore:~/docker/md-kb-rag/data/repo/dirname/
+
+# Sync everything (careful: the repo includes tmp/, .claude/, etc., hence the excludes)
+rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
+  /mnt/NV2/Development/claude-home/ \
+  manticore:~/docker/md-kb-rag/data/repo/
+```
+
+### Re-indexing After Sync
+```bash
+# Incremental (only changed/new files; fast, use this normally)
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
+
+# Full re-index (clears state DB and re-embeds everything; slow)
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
+```
+
+The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.
+
+## Operations
+
+### Health Check
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
+```
+
+### Status (file list + Qdrant point count)
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
+```
+
+### Validate Markdown (without indexing)
+```bash
+ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
+```
+
+### View Logs
+```bash
+ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
+```
+
+### Restart Stack
+```bash
+ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
+```
+
+## Adding New Documentation
+
+The standard workflow for adding docs to the KB:
+
+1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
+2. **Update the relevant CONTEXT.md** with a summary and link
+3. **Rsync to manticore**:
+   ```bash
+   rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
+     manticore:~/docker/md-kb-rag/data/repo/path/to/
+   rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
+     manticore:~/docker/md-kb-rag/data/repo/path/to/
+   ```
+4. **Re-index**:
+   ```bash
+   ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
+   ```
+5. **Verify** with a search query to confirm the new content is findable
+
+## Environment Variables (on manticore)
+
+File: `~/docker/md-kb-rag/.env`
+
+| Variable | Purpose |
+|----------|---------|
+| `MODEL_PATH` | Path to embedding model directory |
+| `MODEL_FILE` | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
+| `KB_PATH` | Path to knowledge base repo (./data/repo) |
+| `MCP_PORT` | MCP server port (8001) |
+| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
+| `RUST_LOG` | Log level (info) |
+
+## Troubleshooting
+
+### Search returns no results
+- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
+- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
+- Re-index: a `--full` run may be needed if the state DB is out of sync
+
+### MCP connection refused
+- Check that the container is running: `ssh manticore "docker ps | grep kb-rag"`
+- Check that the port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
+- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`
+
+### Embedding server OOM / crash
+- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
+- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`
+
+### Stale index after deleting files
+- Incremental indexing doesn't remove orphaned vectors for deleted files
+- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
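
The rsync and re-index steps from "Adding New Documentation" could be wrapped in a small helper. The sketch below is not part of md-kb-rag: the function name `kb_publish_cmds` and the `KB_*` variables are hypothetical, and their values are simply the paths, host alias, and container name quoted in this document. It only prints the two commands for a file given relative to the repo root, so they can be reviewed before anything runs. Paths containing spaces are not quoted in the output and would need extra handling.

```bash
# Hypothetical helper (an assumption, not an md-kb-rag feature): print the
# rsync and incremental-index commands for one file in the claude-home repo.
KB_LOCAL_ROOT='/mnt/NV2/Development/claude-home'
KB_REMOTE_HOST='manticore'
KB_REMOTE_ROOT='~/docker/md-kb-rag/data/repo'   # literal "~", left for rsync/ssh to resolve on the remote side
KB_CONTAINER='md-kb-rag-kb-rag-1'

# Usage: kb_publish_cmds path/relative/to/repo/file.md
# The body runs in a subshell, so its variables do not leak into the caller.
kb_publish_cmds() (
  rel="$1"
  dir="$(dirname -- "$rel")"   # directory part, reused as the remote target directory
  printf 'rsync -av %s/%s %s:%s/%s/\n' \
    "$KB_LOCAL_ROOT" "$rel" "$KB_REMOTE_HOST" "$KB_REMOTE_ROOT" "$dir"
  printf 'ssh %s "docker exec %s md-kb-rag index"\n' \
    "$KB_REMOTE_HOST" "$KB_CONTAINER"
)
```

Running `kb_publish_cmds development/kb-rag-system.md` prints the two commands; once they look right, piping the output to `bash` would execute them. Printing instead of executing keeps the helper free of side effects, which seems a reasonable default given that the remote copy is not a real git clone and has no easy rollback.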