# Knowledge Base RAG System (md-kb-rag)

## Overview

Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.

- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
- **Host**: `manticore` (10.10.0.226)
- **Stack location**: `~/docker/md-kb-rag/` on manticore
- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
- **Current state**: 132 files indexed, ~1186 vector points

## Architecture

Three containers in a single Docker Compose stack:

| Container | Image | Role | Port |
|-----------|-------|------|------|
| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333/6334 (localhost) |
| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |

### Embedding Model

- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
- **Vector size**: 768 dimensions
- **Context size**: 8192 tokens
- **GPU accelerated**: NVIDIA CUDA with flash attention

### Data Flow

```
claude-home repo files
  → rsync to manticore
  → md-kb-rag index
  → chunks markdown
  → nomic-embed generates vectors
  → stored in Qdrant
  → MCP search tool queries Qdrant
  → returns ranked chunks to Claude
```

## MCP Integration

### Claude Code Config

Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:

```json
{
  "kb-search": {
    "type": "http",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer "
    }
  }
}
```

### Available MCP Tools

#### `search`

Semantic search across all indexed documents. Returns ranked chunks with scores.

```
query: "natural language search query"
limit: 10       # max results (default 10, max 50)
domain: null    # optional filter
type: null      # optional filter
tags: []        # optional tag filter
```

#### `get_document`

Retrieve the full raw content of a document by file path (as returned by search results).

```
path: "/data/productivity/google-workspace-cli.md"
```

## Data Sync

The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. This is **not** a functional git clone; it is a directory with a broken `.git` that contains the repo files directly. Files must be synced manually.

### Syncing New/Updated Files

```bash
# Sync a single file
rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
  manticore:~/docker/md-kb-rag/data/repo/path/to/

# Sync an entire directory
rsync -av /mnt/NV2/Development/claude-home/dirname/ \
  manticore:~/docker/md-kb-rag/data/repo/dirname/

# Sync everything (careful: the repo also contains tmp/, .claude/, etc.,
# which are excluded here)
rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
  /mnt/NV2/Development/claude-home/ \
  manticore:~/docker/md-kb-rag/data/repo/
```

### Re-indexing After Sync

```bash
# Incremental (only changed/new files; fast, use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"

# Full re-index (clears state DB, re-embeds everything; slow)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
```

The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.

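The hash comparison described above can be sketched in shell. This is a minimal illustration, not md-kb-rag's actual implementation: the real indexer keeps its hashes in SQLite, whereas this sketch assumes a hypothetical tab-separated state file and a hypothetical `changed_files` helper, and it relies on `sha256sum` from GNU coreutils.

```bash
#!/usr/bin/env bash
# Illustrative only: md-kb-rag stores hashes in SQLite (data/state/state.db).
# This sketch uses a plain-text state file (hypothetical) to show the same
# compare-then-skip idea.

# changed_files STATE_FILE FILE...
# Prints each file whose SHA-256 differs from the hash recorded in STATE_FILE,
# then records the new hash so the next run skips that file.
changed_files() {
    local state="$1"; shift
    touch "$state"
    local f hash old
    for f in "$@"; do
        hash=$(sha256sum "$f" | cut -d' ' -f1)
        # Look up the previously recorded hash for this exact path.
        old=$(awk -F'\t' -v p="$f" '$1 == p { print $2 }' "$state")
        if [ "$hash" != "$old" ]; then
            echo "$f"
            # Drop any previous entry for this path, then append the new hash.
            awk -F'\t' -v p="$f" '$1 != p' "$state" > "$state.tmp"
            printf '%s\t%s\n' "$f" "$hash" >> "$state.tmp"
            mv "$state.tmp" "$state"
        fi
    done
}
```

Calling `changed_files state.tsv docs/*.md` twice in a row prints every file the first time and nothing the second, which is why an incremental run is fast when little has changed.
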
## Operations

### Health Check

```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
```

### Status (file list + Qdrant point count)

```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
```

### Validate Markdown (without indexing)

```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
```

### View Logs

```bash
ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
```

### Restart Stack

```bash
ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
```

## Adding New Documentation

The standard workflow for adding docs to the KB:

1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
2. **Update the relevant CONTEXT.md** with a summary and link
3. **Rsync to manticore**:
   ```bash
   rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
     manticore:~/docker/md-kb-rag/data/repo/path/to/
   rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
     manticore:~/docker/md-kb-rag/data/repo/path/to/
   ```
4. **Re-index**:
   ```bash
   ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
   ```
5. **Verify** with a search query to confirm the new content is findable

## Environment Variables (on manticore)

File: `~/docker/md-kb-rag/.env`

| Variable | Purpose |
|----------|---------|
| `MODEL_PATH` | Path to embedding model directory |
| `MODEL_FILE` | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
| `KB_PATH` | Path to knowledge base repo (./data/repo) |
| `MCP_PORT` | MCP server port (8001) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `RUST_LOG` | Log level (info) |

## Troubleshooting

### Search returns no results

- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
- Re-index: may need `--full` if state DB is out of sync

### MCP connection refused

- Check container is running: `ssh manticore "docker ps | grep kb-rag"`
- Check port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`

### Embedding server OOM / crash

- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`

### Stale index after deleting files

- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
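
One way to notice the stale-index case before it causes confusion is to list markdown files that still exist in the synced copy but no longer exist in the source repo. The sketch below is an assumption-laden illustration: `orphaned_files` is a hypothetical helper, it requires bash (for process substitution), and it assumes only `*.md` files matter to the index.

```bash
#!/usr/bin/env bash
# Illustrative only: compares two directory trees by relative path.

# orphaned_files SRC_DIR DEST_DIR
# Prints relative paths of .md files that exist under DEST_DIR but not SRC_DIR.
orphaned_files() {
    local src="$1" dest="$2"
    # comm -13 keeps only lines unique to the second (DEST_DIR) listing.
    comm -13 \
        <(cd "$src" && find . -type f -name '*.md' | sort) \
        <(cd "$dest" && find . -type f -name '*.md' | sort)
}
```

Here both trees are assumed to be reachable as local paths; for the copy on manticore, the second listing could instead come from running the same `find` over `ssh`. Any path printed is a file whose vectors an incremental run would leave behind, so a `--full` re-index is the likely fix once the file is gone from the synced copy.
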