Knowledge Base RAG System (md-kb-rag)
Overview
Semantic search over the entire claude-home documentation repo using vector embeddings. Runs on ubuntu-manticore (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the claude-home project.
- App: st0nefish/md-kb-rag (Rust)
- Host: manticore (10.10.0.226)
- Stack location: ~/docker/md-kb-rag/ on manticore
- MCP endpoint: http://10.10.0.226:8001/mcp
- Current state: 132 files indexed, ~1186 vector points
Architecture
Three containers in a single Docker Compose stack:
| Container | Image | Role | Port |
|---|---|---|---|
| md-kb-rag-kb-rag-1 | ghcr.io/st0nefish/md-kb-rag:latest | MCP server + indexer | 8001 |
| md-kb-rag-qdrant-1 | qdrant/qdrant:v1.17.0 | Vector database | 6333/6334 (localhost) |
| md-kb-rag-embeddings-1 | ghcr.io/ggml-org/llama.cpp:server-cuda | GPU embedding server | 8080 (internal) |
Embedding Model
- Model: nomic-embed-text-v2-moe (Q8_0 quantization)
- Vector size: 768 dimensions
- Context size: 8192 tokens
- GPU accelerated: NVIDIA CUDA with flash attention
Data Flow
claude-home repo files → rsync to manticore → md-kb-rag index
→ chunks markdown → nomic-embed generates vectors → stored in Qdrant
→ MCP search tool queries Qdrant → returns ranked chunks to Claude
MCP Integration
Claude Code Config
Registered as a project-scoped MCP server in ~/.claude.json under the /mnt/NV2/Development/claude-home project:
{
  "kb-search": {
    "type": "http",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
    }
  }
}
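To check the endpoint and token outside Claude Code, a minimal curl sketch follows. It assumes the server speaks MCP's Streamable HTTP transport, which is what a "type": "http" registration normally implies; this has not been verified against md-kb-rag itself. The helper names (mcp_init_payload, mcp_post), the client name kb-probe, and the protocol version string are illustrative assumptions, and the server may negotiate a different version.

```shell
#!/usr/bin/env bash
# Endpoint taken from the config above; override with MCP_URL if needed.
MCP_URL="${MCP_URL:-http://10.10.0.226:8001/mcp}"

# Print a JSON-RPC "initialize" request as described by the MCP specification.
# clientInfo values and protocolVersion are placeholders (assumptions).
mcp_init_payload() {
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"kb-probe","version":"0.0.1"}}}'
}

# POST a JSON-RPC payload ($1) to the endpoint with the bearer token.
# Fails early with a message if MCP_BEARER_TOKEN is not set.
mcp_post() {
  curl -sS -X POST "$MCP_URL" \
    -H "Authorization: Bearer ${MCP_BEARER_TOKEN:?set MCP_BEARER_TOKEN first}" \
    -H "Content-Type: application/json" \
    -H "Accept: application/json, text/event-stream" \
    -d "$1"
}

# Usage (needs network access to manticore):
#   MCP_BEARER_TOKEN=... mcp_post "$(mcp_init_payload)"
```

A JSON-RPC result in the response suggests the URL and token are accepted; an HTTP 401 would point at the Authorization header instead.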
Available MCP Tools
search
Semantic search across all indexed documents. Returns ranked chunks with scores.
query: "natural language search query"
limit: 10 # max results (default 10, max 50)
domain: null # optional filter
type: null # optional filter
tags: [] # optional tag filter
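For reference, a search invocation at the protocol level is a JSON-RPC tools/call request. The sketch below only builds that request body; the function name search_payload is hypothetical, the optional domain, type, and tags filters are omitted, and the clamp to 50 simply mirrors the documented maximum. Queries containing newlines are not handled.

```shell
#!/usr/bin/env bash
# search_payload QUERY [LIMIT]
# Print a JSON-RPC tools/call body for the "search" tool.
search_payload() {
  local query limit
  limit=${2:-10}                                  # documented default
  if [ "$limit" -gt 50 ]; then limit=50; fi       # documented maximum
  # Escape backslashes and double quotes so the query stays valid JSON.
  query=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"search","arguments":{"query":"%s","limit":%d}}}\n' "$query" "$limit"
}

# Usage:
#   search_payload "how do I re-index the knowledge base" 5
```

The printed body would be POSTed to the MCP endpoint with the bearer token, after the usual initialize handshake.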
get_document
Retrieve full raw content of a document by file path (as returned by search results).
path: "/data/productivity/google-workspace-cli.md"
Data Sync
The KB data lives at ~/docker/md-kb-rag/data/repo/ on manticore. This is not a functional git clone: the directory contains the repo files directly, but its .git is broken, so it cannot be updated with git. Files must be synced manually.
Syncing New/Updated Files
# Sync a single file
rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
manticore:~/docker/md-kb-rag/data/repo/path/to/
# Sync an entire directory
rsync -av /mnt/NV2/Development/claude-home/dirname/ \
manticore:~/docker/md-kb-rag/data/repo/dirname/
# Sync everything (careful: the repo includes tmp/, .claude/, etc., so exclude them)
rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
/mnt/NV2/Development/claude-home/ \
manticore:~/docker/md-kb-rag/data/repo/
Re-indexing After Sync
# Incremental (only changed/new files — fast, use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
# Full re-index (clears state DB, re-embeds everything — slow)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
The incremental indexer compares content hashes in a SQLite state DB (data/state/state.db) and only re-embeds files whose content has changed.
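The idea behind hash-based incremental indexing can be illustrated with a small shell sketch. This is not the indexer's actual code: the real state lives in SQLite, whereas this stand-in uses a plain text file in sha256sum format, and the function name changed_files is hypothetical.

```shell
#!/usr/bin/env bash
# changed_files ROOT STATE_FILE
# Print markdown files under ROOT whose SHA-256 is not recorded in STATE_FILE.
# STATE_FILE holds "<sha256>  <path>" lines as written by sha256sum
# (a stand-in for the SQLite state DB, not its real schema).
changed_files() {
  local root=$1 state=$2 path sum
  ( cd "$root" && find . -type f -name '*.md' | sort ) | while IFS= read -r path; do
    sum=$(cd "$root" && sha256sum "$path" | cut -d' ' -f1)
    # A file counts as unchanged only if the exact "hash  path" line exists.
    grep -qxF "$sum  $path" "$state" 2>/dev/null || printf '%s\n' "$path"
  done
}

# Usage sketch (state.txt is a hypothetical file name):
#   (cd repo && find . -name '*.md' -exec sha256sum {} +) > state.txt   # record
#   changed_files repo state.txt                                        # compare
```

New files and edited files both show up in the output, which matches the "changed/new" behavior described above; files that were deleted do not, since there is nothing left to hash.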
Operations
Health Check
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
Status (file list + Qdrant point count)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
Validate Markdown (without indexing)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
View Logs
ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
Restart Stack
ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
Adding New Documentation
The standard workflow for adding docs to the KB:
- Create/edit the markdown file in /mnt/NV2/Development/claude-home/
- Update the relevant CONTEXT.md with a summary and link
- Rsync to manticore:
  rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
    manticore:~/docker/md-kb-rag/data/repo/path/to/
  rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
    manticore:~/docker/md-kb-rag/data/repo/path/to/
- Re-index:
  ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
- Verify with a search query to confirm the new content is findable
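The rsync and re-index steps above could be wrapped in one helper. The sketch below is a convenience wrapper under assumptions: the names kb_publish, run, LOCAL_ROOT, REMOTE_ROOT, and the DRY_RUN switch are all hypothetical, and only the paths and commands come from this document.

```shell
#!/usr/bin/env bash
LOCAL_ROOT="/mnt/NV2/Development/claude-home"
REMOTE_ROOT="manticore:~/docker/md-kb-rag/data/repo"

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# kb_publish REL_PATH...
# Rsync each repo-relative file to the same relative directory on manticore,
# then run one incremental index.
kb_publish() {
  local rel
  for rel in "$@"; do
    run rsync -av "$LOCAL_ROOT/$rel" "$REMOTE_ROOT/$(dirname "$rel")/"
  done
  run ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
}

# Usage:
#   DRY_RUN=1 kb_publish path/to/newfile.md path/to/CONTEXT.md   # preview
#   kb_publish path/to/newfile.md path/to/CONTEXT.md             # run
```

Previewing with DRY_RUN=1 first is a cheap way to confirm the destination directories before anything is copied. The final verification search still has to be done by hand.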
Environment Variables (on manticore)
File: ~/docker/md-kb-rag/.env
| Variable | Purpose |
|---|---|
| MODEL_PATH | Path to embedding model directory |
| MODEL_FILE | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
| KB_PATH | Path to knowledge base repo (./data/repo) |
| MCP_PORT | MCP server port (8001) |
| MCP_BEARER_TOKEN | Auth token for MCP endpoint |
| RUST_LOG | Log level (info) |
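A quick way to confirm that the .env file defines every variable in the table is sketched below. The function name check_env is hypothetical, and the check assumes plain KEY=value lines with no export prefix or quoting rules beyond that.

```shell
#!/usr/bin/env bash
# check_env FILE
# Report each expected variable that has no non-empty KEY=value line in FILE.
# Returns 0 when all are present, 1 otherwise.
check_env() {
  local file=$1 var missing=0
  for var in MODEL_PATH MODEL_FILE KB_PATH MCP_PORT MCP_BEARER_TOKEN RUST_LOG; do
    grep -q "^${var}=." "$file" || { echo "missing: $var"; missing=1; }
  done
  return "$missing"
}

# Usage (run on manticore):
#   check_env ~/docker/md-kb-rag/.env
```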
Troubleshooting
Search returns no results
- Check health: ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
- Verify files are synced: ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"
- Re-index: may need --full if state DB is out of sync
MCP connection refused
- Check container is running: ssh manticore "docker ps | grep kb-rag"
- Check port is listening: ssh manticore "curl -s http://localhost:8001/health"
- Restart: ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"
Embedding server OOM / crash
- The llama.cpp embedding server uses GPU memory. Check with ssh manticore "nvidia-smi"
- Restart embeddings container: ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"
Stale index after deleting files
- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run --full re-index to clean up: ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
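Since rsync -av without --delete leaves removed files in place on manticore, it can help to list what exists remotely but no longer locally before running the full re-index. The sketch below compares two file listings; the function name orphans and the file names local.txt and remote.txt are hypothetical.

```shell
#!/usr/bin/env bash
# orphans LOCAL_LIST REMOTE_LIST
# Print paths listed in REMOTE_LIST that do not appear in LOCAL_LIST.
orphans() {
  # -v invert match, -x whole-line match, -F fixed strings, -f patterns from file.
  # "|| true" keeps the exit status 0 when there are no orphans.
  grep -vxFf "$1" "$2" || true
}

# Usage sketch:
#   (cd /mnt/NV2/Development/claude-home && find . -name '*.md') > local.txt
#   ssh manticore "cd ~/docker/md-kb-rag/data/repo && find . -name '*.md'" > remote.txt
#   orphans local.txt remote.txt
```

Any path printed is a candidate for removal from data/repo/ on manticore; the --full re-index then rebuilds the vectors from whatever remains.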