claude-home/development/kb-rag-system.md
Cal Corum 747e4c2cce
docs: add kb-rag system documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 07:40:38 -05:00

Knowledge Base RAG System (md-kb-rag)

Overview

Semantic search over the entire claude-home documentation repo using vector embeddings. Runs on ubuntu-manticore (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the claude-home project.

  • App: st0nefish/md-kb-rag (Rust)
  • Host: manticore (10.10.0.226)
  • Stack location: ~/docker/md-kb-rag/ on manticore
  • MCP endpoint: http://10.10.0.226:8001/mcp
  • State as of 2026-03-12: 132 files indexed, ~1186 vector points

Architecture

Three containers in a single Docker Compose stack:

| Container | Image | Role | Port |
|---|---|---|---|
| md-kb-rag-kb-rag-1 | ghcr.io/st0nefish/md-kb-rag:latest | MCP server + indexer | 8001 |
| md-kb-rag-qdrant-1 | qdrant/qdrant:v1.17.0 | Vector database | 6333/6334 (localhost) |
| md-kb-rag-embeddings-1 | ghcr.io/ggml-org/llama.cpp:server-cuda | GPU embedding server | 8080 (internal) |

Embedding Model

  • Model: nomic-embed-text-v2-moe (Q8_0 quantization)
  • Vector size: 768 dimensions
  • Context size: 8192 tokens
  • GPU accelerated: NVIDIA CUDA with flash attention

Data Flow

claude-home repo files → rsync to manticore → md-kb-rag index
  → chunks markdown → nomic-embed generates vectors → stored in Qdrant
  → MCP search tool queries Qdrant → returns ranked chunks to Claude

MCP Integration

Claude Code Config

Registered as a project-scoped MCP server in ~/.claude.json under the /mnt/NV2/Development/claude-home project:

{
  "kb-search": {
    "type": "http",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
    }
  }
}
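The Authorization header above needs the real token from the stack's .env file. A minimal sketch of filling it in, assuming the .env line has the unquoted form MCP_BEARER_TOKEN=value; the function name kb_mcp_config is hypothetical, and the .env file must first be copied from manticore or the function run there:

```shell
# Print the kb-search config block with the token read from a .env file.
# Assumption: the file contains a line of the form MCP_BEARER_TOKEN=value
# (no surrounding quotes). kb_mcp_config is a hypothetical helper name.
kb_mcp_config() {
  env_file="$1"
  token=$(sed -n 's/^MCP_BEARER_TOKEN=//p' "$env_file" | head -n 1)
  if [ -z "$token" ]; then
    echo "MCP_BEARER_TOKEN not found in $env_file" >&2
    return 1
  fi
  cat <<EOF
{
  "kb-search": {
    "type": "http",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer $token"
    }
  }
}
EOF
}
```

Running `kb_mcp_config ~/docker/md-kb-rag/.env` on manticore prints a block ready to paste into the project's MCP server configuration.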

Available MCP Tools

search

Semantic search across all indexed documents. Returns ranked chunks with scores.

query: "natural language search query"
limit: 10          # max results (default 10, max 50)
domain: null       # optional filter
type: null         # optional filter
tags: []           # optional tag filter
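For reference, MCP clients invoke tools with a JSON-RPC `tools/call` request. The sketch below only builds such a request body for the parameters listed above; it does not send it, since a real client must first initialize an MCP session. The tool name `search` is an assumption based on this document's wording, and kb_search_payload is a hypothetical helper that does no JSON escaping, so the query must not contain double quotes or backslashes:

```shell
# Build a JSON-RPC tools/call body for the search tool.
# Assumptions: the tool is registered under the name "search"; the query
# contains no double quotes or backslashes (no JSON escaping is done here).
kb_search_payload() {
  query="$1"
  limit="${2:-10}"   # documented default is 10
  case "$limit" in
    ''|*[!0-9]*) echo "limit must be a non-negative integer" >&2; return 1 ;;
  esac
  # Clamp to the documented range (max 50).
  if [ "$limit" -gt 50 ]; then limit=50; fi
  if [ "$limit" -lt 1 ]; then limit=1; fi
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search","arguments":{"query":"%s","limit":%s}}}\n' "$query" "$limit"
}
```

In normal use Claude Code builds this request itself; the sketch is mainly useful when debugging the endpoint by hand.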

get_document

Retrieve full raw content of a document by file path (as returned by search results).

path: "/data/productivity/google-workspace-cli.md"

Data Sync

The KB data lives at ~/docker/md-kb-rag/data/repo/ on manticore. This is not a functional git clone: it is a directory with a broken .git that holds the repo files directly, so updates cannot be pulled with git and files must be synced manually.

Syncing New/Updated Files

# Sync a single file
rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
  manticore:~/docker/md-kb-rag/data/repo/path/to/

# Sync an entire directory
rsync -av /mnt/NV2/Development/claude-home/dirname/ \
  manticore:~/docker/md-kb-rag/data/repo/dirname/

# Sync everything except .git, .claude/, and tmp/ (add more --exclude flags for anything else that should stay out of the KB)
rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
  /mnt/NV2/Development/claude-home/ \
  manticore:~/docker/md-kb-rag/data/repo/
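The single-file command requires typing the relative path twice. A small sketch that derives the remote directory from the local path instead; kb_remote_dir and kb_sync_file are hypothetical helper names, and the two roots are the ones used throughout this document:

```shell
KB_LOCAL_ROOT="/mnt/NV2/Development/claude-home"
KB_REMOTE_ROOT="manticore:~/docker/md-kb-rag/data/repo"

# Print the remote directory that mirrors a local file's directory.
kb_remote_dir() {
  case "$1" in
    "$KB_LOCAL_ROOT"/*) rel="${1#"$KB_LOCAL_ROOT"/}" ;;
    *) echo "not under $KB_LOCAL_ROOT: $1" >&2; return 1 ;;
  esac
  dir=$(dirname "$rel")
  if [ "$dir" = "." ]; then
    printf '%s/\n' "$KB_REMOTE_ROOT"          # file sits at the repo root
  else
    printf '%s/%s/\n' "$KB_REMOTE_ROOT" "$dir"
  fi
}

# Sync one file to its mirrored location (same rsync flags as above).
kb_sync_file() {
  dest=$(kb_remote_dir "$1") || return 1
  rsync -av "$1" "$dest"
}
```

For example, `kb_sync_file /mnt/NV2/Development/claude-home/development/kb-rag-system.md` copies the file into data/repo/development/ on manticore. If the remote directory does not exist yet, sync the parent directory first.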

Re-indexing After Sync

# Incremental (only changed/new files — fast, use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"

# Full re-index (clears state DB, re-embeds everything — slow)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"

The incremental indexer compares content hashes in a SQLite state DB (data/state/state.db) and only re-embeds files whose content has changed.
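The idea behind hash-based change detection can be illustrated with a plain manifest file. This is only a sketch of the technique, not md-kb-rag's implementation: the SHA-256 algorithm, the `sha256sum`-style manifest, and the function name kb_changed_files are all assumptions, and paths containing whitespace are not handled.

```shell
# Print the files whose current hash differs from the one recorded in a
# manifest (lines of "<sha256>  <path>", as written by sha256sum).
# Files missing from the manifest count as changed.
kb_changed_files() {
  manifest="$1"
  shift
  for f in "$@"; do
    new=$(sha256sum "$f" | cut -d' ' -f1)
    old=$(awk -v p="$f" '$2 == p { print $1 }' "$manifest")
    if [ "$new" != "$old" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Only the files printed by such a comparison would need re-embedding, which is why the incremental run is fast when few files have changed.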

Operations

Health Check

ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"

Status (file list + Qdrant point count)

ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"

Validate Markdown (without indexing)

ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"

View Logs

ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"

Restart Stack

ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
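The md-kb-rag commands above all share the same ssh and docker exec prefix. A minimal wrapper sketch for a local shell profile; the function name kb and the KB_DRY_RUN variable are hypothetical, and only the subcommands shown in this document are accepted:

```shell
# Run an md-kb-rag subcommand inside the kb-rag container on manticore.
# Set KB_DRY_RUN=1 to print the command instead of running it.
kb() {
  case "${1:-}" in
    index|health|status|validate) ;;
    *) echo "usage: kb {index [--full]|health|status|validate}" >&2; return 2 ;;
  esac
  remote_cmd="docker exec md-kb-rag-kb-rag-1 md-kb-rag $*"
  if [ -n "${KB_DRY_RUN:-}" ]; then
    printf 'ssh manticore "%s"\n' "$remote_cmd"
  else
    ssh manticore "$remote_cmd"
  fi
}
```

With this in place, `kb status` or `kb index --full` replaces the longer forms, and `KB_DRY_RUN=1 kb index` shows what would run.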

Adding New Documentation

The standard workflow for adding docs to the KB:

  1. Create/edit the markdown file in /mnt/NV2/Development/claude-home/
  2. Update the relevant CONTEXT.md with a summary and link
  3. Rsync to manticore:
    rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
      manticore:~/docker/md-kb-rag/data/repo/path/to/
    rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
      manticore:~/docker/md-kb-rag/data/repo/path/to/
    
  4. Re-index:
    ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
    
  5. Verify with a search query to confirm the new content is findable
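Steps 3 and 4 can be generated for any set of files. The sketch below prints the commands rather than running them, so they can be reviewed first; kb_publish_plan is a hypothetical name, and paths containing spaces are not handled:

```shell
# Print the rsync commands for each given file, followed by one
# incremental re-index command. Nothing is executed.
kb_publish_plan() {
  local_root="/mnt/NV2/Development/claude-home"
  remote_root="manticore:~/docker/md-kb-rag/data/repo"
  for f in "$@"; do
    rel="${f#"$local_root"/}"
    if [ "$rel" = "$f" ]; then
      echo "skipping (outside $local_root): $f" >&2
      continue
    fi
    case "$rel" in
      */*) dest="$remote_root/${rel%/*}/" ;;   # mirror the subdirectory
      *)   dest="$remote_root/" ;;             # file at the repo root
    esac
    printf 'rsync -av %s %s\n' "$f" "$dest"
  done
  printf 'ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"\n'
}
```

Once the printed commands look right, they can be pasted into a terminal or piped to `sh`. Step 5 still has to be done by hand with a search query.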

Environment Variables (on manticore)

File: ~/docker/md-kb-rag/.env

| Variable | Purpose |
|---|---|
| MODEL_PATH | Path to embedding model directory |
| MODEL_FILE | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
| KB_PATH | Path to knowledge base repo (./data/repo) |
| MCP_PORT | MCP server port (8001) |
| MCP_BEARER_TOKEN | Auth token for MCP endpoint |
| RUST_LOG | Log level (info) |

Troubleshooting

Search returns no results

  • Check health: ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
  • Verify files are synced: ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"
  • Re-index: run an incremental index first; fall back to --full if the state DB is out of sync

MCP connection refused

  • Check container is running: ssh manticore "docker ps | grep kb-rag"
  • Check port is listening: ssh manticore "curl -s http://localhost:8001/health"
  • Restart: ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"

Embedding server OOM / crash

  • The llama.cpp embedding server uses GPU memory. Check with ssh manticore "nvidia-smi"
  • Restart embeddings container: ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"

Stale index after deleting files

  • Incremental indexing doesn't remove orphaned vectors for deleted files
  • Run --full re-index to clean up: ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
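None of the sync commands in this document delete anything on manticore, so a file removed locally may still exist under data/repo/. Assuming the indexer embeds whatever it finds there, such leftovers should be removed before the full re-index. A sketch for spotting them by comparing two path listings; kb_orphans is a hypothetical name:

```shell
# Print paths that appear in the remote listing but not in the local one.
# $1: file of local relative paths, $2: file of remote relative paths.
kb_orphans() {
  local_sorted=$(mktemp) || return 1
  remote_sorted=$(mktemp) || return 1
  sort "$1" > "$local_sorted"
  sort "$2" > "$remote_sorted"
  # comm -13 keeps only lines unique to the second (remote) file.
  comm -13 "$local_sorted" "$remote_sorted"
  rm -f "$local_sorted" "$remote_sorted"
}

# Example of producing the two listings (not run here):
#   (cd /mnt/NV2/Development/claude-home && find . -name '*.md') > local.txt
#   ssh manticore "cd ~/docker/md-kb-rag/data/repo && find . -name '*.md'" > remote.txt
#   kb_orphans local.txt remote.txt
```

Review the output before deleting anything: files under locally excluded directories such as tmp/ or .claude/ will also show up in the local listing unless filtered out.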