claude-home/development/kb-rag-system.md
Cal Corum 4b7eca8a46
docs: add YAML frontmatter to all 151 markdown files
Adds title, description, type, domain, and tags frontmatter to every
doc for improved KB semantic search. The description field is prepended
to every search chunk, and domain/type/tags enable filtered queries.

Type values: context, guide, runbook, reference, troubleshooting
Domain values match directory structure (networking, docker, etc.)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 09:00:44 -05:00

---
title: "Knowledge Base RAG System"
description: "Semantic search system (md-kb-rag) over claude-home docs using vector embeddings. Covers Docker stack architecture on manticore, Qdrant + nomic-embed pipeline, MCP integration, Gitea webhook auto-sync, and troubleshooting."
type: guide
domain: development
tags: [kb-rag, mcp, qdrant, embeddings, docker, gitea, semantic-search, manticore]
---
# Knowledge Base RAG System (md-kb-rag)
## Overview
Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.
- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
- **Host**: `manticore` (10.10.0.226)
- **Stack location**: `~/docker/md-kb-rag/` on manticore
- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
- **Webhook endpoint**: `http://10.10.0.226:8001/hooks/reindex`
- **Auto-sync**: Gitea Actions workflow triggers on push to `main` (`.md` files only)
## Architecture
Three containers in a single Docker Compose stack:
| Container | Image | Role | Port |
|-----------|-------|------|------|
| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333/6334 (localhost) |
| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |
### Embedding Model
- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
- **Vector size**: 768 dimensions
- **Context size**: 8192 tokens
- **GPU accelerated**: NVIDIA CUDA with flash attention
### Data Flow
```
push .md to main → Gitea Action → POST /hooks/reindex (HMAC-signed)
→ kb-rag: git pull → incremental index → nomic-embed generates vectors
→ stored in Qdrant → MCP search tool queries Qdrant → returns ranked chunks to Claude
```
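The last step of the flow, ranking stored chunks against a query vector, can be pictured with a small sketch. It assumes cosine similarity as the scoring function (the distance metric configured in Qdrant is not stated in this doc) and uses toy 3-dimensional vectors in place of the real 768-dimensional ones. The names `cosine` and `rank` are hypothetical and are not part of md-kb-rag or Qdrant.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, chunks, limit=10):
    """Return up to `limit` (score, path) pairs, highest score first.

    `chunks` maps a document path to its embedding vector.
    """
    scored = [(cosine(query_vec, vec), path) for path, vec in chunks.items()]
    return sorted(scored, reverse=True)[:limit]


if __name__ == "__main__":
    # Toy vectors; real embeddings from the model have 768 dimensions.
    toy_chunks = {"docker/a.md": [1.0, 0.0, 0.0], "networking/b.md": [0.0, 1.0, 0.0]}
    for score, path in rank([0.9, 0.1, 0.0], toy_chunks):
        print(f"{score:.3f}  {path}")
```

In the real stack this comparison happens inside Qdrant, not in application code; the sketch only shows why results come back with a score attached.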
## MCP Integration
### Claude Code Config
Registered as a user-level MCP server in `~/.claude.json` under the top-level `mcpServers` key:
```json
{
  "kb-search": {
    "type": "url",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
    }
  }
}
```
See [workstation/claude-code-config.md](../workstation/claude-code-config.md) for details on MCP server configuration.
### Available MCP Tools
#### `search`
Semantic search across all indexed documents. Returns ranked chunks with scores.
```
query: "natural language search query"
limit: 10 # max results (default 10, max 50)
domain: null # optional filter
type: null # optional filter
tags: [] # optional tag filter
```
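On the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for `search` using the parameters listed above. The helper name `build_search_call` is hypothetical, and clamping `limit` on the client side is a convenience of this sketch, not documented server behavior.

```python
import json

MAX_LIMIT = 50  # upper bound taken from the parameter list above


def build_search_call(query, limit=10, domain=None, doc_type=None, tags=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for the `search` tool."""
    arguments = {
        "query": query,
        "limit": max(1, min(limit, MAX_LIMIT)),  # keep within 1..50
        "domain": domain,        # None serializes to null (no filter)
        "type": doc_type,        # None serializes to null (no filter)
        "tags": list(tags or []),
    }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


if __name__ == "__main__":
    payload = build_search_call("gitea webhook reindex", limit=5, domain="development")
    print(json.dumps(payload, indent=2))
```

Claude Code builds and sends these requests itself. A hand-rolled client would also need the bearer token header and the MCP `initialize` handshake before any tool call, neither of which is shown here.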
#### `get_document`
Retrieve full raw content of a document by file path (as returned by search results).
```
path: "/data/productivity/google-workspace-cli.md"
```
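Paths returned by the tools are container paths under `/data`, while the clone lives under `~/docker/md-kb-rag/data/repo/` on manticore. Assuming `/data` is where that clone is mounted (an inference from the `safe.directory = /data` setting, not something this doc confirms), a hypothetical helper for translating between the two when troubleshooting might look like this:

```python
from pathlib import PurePosixPath

# Assumption: the container mounts the host clone at /data.
CONTAINER_ROOT = PurePosixPath("/data")
HOST_ROOT = PurePosixPath("~/docker/md-kb-rag/data/repo")


def to_host_path(container_path: str) -> str:
    """Map a path reported by the MCP tools to the matching path on manticore.

    Raises ValueError if the path is not under the assumed container root.
    """
    relative = PurePosixPath(container_path).relative_to(CONTAINER_ROOT)
    return str(HOST_ROOT / relative)


if __name__ == "__main__":
    print(to_host_path("/data/productivity/google-workspace-cli.md"))
```

The `~` is left unexpanded on purpose so the result can be pasted into an `ssh manticore "ls ..."` command.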
## Auto-Sync Pipeline
The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper git clone of `http://10.10.0.225:3000/cal/claude-home.git`. Syncing is fully automated.
### How It Works
1. Push `.md` files to `main` branch on Gitea
2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
4. md-kb-rag receives webhook → runs `git pull --ff-only` → runs incremental reindex
5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
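Step 5 is the reason routine pushes are cheap. The sketch below shows one way a content-hash comparison against a SQLite state table could work. It is not md-kb-rag's actual implementation: the table name `file_state`, its schema, and the function `changed_files` are assumptions made for illustration.

```python
import hashlib
import sqlite3
from pathlib import Path


def changed_files(repo: Path, db: sqlite3.Connection) -> list:
    """Return repo-relative .md paths whose SHA-256 differs from the stored hash.

    The stored hash is updated for every path returned, so a second call with
    no edits in between returns an empty list.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_state "
        "(path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    changed = []
    for md in sorted(repo.rglob("*.md")):
        rel = md.relative_to(repo).as_posix()
        digest = hashlib.sha256(md.read_bytes()).hexdigest()
        row = db.execute(
            "SELECT sha256 FROM file_state WHERE path = ?", (rel,)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(rel)  # new or modified: would be re-embedded
            db.execute(
                "INSERT OR REPLACE INTO file_state (path, sha256) VALUES (?, ?)",
                (rel, digest),
            )
    db.commit()
    return changed
```

Like the behavior described under Troubleshooting, this sketch only walks files that still exist, so rows for deleted files are never cleaned up. The real tool's reasons for that limitation may differ.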
### Webhook Authentication
- Provider: Gitea (native format)
- Header: `x-gitea-signature` containing hex-encoded HMAC-SHA256 of the request body
- Secret: stored as `WEBHOOK_SECRET` in `.env` on manticore and as `KB_WEBHOOK_SECRET` Gitea repo secret
- Body must include `{"ref": "refs/heads/main"}` to match the configured branch
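The receiving side of these rules can be sketched as follows. This is a minimal illustration of HMAC-SHA256 verification plus the branch check, not the server's actual (Rust) code. The function names are hypothetical, and tolerating surrounding whitespace or uppercase hex in the header is a choice made for this sketch.

```python
import hashlib
import hmac
import json


def sign(secret: str, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid_webhook(secret: str, body: bytes, signature: str, branch: str = "main") -> bool:
    """Accept the request only if the signature matches and `ref` names the branch."""
    expected = sign(secret, body).encode()
    received = signature.strip().lower().encode()
    # compare_digest runs in constant time to avoid leaking how many characters matched.
    if not hmac.compare_digest(expected, received):
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # invalid JSON or undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("ref") == f"refs/heads/{branch}"


if __name__ == "__main__":
    body = b'{"ref":"refs/heads/main"}'
    print(is_valid_webhook("example-secret", body, sign("example-secret", body)))
```

The signature is computed over the exact bytes sent, so re-serializing the JSON (for example, adding a space after the colon) before signing or verifying produces a mismatch.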
### Gitea Actions Workflow
```yaml
# .gitea/workflows/kb-reindex.yml
name: Reindex Knowledge Base
on:
  push:
    branches: [main]
    paths: ['**/*.md']
jobs:
  reindex:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger KB re-index
        env:
          WEBHOOK_SECRET: ${{ secrets.KB_WEBHOOK_SECRET }}
        run: |
          BODY='{"ref":"refs/heads/main"}'
          SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')
          curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
            -H "Content-Type: application/json" \
            -H "x-gitea-signature: $SIG" \
            -d "$BODY"
```
### Manual Re-indexing
```bash
# Incremental (only changed/new files — fast, use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
# Full re-index (clears state DB, re-embeds everything — slow)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
```
### Manual Webhook Test
```bash
BODY='{"ref":"refs/heads/main"}'
SECRET='<webhook-secret>'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
-H "Content-Type: application/json" \
-H "x-gitea-signature: $SIG" \
-d "$BODY"
```
## Operations
### Health Check
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
```
### Status (file list + Qdrant point count)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
```
### Validate Markdown (without indexing)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
```
### View Logs
```bash
ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
```
### Restart Stack
```bash
ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
```
## Adding New Documentation
1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
2. **Update the relevant CONTEXT.md** with a summary and link
3. **Commit and push** to `main` — the pipeline handles the rest
4. **Verify** with a `kb-search` MCP search query to confirm the new content is findable
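New docs need the same frontmatter as the block at the top of this file (`title`, `description`, `type`, `domain`, `tags`), since search filtering relies on it. A quick pre-push check could look like the sketch below. It is deliberately naive (line-based, no real YAML parsing), the function name is hypothetical, and it is not a substitute for `md-kb-rag validate`, whose actual checks are not documented here.

```python
REQUIRED_KEYS = ("title", "description", "type", "domain", "tags")
ALLOWED_TYPES = {"context", "guide", "runbook", "reference", "troubleshooting"}


def frontmatter_problems(text: str) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing opening '---'"]
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        return ["missing closing '---'"]

    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        # Only top-level "key: value" lines; skip indented, list, and comment lines.
        if sep and not line.startswith((" ", "\t", "-", "#")):
            fields[key.strip()] = value.strip().strip("\"'")

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in fields]
    if "type" in fields and fields["type"] not in ALLOWED_TYPES:
        problems.append(f"unexpected type: {fields['type']}")
    return problems


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for problem in frontmatter_problems(handle.read()):
                print(f"{path}: {problem}")
```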
## Environment Variables (on manticore)
File: `~/docker/md-kb-rag/.env`
| Variable | Purpose |
|----------|---------|
| `MODEL_PATH` | Path to embedding model directory |
| `MODEL_FILE` | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
| `KB_PATH` | Path to knowledge base repo (./data/repo) |
| `MCP_PORT` | MCP server port (8001) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
| `RUST_LOG` | Log level (info) |
## Troubleshooting
### Search returns no results
- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
- Re-index: run an incremental index first; if the state DB is out of sync, a `--full` re-index may be needed (see Manual Re-indexing)
### MCP connection refused
- Check container is running: `ssh manticore "docker ps | grep kb-rag"`
- Check port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`
### Embedding server OOM / crash
- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`
### Stale index after deleting files
- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
### Webhook returns 500 "Git pull failed"
- Check container logs: `ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 20"`
- **"dubious ownership"**: The `.gitconfig` with `safe.directory = /data` isn't mounted or `GIT_CONFIG_GLOBAL` env var is missing
- **"Permission denied"**: Container must run as `user: "1000:1000"` to match repo file ownership
- **"FETCH_HEAD" error**: Same cause as "Permission denied": a uid mismatch between the container user and the owner of the repo files
### MCP session disconnects after container restart
- Run `/mcp` in Claude Code to reconnect
- This happens because the Streamable HTTP session is invalidated when the container restarts
## Docker Compose Notes
The kb-rag service has these non-obvious requirements:
- `user: "1000:1000"` — must match the uid/gid that owns `data/repo/` for git pull to work
- `config.yaml` mount — provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
- `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var — git needs `safe.directory = /data` since the volume owner differs from the container's default user