---
title: "Knowledge Base RAG System"
description: "Semantic search system (md-kb-rag) over claude-home docs using vector embeddings. Covers Docker stack architecture on manticore, Qdrant + nomic-embed pipeline, MCP integration, Gitea webhook auto-sync, and troubleshooting."
type: guide
domain: development
tags: [kb-rag, mcp, qdrant, embeddings, docker, gitea, semantic-search, manticore]
---

# Knowledge Base RAG System (md-kb-rag)

## Overview
Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.

- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
- **Host**: `manticore` (10.10.0.226)
- **Stack location**: `~/docker/md-kb-rag/` on manticore
- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
- **Webhook endpoint**: `http://10.10.0.226:8001/hooks/reindex`
- **Auto-sync**: Gitea Actions workflow triggers on push to `main` (`.md` files only)

## Architecture

Three containers in a single Docker Compose stack:

| Container | Image | Role | Port |
|-----------|-------|------|------|
| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333/6334 (localhost) |
| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |

### Embedding Model
- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
- **Vector size**: 768 dimensions
- **Context size**: 8192 tokens
- **GPU accelerated**: NVIDIA CUDA with flash attention

### Data Flow
```
push .md to main → Gitea Action → POST /hooks/reindex (HMAC-signed)
  → kb-rag: git pull → incremental index → nomic-embed generates vectors
  → stored in Qdrant → MCP search tool queries Qdrant → returns ranked chunks to Claude
```
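
The last step of the flow, ranking stored chunks against a query vector, can be illustrated with a toy example. This is a minimal sketch, not the md-kb-rag or Qdrant implementation: it assumes cosine similarity as the scoring metric (the source does not say which metric the collection uses), uses 3-dimensional vectors in place of the real 768-dimensional ones, and the `rank_chunks` helper and sample paths are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_chunks(query_vec, chunks, limit=10):
    """Return up to `limit` (score, path) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), path) for path, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Hypothetical chunk vectors; real ones would come from the embedding server.
chunks = {
    "docker/compose.md": [0.9, 0.1, 0.0],
    "networking/dns.md": [0.0, 1.0, 0.0],
    "docker/gpu.md": [0.7, 0.0, 0.7],
}
results = rank_chunks([1.0, 0.0, 0.0], chunks, limit=2)
for score, path in results:
    print(f"{score:.3f}  {path}")
```

In the real stack this comparison happens inside Qdrant; the sketch only shows why a chunk whose vector points in nearly the same direction as the query ranks first.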

## MCP Integration

### Claude Code Config
Registered as a user-level MCP server in `~/.claude.json` under the top-level `mcpServers` key:

```json
{
  "kb-search": {
    "type": "url",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
    }
  }
}
```

See [workstation/claude-code-config.md](../workstation/claude-code-config.md) for details on MCP server configuration.

### Available MCP Tools

#### `search`
Semantic search across all indexed documents. Returns ranked chunks with scores.
```
query: "natural language search query"
limit: 10        # max results (default 10, max 50)
domain: null     # optional filter
type: null       # optional filter
tags: []         # optional tag filter
```
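
For reference, the parameters above map onto an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below only builds the request payload; it does not open a session or send anything. The `build_search_call` helper is hypothetical, and clamping `limit` on the client side is an assumption (the server may enforce the maximum of 50 itself).

```python
import json

DEFAULT_LIMIT = 10
MAX_LIMIT = 50


def build_search_call(query, limit=DEFAULT_LIMIT, domain=None, doc_type=None,
                      tags=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload for the `search` tool."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Keep the limit inside the documented range (default 10, max 50).
    arguments = {"query": query, "limit": max(1, min(limit, MAX_LIMIT))}
    # Optional filters are only sent when they are set.
    if domain is not None:
        arguments["domain"] = domain
    if doc_type is not None:
        arguments["type"] = doc_type
    if tags:
        arguments["tags"] = list(tags)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


payload = build_search_call("qdrant backup procedure", limit=200, domain="development")
print(json.dumps(payload, indent=2))
```

A real client would first complete the MCP initialization handshake and send the bearer token from the config above; Claude Code handles both automatically.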

#### `get_document`
Retrieve full raw content of a document by file path (as returned by search results).
```
path: "/data/productivity/google-workspace-cli.md"
```

## Auto-Sync Pipeline

The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper git clone of `http://10.10.0.225:3000/cal/claude-home.git`. Syncing is fully automated.

### How It Works
1. Push `.md` files to `main` branch on Gitea
2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
4. md-kb-rag receives webhook → runs `git fetch` + `git merge --ff-only` (using `GIT_PULL_TOKEN`) → runs incremental reindex
5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
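
The content-hash comparison in step 5 can be sketched as follows. This is an illustration of the general technique, not the md-kb-rag code: the SHA-256 hash, the `file_state` table, and the `files_to_reembed` helper are all assumptions, and an in-memory SQLite database stands in for the real state DB.

```python
import hashlib
import sqlite3


def content_hash(text):
    """Hex SHA-256 digest of a file's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def files_to_reembed(conn, files):
    """Return paths whose content hash is new or changed, and record the new hashes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_state (path TEXT PRIMARY KEY, hash TEXT NOT NULL)"
    )
    changed = []
    for path, text in sorted(files.items()):
        digest = content_hash(text)
        row = conn.execute(
            "SELECT hash FROM file_state WHERE path = ?", (path,)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(path)
            conn.execute(
                "INSERT OR REPLACE INTO file_state (path, hash) VALUES (?, ?)",
                (path, digest),
            )
    conn.commit()
    return changed


conn = sqlite3.connect(":memory:")
first_run = files_to_reembed(conn, {"a.md": "alpha", "b.md": "beta"})
second_run = files_to_reembed(conn, {"a.md": "alpha", "b.md": "beta v2"})
print(first_run)   # both files are new on the first run
print(second_run)  # only the edited file is picked up again
```

Note that this loop only looks at files that still exist, which is consistent with deleted files leaving orphaned vectors until a full re-index.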

### Webhook Authentication
- Provider: Gitea (native format)
- Header: `x-gitea-signature` containing hex-encoded HMAC-SHA256 of the request body
- Secret: stored as `WEBHOOK_SECRET` in `.env` on manticore and as `KB_WEBHOOK_SECRET` Gitea repo secret
- Body must include `{"ref": "refs/heads/main"}` to match the configured branch
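
The signature scheme described above can be reproduced in a few lines, which helps when debugging a rejected webhook. This is a minimal sketch of what a receiver might check, not the md-kb-rag handler itself: `sign_body` and `verify_webhook` are hypothetical helpers and the secret is a placeholder.

```python
import hashlib
import hmac
import json


def sign_body(secret, body):
    """Hex-encoded HMAC-SHA256 of the raw request body (bytes)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_webhook(secret, body, signature_hex, branch="main"):
    """Accept the request only if the signature matches and the ref is the configured branch."""
    expected = sign_body(secret, body)
    received = signature_hex.strip().lower()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ref") == f"refs/heads/{branch}"


secret = "example-secret"  # placeholder, not the real WEBHOOK_SECRET
body = b'{"ref":"refs/heads/main"}'
signature = sign_body(secret, body)
print(verify_webhook(secret, body, signature))          # matching secret and branch
print(verify_webhook("wrong-secret", body, signature))  # signature mismatch
```

The signature covers the exact bytes of the body, so any re-serialization of the JSON between signing and sending (extra whitespace, a trailing newline) produces a mismatch.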

### Gitea Actions Workflow
```yaml
# .gitea/workflows/kb-reindex.yml
name: Reindex Knowledge Base
on:
  push:
    branches: [main]
    paths: ['**/*.md']
jobs:
  reindex:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger KB re-index
        env:
          WEBHOOK_SECRET: ${{ secrets.KB_WEBHOOK_SECRET }}
        run: |
          BODY='{"ref":"refs/heads/main"}'
          SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')
          curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
            -H "Content-Type: application/json" \
            -H "x-gitea-signature: $SIG" \
            -d "$BODY"
```

### Manual Re-indexing
```bash
# Incremental (only changed/new files): fast, use this normally
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"

# Full re-index (clears state DB, re-embeds everything): slow
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
```

### Manual Webhook Test
```bash
BODY='{"ref":"refs/heads/main"}'
SECRET='<webhook-secret>'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
  -H "Content-Type: application/json" \
  -H "x-gitea-signature: $SIG" \
  -d "$BODY"
```

## Operations

### Health Check
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
```

### Status (file list + Qdrant point count)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
```

### Validate Markdown (without indexing)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
```

### View Logs
```bash
ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
```

### Restart Stack
```bash
ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
```

## Adding New Documentation

1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
2. **Update the relevant CONTEXT.md** with a summary and link
3. **Commit and push** to `main`; the pipeline handles the rest
4. **Verify** with a `kb-search` MCP search query to confirm the new content is findable

## Environment Variables (on manticore)

File: `~/docker/md-kb-rag/.env`

| Variable | Purpose |
|----------|---------|
| `MODEL_PATH` | Path to embedding model directory |
| `MODEL_FILE` | Embedding model filename (nomic-embed-text-v2-moe.Q8_0.gguf) |
| `KB_PATH` | Path to knowledge base repo (./data/repo) |
| `MCP_PORT` | MCP server port (8001) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |
| `RUST_LOG` | Log level (info) |
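
A quick way to catch a missing or empty variable before restarting the stack is to parse the `.env` text and compare it with the table. The sketch below is a standalone helper, not part of md-kb-rag: it treats every variable in the table as required (an assumption, since some may have defaults), handles only simple `KEY=value` lines, and uses made-up sample values.

```python
REQUIRED_VARS = [
    "MODEL_PATH", "MODEL_FILE", "KB_PATH", "MCP_PORT",
    "MCP_BEARER_TOKEN", "WEBHOOK_SECRET", "GIT_PULL_TOKEN", "RUST_LOG",
]


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text, required=REQUIRED_VARS):
    """Return required names that are absent or empty, in table order."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]


# Hypothetical .env content for illustration only.
sample = """
# local overrides
MODEL_PATH=/models
MODEL_FILE=nomic-embed-text-v2-moe.Q8_0.gguf
KB_PATH=./data/repo
MCP_PORT=8001
MCP_BEARER_TOKEN="example-token"
WEBHOOK_SECRET=
RUST_LOG=info
"""
missing = missing_vars(sample)
print(missing)
```

Here `WEBHOOK_SECRET` is reported because it is empty and `GIT_PULL_TOKEN` because it is absent.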

## Troubleshooting

### Search returns no results
- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
- Re-index; a `--full` run may be needed if the state DB is out of sync

### MCP connection refused
- Check container is running: `ssh manticore "docker ps | grep kb-rag"`
- Check port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`

### Embedding server OOM / crash
- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`

### Stale index after deleting files
- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`

### Webhook returns 500 "Git pull failed"
- Check container logs: `ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 20"`
- **"dubious ownership"**: The `.gitconfig` with `safe.directory = /data` isn't mounted or `GIT_CONFIG_GLOBAL` env var is missing
- **"Permission denied"**: Container must run as `user: "1000:1000"` to match repo file ownership
- **"FETCH_HEAD" error**: Same cause as "Permission denied" (uid mismatch)

### MCP session disconnects after container restart
- Run `/mcp` in Claude Code to reconnect
- This happens because the Streamable HTTP session is invalidated when the container restarts

## Docker Compose Notes

The kb-rag service has these non-obvious requirements:
- `user: "1000:1000"`: must match the uid/gid that owns `data/repo/` for git pull to work
- `config.yaml` mount: provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
- `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var: git needs `safe.directory = /data` since the volume owner differs from the container's default user

## Changelog

### 2026-03-17: Image Update + Config Fixes

**Image pull**: Updated `ghcr.io/st0nefish/md-kb-rag:latest` (8 upstream commits since initial deploy on 2026-03-11).

**Key upstream changes applied:**
- **`GIT_PULL_TOKEN` support**: Webhook-triggered reindex now uses explicit `git fetch` + `git merge --ff-only` with a token injected into the HTTPS URL. Previously the git pull inside Docker was silently failing (no SSH client, dubious ownership errors).
- **Auto-clone on startup**: Setting `source.git_url` allows the container to shallow-clone the repo into an empty volume on first boot. Not adopted (we use a bind-mount), but available.
- **`EMBEDDING_API_KEY` support**: Optional env var for authenticated embedding providers. Not needed for local llama.cpp.
- **Custom MCP instructions**: New `mcp.instructions` config field sets the server-level instructions block sent to MCP clients. Server auto-appends discovered filter metadata (domains, types, tags).
- **Bug fixes**: Webhook rate limiter gap, globset deny-all fallback, RwLock panic in MCP startup, HTTP 429/503 retry logic for embedding API.

**Config changes made:**
- Added `GIT_PULL_TOKEN` env var to `.env` (Gitea token with repo read access)
- Added `GIT_PULL_TOKEN=${GIT_PULL_TOKEN:-}` to `docker-compose.yml` environment section
- Added `mcp.instructions` to `config.yaml` with proactive search trigger keywords matching the claude-home topic areas

**Env vars table update:**

| Variable | Purpose |
|----------|---------|
| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |

**Result**: Webhook reindex pipeline now works end-to-end (push → Gitea Action → webhook → git fetch with auth → incremental reindex). Verified with live push test.