docs: add kb-rag system documentation
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 6s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent 2897d1f037
commit 747e4c2cce

development/kb-rag-system.md (new file, 178 lines)
@@ -0,0 +1,178 @@
# Knowledge Base RAG System (md-kb-rag)

## Overview

Semantic search over the entire `claude-home` documentation repo using vector embeddings. Runs on `ubuntu-manticore` (10.10.0.226) as a Docker stack, exposed as an MCP server to Claude Code for the `claude-home` project.

- **App**: [st0nefish/md-kb-rag](https://github.com/st0nefish/md-kb-rag) (Rust)
- **Host**: `manticore` (10.10.0.226)
- **Stack location**: `~/docker/md-kb-rag/` on manticore
- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
- **Current state**: 132 files indexed, ~1186 vector points

## Architecture

Three containers in a single Docker Compose stack:

| Container | Image | Role | Port |
|-----------|-------|------|------|
| `md-kb-rag-kb-rag-1` | `ghcr.io/st0nefish/md-kb-rag:latest` | MCP server + indexer | 8001 |
| `md-kb-rag-qdrant-1` | `qdrant/qdrant:v1.17.0` | Vector database | 6333/6334 (localhost) |
| `md-kb-rag-embeddings-1` | `ghcr.io/ggml-org/llama.cpp:server-cuda` | GPU embedding server | 8080 (internal) |

### Embedding Model
- **Model**: `nomic-embed-text-v2-moe` (Q8_0 quantization)
- **Vector size**: 768 dimensions
- **Context size**: 8192 tokens
- **GPU accelerated**: NVIDIA CUDA with flash attention

### Data Flow
```
claude-home repo files → rsync to manticore → md-kb-rag index
  → chunks markdown → nomic-embed generates vectors → stored in Qdrant
  → MCP search tool queries Qdrant → returns ranked chunks to Claude
```

## MCP Integration

### Claude Code Config
Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:

```json
{
  "kb-search": {
    "type": "http",
    "url": "http://10.10.0.226:8001/mcp",
    "headers": {
      "Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
    }
  }
}
```

### Available MCP Tools

#### `search`
Semantic search across all indexed documents. Returns ranked chunks with scores.
```
query: "natural language search query"
limit: 10          # max results (default 10, max 50)
domain: null       # optional filter
type: null         # optional filter
tags: []           # optional tag filter
```
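
For reference, MCP clients invoke tools through a JSON-RPC `tools/call` request. The sketch below only builds the request body for `search`; `kb_search_payload` is a hypothetical helper name, and it assumes the query contains no double quotes or backslashes, since it does no JSON escaping. Sending the request also requires the session handling that Claude Code performs, which is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the JSON-RPC body of an MCP `tools/call`
# request that invokes the `search` tool.
# Assumption: the query has no double quotes or backslashes (no escaping here).
kb_search_payload() {
  local query="$1"
  local limit="${2:-10}"   # documented default is 10, max 50
  printf '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search","arguments":{"query":"%s","limit":%d}}}\n' \
    "$query" "$limit"
}

kb_search_payload "how do I restart the embeddings container" 5
```

The optional `domain`, `type`, and `tags` filters would go into the same `arguments` object.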

#### `get_document`
Retrieve full raw content of a document by file path (as returned by search results).
```
path: "/data/productivity/google-workspace-cli.md"
```

## Data Sync

The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. This is **not** a functional git clone: it is a directory with a broken `.git` that holds the repo files directly, so files must be synced manually.

### Syncing New/Updated Files
```bash
# Sync a single file
rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
  manticore:~/docker/md-kb-rag/data/repo/path/to/

# Sync an entire directory
rsync -av /mnt/NV2/Development/claude-home/dirname/ \
  manticore:~/docker/md-kb-rag/data/repo/dirname/

# Sync everything except .git, .claude/, and tmp/ (keep the excludes so those are not copied)
rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
  /mnt/NV2/Development/claude-home/ \
  manticore:~/docker/md-kb-rag/data/repo/
```
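
The single-file command relies on the destination mirroring the file's path inside the repo. A small sketch of that mapping, in case it is useful to script; `kb_remote_dir` and `kb_sync_file` are hypothetical helper names, not part of md-kb-rag.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: derive the rsync destination directory on manticore
# from a file path inside the local claude-home checkout.
LOCAL_ROOT="/mnt/NV2/Development/claude-home"
REMOTE_ROOT="manticore:~/docker/md-kb-rag/data/repo"

kb_remote_dir() {
  local file="$1"
  local rel="${file#"$LOCAL_ROOT"/}"      # path relative to the repo root
  if [ "$rel" = "$file" ]; then
    echo "not under $LOCAL_ROOT: $file" >&2
    return 1
  fi
  local dir
  dir="$(dirname "$rel")"
  if [ "$dir" = "." ]; then
    printf '%s/\n' "$REMOTE_ROOT"         # file sits at the repo root
  else
    printf '%s/%s/\n' "$REMOTE_ROOT" "$dir"
  fi
}

# Sync one file to its mirrored location (requires SSH access to manticore).
kb_sync_file() {
  local dest
  dest="$(kb_remote_dir "$1")" || return 1
  rsync -av "$1" "$dest"
}

kb_remote_dir "$LOCAL_ROOT/development/kb-rag-system.md"
```

Paths outside the checkout are rejected rather than guessed, so a typo cannot write to an unexpected location on manticore.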

### Re-indexing After Sync
```bash
# Incremental: only changed/new files (fast; use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"

# Full re-index: clears the state DB and re-embeds everything (slow)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
```

The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.
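
To illustrate the idea of hash-based change detection, here is a minimal sketch. It is not md-kb-rag's implementation: the real indexer is written in Rust and stores its hashes in SQLite, whereas this sketch uses a plain-text manifest, and the function names and manifest format are made up for the example.

```bash
#!/usr/bin/env bash
# Illustration only: hash-based change detection with a plain-text manifest.
# md-kb-rag keeps its hashes in SQLite; the manifest format here is made up.

# Print the .md files under $1 whose SHA-256 differs from the hash recorded in
# manifest $2 (lines of "<hash>  <path>"), or that are not recorded at all.
changed_files() {
  local root="$1" manifest="$2" file hash
  find "$root" -type f -name '*.md' | sort | while IFS= read -r file; do
    hash="$(sha256sum "$file" | cut -d' ' -f1)"
    if ! grep -qxF "$hash  $file" "$manifest" 2>/dev/null; then
      printf '%s\n' "$file"
    fi
  done
}

# Record the current hashes so the next run reports no changes.
update_manifest() {
  find "$1" -type f -name '*.md' -exec sha256sum {} + | sort -k2 > "$2"
}
```

Because only content is hashed, touching a file without editing it does not trigger a re-embed in this scheme.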

## Operations

### Health Check
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"
```

### Status (file list + Qdrant point count)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag status"
```

### Validate Markdown (without indexing)
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag validate"
```

### View Logs
```bash
ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 50"
```

### Restart Stack
```bash
ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
```

## Adding New Documentation

The standard workflow for adding docs to the KB:

1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
2. **Update the relevant CONTEXT.md** with a summary and link
3. **Rsync to manticore**:
   ```bash
   rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
     manticore:~/docker/md-kb-rag/data/repo/path/to/
   rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
     manticore:~/docker/md-kb-rag/data/repo/path/to/
   ```
4. **Re-index**:
   ```bash
   ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
   ```
5. **Verify** with a search query to confirm the new content is findable
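
Steps 3 and 4 can be combined into one command. The following is a sketch under assumptions, not an existing script: `kb_publish` and `run` are hypothetical names, and `DRY_RUN=1` is a convention invented here to print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for steps 3-4: rsync each file, then re-index once.
# With DRY_RUN=1 the commands are printed instead of executed.
LOCAL_ROOT="/mnt/NV2/Development/claude-home"
REMOTE_ROOT="manticore:~/docker/md-kb-rag/data/repo"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

kb_publish() {
  local file rel dir
  for file in "$@"; do
    case "$file" in
      "$LOCAL_ROOT"/*) ;;                               # inside the checkout
      *) echo "not under $LOCAL_ROOT: $file" >&2; return 1 ;;
    esac
    rel="${file#"$LOCAL_ROOT"/}"
    dir="$(dirname "$rel")"
    run rsync -av "$file" "$REMOTE_ROOT/$dir/" || return 1
  done
  # One incremental index run after all files are synced.
  run ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
}

DRY_RUN=1 kb_publish "$LOCAL_ROOT/path/to/newfile.md" "$LOCAL_ROOT/path/to/CONTEXT.md"
```

Indexing once at the end, rather than after each file, keeps the run short when several files change together. Step 5 still has to be done by hand through the `search` tool.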

## Environment Variables (on manticore)

File: `~/docker/md-kb-rag/.env`

| Variable | Purpose |
|----------|---------|
| `MODEL_PATH` | Path to embedding model directory |
| `MODEL_FILE` | Embedding model filename (`nomic-embed-text-v2-moe.Q8_0.gguf`) |
| `KB_PATH` | Path to knowledge base repo (`./data/repo`) |
| `MCP_PORT` | MCP server port (`8001`) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `RUST_LOG` | Log level (`info`) |
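
A quick way to confirm the `.env` file is complete before starting the stack is to check each variable from the table. This is a sketch with a hypothetical function name, `missing_env_vars`; it assumes plain `KEY=value` lines with no `export` prefix or leading whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: print every variable from the table above
# that has no non-empty assignment in the given .env file.
# Assumption: plain KEY=value lines, no `export` prefix or leading spaces.
missing_env_vars() {
  local env_file="$1" name
  for name in MODEL_PATH MODEL_FILE KB_PATH MCP_PORT MCP_BEARER_TOKEN RUST_LOG; do
    grep -Eq "^${name}=.+" "$env_file" || printf '%s\n' "$name"
  done
}
```

On manticore this could be called as `missing_env_vars ~/docker/md-kb-rag/.env`; empty output means every variable is set.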

## Troubleshooting

### Search returns no results
- Check health: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag health"`
- Verify files are synced: `ssh manticore "ls ~/docker/md-kb-rag/data/repo/path/to/file.md"`
- Re-index: may need `--full` if state DB is out of sync

### MCP connection refused
- Check container is running: `ssh manticore "docker ps | grep kb-rag"`
- Check port is listening: `ssh manticore "curl -s http://localhost:8001/health"`
- Restart: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart kb-rag"`

### Embedding server OOM / crash
- The llama.cpp embedding server uses GPU memory. Check with `ssh manticore "nvidia-smi"`
- Restart embeddings container: `ssh manticore "cd ~/docker/md-kb-rag && docker compose restart embeddings"`

### Stale index after deleting files
- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
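
Note that the rsync commands in this document do not pass `--delete`, so a file removed from the source tree stays in the synced copy until it is removed there as well. The sketch below lists such leftovers by comparing two directory trees. `orphaned_files` is a hypothetical helper, it takes two local directories for simplicity, and it assumes only `.md` files matter to the indexer.

```bash
#!/usr/bin/env bash
# Hypothetical check: list .md paths that still exist in the synced copy but
# no longer exist in the source tree. Both arguments are local directories
# here; for manticore the second listing would have to come from
# `ssh manticore find ...` instead.
orphaned_files() {
  local src="$1" copy="$2" a b
  a="$(mktemp)"; b="$(mktemp)"
  (cd "$src"  && find . -type f -name '*.md' | sort) > "$a"
  (cd "$copy" && find . -type f -name '*.md' | sort) > "$b"
  comm -13 "$a" "$b"     # lines present only in the copy
  rm -f "$a" "$b"
}
```

Any path it prints would need to be deleted from the synced copy before the `--full` re-index.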