docs: sync KB — kb-rag-system.md
All checks were successful
Reindex Knowledge Base / reindex (push) Successful in 2s
parent be896b4c2a
commit 1ca0458a66
@@ -86,7 +86,7 @@ The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper gi
1. Push `.md` files to `main` branch on Gitea
2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
-4. md-kb-rag receives webhook → runs `git pull --ff-only` → runs incremental reindex
+4. md-kb-rag receives webhook → runs `git fetch` + `git merge --ff-only` (using `GIT_PULL_TOKEN`) → runs incremental reindex
5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
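The content-hash comparison in step 5 can be illustrated with a minimal Python sketch. This is not the md-kb-rag implementation: the `file_state` table, its columns, and the function name are hypothetical, chosen only to show how a SQLite state DB lets an indexer skip unchanged files.

```python
import hashlib
import sqlite3
from pathlib import Path


def changed_markdown_files(repo_dir: Path, db: sqlite3.Connection) -> list[Path]:
    """Return the .md files whose SHA-256 differs from the hash stored in the state DB."""
    # Hypothetical schema: one row per file, keyed by its path relative to the repo root.
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_state ("
        "path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    changed = []
    for md_file in sorted(repo_dir.rglob("*.md")):
        digest = hashlib.sha256(md_file.read_bytes()).hexdigest()
        key = md_file.relative_to(repo_dir).as_posix()
        row = db.execute(
            "SELECT sha256 FROM file_state WHERE path = ?", (key,)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(md_file)
            # Store the new hash so the next run skips this file.
            # A real indexer would likely do this only after embedding succeeds.
            db.execute(
                "INSERT OR REPLACE INTO file_state (path, sha256) VALUES (?, ?)",
                (key, digest),
            )
    db.commit()
    return changed
```

Calling the function twice in a row on an unchanged checkout returns an empty list the second time, which is the behaviour that keeps a webhook-triggered reindex cheap.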
### Webhook Authentication
@@ -185,6 +185,7 @@ File: `~/docker/md-kb-rag/.env`
| `MCP_PORT` | MCP server port (8001) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |
| `RUST_LOG` | Log level (info) |
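As a rough illustration of how a shared `WEBHOOK_SECRET` is typically used for HMAC-SHA256 webhook auth, the sketch below signs a request body and verifies it on the receiving side. The hex encoding and the function names are assumptions; this excerpt does not show which header or encoding md-kb-rag actually expects.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, received_hex: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_body(secret, body)
    # Compare as bytes so unexpected non-ASCII input cannot raise a TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received_hex.encode("utf-8"))
```

The sender (here, the Gitea Actions workflow) and the receiver must hash exactly the same bytes, so the signature should be computed over the raw body rather than over a re-serialized copy.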
## Troubleshooting
@@ -223,3 +224,29 @@ The kb-rag service has these non-obvious requirements:
- `user: "1000:1000"`: must match the uid/gid that owns `data/repo/` for git pull to work
- `config.yaml` mount: provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
- `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var: git needs `safe.directory = /data` since the volume owner differs from the container's default user
## Changelog

### 2026-03-17: Image Update + Config Fixes

**Image pull**: Updated `ghcr.io/st0nefish/md-kb-rag:latest` (8 upstream commits since initial deploy on 2026-03-11).

**Key upstream changes applied:**

- **`GIT_PULL_TOKEN` support**: Webhook-triggered reindex now uses explicit `git fetch` + `git merge --ff-only` with a token injected into the HTTPS URL. Previously, the `git pull` inside Docker failed silently (no SSH client, dubious ownership errors).
- **Auto-clone on startup**: Setting `source.git_url` allows the container to shallow-clone the repo into an empty volume on first boot. Not adopted (we use a bind-mount), but available.
- **`EMBEDDING_API_KEY` support**: Optional env var for authenticated embedding providers. Not needed for local llama.cpp.
- **Custom MCP instructions**: New `mcp.instructions` config field sets the server-level instructions block sent to MCP clients. The server auto-appends discovered filter metadata (domains, types, tags).
- **Bug fixes**: Webhook rate limiter gap, globset deny-all fallback, RwLock panic in MCP startup, HTTP 429/503 retry logic for embedding API.

**Config changes made:**

- Added `GIT_PULL_TOKEN` env var to `.env` (Gitea token with repo read access)
- Added `GIT_PULL_TOKEN=${GIT_PULL_TOKEN:-}` to `docker-compose.yml` environment section
- Added `mcp.instructions` to `config.yaml` with proactive search trigger keywords matching the claude-home topic areas

**Env vars table update:**

| Variable | Purpose |
|----------|---------|
| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |

**Result**: Webhook reindex pipeline now works end-to-end (push → Gitea Action → webhook → git fetch with auth → incremental reindex). Verified with live push test.
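The `GIT_PULL_TOKEN` change in this changelog entry (a token injected into the HTTPS URL, then `git fetch` + `git merge --ff-only`) might look roughly like the sketch below. It is an assumption-laden illustration, not the upstream code: the `git` placeholder username, the `gitea.example` host in the usage note, and both function names are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_token(git_url: str, token: str, username: str = "git") -> str:
    """Return git_url with username:token embedded as HTTPS credentials."""
    parts = urlsplit(git_url)
    if parts.scheme != "https":
        raise ValueError("token injection only applies to https:// remotes")
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Percent-encode both fields so characters like '/' or '@' cannot break the URL.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def fetch_and_merge_commands(auth_url: str, branch: str) -> list[list[str]]:
    """Build (but do not run) the two git invocations for a fast-forward-only update."""
    return [
        ["git", "fetch", auth_url, branch],
        # FETCH_HEAD points at whatever the preceding fetch retrieved.
        ["git", "merge", "--ff-only", "FETCH_HEAD"],
    ]
```

For example, `with_token("https://gitea.example/org/kb.git", "abc123")` yields `https://git:abc123@gitea.example/org/kb.git`. Passing the URL on the command line keeps the token out of the stored remote config, though it can still appear in process listings and logs.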