From 1ca0458a66e703d12b3f318bd6cf72670df5358d Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Tue, 17 Mar 2026 22:44:09 -0500
Subject: [PATCH] =?UTF-8?q?docs:=20sync=20KB=20=E2=80=94=20kb-rag-system.m?=
 =?UTF-8?q?d?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 development/kb-rag-system.md | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/development/kb-rag-system.md b/development/kb-rag-system.md
index 6f6827a..06a5b51 100644
--- a/development/kb-rag-system.md
+++ b/development/kb-rag-system.md
@@ -86,7 +86,7 @@ The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper gi
 1. Push `.md` files to `main` branch on Gitea
 2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
 3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
-4. md-kb-rag receives webhook → runs `git pull --ff-only` → runs incremental reindex
+4. md-kb-rag receives webhook → runs `git fetch` + `git merge --ff-only` (using `GIT_PULL_TOKEN`) → runs incremental reindex
 5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
 
 ### Webhook Authentication
@@ -185,6 +185,7 @@ File: `~/docker/md-kb-rag/.env`
 | `MCP_PORT` | MCP server port (8001) |
 | `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
 | `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
+| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |
 | `RUST_LOG` | Log level (info) |
 
 ## Troubleshooting
@@ -223,3 +224,29 @@ The kb-rag service has these non-obvious requirements:
 - `user: "1000:1000"` — must match the uid/gid that owns `data/repo/` for git pull to work
 - `config.yaml` mount — provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
 - `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var — git needs `safe.directory = /data` since the volume owner differs from the container's default user
+
+## Changelog
+
+### 2026-03-17 — Image Update + Config Fixes
+
+**Image pull**: Updated `ghcr.io/st0nefish/md-kb-rag:latest` (8 upstream commits since initial deploy on 2026-03-11).
+
+**Key upstream changes applied:**
+- **`GIT_PULL_TOKEN` support** — Webhook-triggered reindex now uses explicit `git fetch` + `git merge --ff-only` with a token injected into the HTTPS URL. Previously the git pull inside Docker was silently failing (no SSH client, dubious ownership errors).
+- **Auto-clone on startup** — Setting `source.git_url` allows the container to shallow-clone the repo into an empty volume on first boot. Not adopted (we use a bind-mount), but available.
+- **`EMBEDDING_API_KEY` support** — Optional env var for authenticated embedding providers. Not needed for local llama.cpp.
+- **Custom MCP instructions** — New `mcp.instructions` config field sets the server-level instructions block sent to MCP clients. Server auto-appends discovered filter metadata (domains, types, tags).
+- **Bug fixes** — Webhook rate limiter gap, globset deny-all fallback, RwLock panic in MCP startup, HTTP 429/503 retry logic for embedding API.
+
+**Config changes made:**
+- Added `GIT_PULL_TOKEN` env var to `.env` (Gitea token with repo read access)
+- Added `GIT_PULL_TOKEN=${GIT_PULL_TOKEN:-}` to `docker-compose.yml` environment section
+- Added `mcp.instructions` to `config.yaml` with proactive search trigger keywords matching the claude-home topic areas
+
+**Env vars table update:**
+
+| Variable | Purpose |
+|----------|---------|
+| `GIT_PULL_TOKEN` | Gitea token for authenticated git fetch during webhook reindex |
+
+**Result**: Webhook reindex pipeline now works end-to-end (push → Gitea Action → webhook → git fetch with auth → incremental reindex). Verified with live push test.
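
Reviewer note: the token-authenticated `git fetch` + `git merge --ff-only` step described in the changelog can be sketched roughly as below. This is an illustration under stated assumptions, not the md-kb-rag implementation (which is not shown in this patch). The helper names `with_token` and `reindex_git_commands`, the `git` username placed before the token, and any URLs passed in are hypothetical. The sketch only builds the command lines and does not run git.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_token(git_url: str, token: str, user: str = "git") -> str:
    """Embed credentials in an HTTPS git URL; return it unchanged otherwise.

    Assumption: the server accepts the token as the HTTP basic-auth password.
    The `user` value is a hypothetical placeholder.
    """
    parts = urlsplit(git_url)
    if not token or parts.scheme != "https":
        return git_url
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Percent-encode so characters such as "/" or "@" in the token stay valid.
    netloc = f"{quote(user, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def reindex_git_commands(
    repo_dir: str, git_url: str, branch: str, token: str
) -> list[list[str]]:
    """Build the fetch and fast-forward-only merge commands (not executed here)."""
    return [
        # Fetch the branch from the authenticated URL; git records it as FETCH_HEAD.
        ["git", "-C", repo_dir, "fetch", with_token(git_url, token), branch],
        # Refuse anything other than a fast-forward, mirroring `--ff-only`.
        ["git", "-C", repo_dir, "merge", "--ff-only", "FETCH_HEAD"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. Because the first command contains the token, it should be redacted before being logged.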