docs: update kb-rag with auto-sync pipeline, add Claude Code config guide

- kb-rag-system.md: replace manual rsync workflow with automated
  Gitea Actions → webhook → git pull → reindex pipeline docs
- claude-code-config.md: new guide covering config file locations,
  MCP server setup, hooks, and permissions
- workstation/CONTEXT.md: add Claude Code section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cal Corum 2026-03-12 08:36:55 -05:00
parent 63e0184a6d
commit c5dc5d96a6
3 changed files with 186 additions and 36 deletions


@@ -7,7 +7,8 @@ Semantic search over the entire `claude-home` documentation repo using vector em
- **Host**: `manticore` (10.10.0.226)
- **Stack location**: `~/docker/md-kb-rag/` on manticore
- **MCP endpoint**: `http://10.10.0.226:8001/mcp`
- **Current state**: 132 files indexed, ~1186 vector points
- **Webhook endpoint**: `http://10.10.0.226:8001/hooks/reindex`
- **Auto-sync**: Gitea Actions workflow triggers on push to `main` (`.md` files only)
## Architecture
@@ -27,20 +28,20 @@ Three containers in a single Docker Compose stack:
### Data Flow
```
claude-home repo files → rsync to manticore → md-kb-rag index
chunks markdown → nomic-embed generates vectors → stored in Qdrant
→ MCP search tool queries Qdrant → returns ranked chunks to Claude
push .md to main → Gitea Action → POST /hooks/reindex (HMAC-signed)
kb-rag: git pull → incremental index → nomic-embed generates vectors
stored in Qdrant → MCP search tool queries Qdrant → returns ranked chunks to Claude
```
## MCP Integration
### Claude Code Config
Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:
Registered as a user-level MCP server in `~/.claude.json` under the top-level `mcpServers` key:
```json
{
"kb-search": {
"type": "http",
"type": "url",
"url": "http://10.10.0.226:8001/mcp",
"headers": {
"Authorization": "Bearer <MCP_BEARER_TOKEN from .env>"
@@ -49,6 +50,8 @@ Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV
}
```
See [workstation/claude-code-config.md](../workstation/claude-code-config.md) for details on MCP server configuration.
### Available MCP Tools
#### `search`
@@ -67,27 +70,48 @@ Retrieve full raw content of a document by file path (as returned by search resu
path: "/data/productivity/google-workspace-cli.md"
```
## Data Sync
## Auto-Sync Pipeline
The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. This is **not** a functional git clone — it's a directory with a broken `.git` that contains the repo files directly. Files must be synced manually.
The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper git clone of `http://10.10.0.225:3000/cal/claude-home.git`. Syncing is fully automated.
### Syncing New/Updated Files
```bash
# Sync a single file
rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
manticore:~/docker/md-kb-rag/data/repo/path/to/
### How It Works
1. Push `.md` files to `main` branch on Gitea
2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
4. md-kb-rag receives webhook → runs `git pull --ff-only` → runs incremental reindex
5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
# Sync an entire directory
rsync -av /mnt/NV2/Development/claude-home/dirname/ \
manticore:~/docker/md-kb-rag/data/repo/dirname/
### Webhook Authentication
- Provider: Gitea (native format)
- Header: `x-gitea-signature` containing hex-encoded HMAC-SHA256 of the request body
- Secret: stored as `WEBHOOK_SECRET` in `.env` on manticore and as `KB_WEBHOOK_SECRET` Gitea repo secret
- Body must include `{"ref": "refs/heads/main"}` to match the configured branch
# Sync everything (careful — includes tmp/, .claude/, etc.)
rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
/mnt/NV2/Development/claude-home/ \
manticore:~/docker/md-kb-rag/data/repo/
### Gitea Actions Workflow
```yaml
# .gitea/workflows/kb-reindex.yml
name: Reindex Knowledge Base
on:
push:
branches: [main]
paths: ['**/*.md']
jobs:
reindex:
runs-on: ubuntu-latest
steps:
- name: Trigger KB re-index
env:
WEBHOOK_SECRET: ${{ secrets.KB_WEBHOOK_SECRET }}
run: |
BODY='{"ref":"refs/heads/main"}'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')
curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
-H "Content-Type: application/json" \
-H "x-gitea-signature: $SIG" \
-d "$BODY"
```
### Re-indexing After Sync
### Manual Re-indexing
```bash
# Incremental (only changed/new files — fast, use this normally)
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
@@ -96,7 +120,16 @@ ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
```
The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.
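The hash comparison described above can be sketched in shell. This is only an illustration under stated assumptions: a plain text file of `sha256sum` lines stands in for the SQLite state DB, and `changed_files` and `record_state` are hypothetical helper names, not md-kb-rag commands.

```bash
#!/usr/bin/env bash
# Sketch of content-hash change detection. Assumption: a text file of
# "<sha256>  <path>" lines stands in for the real SQLite state DB.

# Print every .md file under $1 whose current hash is not recorded in $2.
changed_files() {
  local root=$1 state=$2 file hash
  find "$root" -type f -name '*.md' -not -path '*/.git/*' | sort |
    while IFS= read -r file; do
      hash=$(sha256sum "$file" | awk '{print $1}')
      # -x: whole-line match, -F: fixed string.
      # A missing state file means every file counts as changed.
      if ! grep -qxF "$hash  $file" "$state" 2>/dev/null; then
        printf '%s\n' "$file"
      fi
    done
}

# Record the current hash of every .md file under $1 into $2.
record_state() {
  local root=$1 state=$2
  find "$root" -type f -name '*.md' -not -path '*/.git/*' \
    -exec sha256sum {} + > "$state"
}
```

Running `changed_files` before `record_state` lists every markdown file. Running it again right after lists none, which is the behaviour that keeps an incremental pass cheap. Like the incremental indexer, this sketch does not notice deleted files.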
### Manual Webhook Test
```bash
BODY='{"ref":"refs/heads/main"}'
SECRET='<webhook-secret>'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
-H "Content-Type: application/json" \
-H "x-gitea-signature: $SIG" \
-d "$BODY"
```
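The receiving side of this check can be sketched in shell too. The real verification happens inside md-kb-rag and its implementation is not shown in this document; `hmac_sha256_hex` and `verify_signature` are hypothetical helpers that only illustrate the scheme: recompute the hex-encoded HMAC-SHA256 of the raw request body with the shared secret and compare it with the `x-gitea-signature` value.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not the md-kb-rag implementation.

# Hex-encoded HMAC-SHA256 of stdin, keyed with $1.
hmac_sha256_hex() {
  openssl dgst -sha256 -hmac "$1" | awk '{print $NF}'
}

# Succeed only if $3 (the x-gitea-signature value) matches the HMAC
# of body $2 under secret $1.
verify_signature() {
  local secret=$1 body=$2 received=$3 expected
  expected=$(printf '%s' "$body" | hmac_sha256_hex "$secret")
  # Plain string comparison; a real server should compare in constant time.
  [ "$expected" = "$received" ]
}
```

Calling `verify_signature "$SECRET" "$BODY" "$SIG"` with the values from the manual test should succeed, and any change to the body or the secret should make it fail. The signature covers the exact bytes of the body, so even whitespace differences in the JSON produce a mismatch.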
## Operations
@@ -127,22 +160,10 @@ ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
## Adding New Documentation
The standard workflow for adding docs to the KB:
1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
2. **Update the relevant CONTEXT.md** with a summary and link
3. **Rsync to manticore**:
```bash
rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
manticore:~/docker/md-kb-rag/data/repo/path/to/
rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
manticore:~/docker/md-kb-rag/data/repo/path/to/
```
4. **Re-index**:
```bash
ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
```
5. **Verify** with a search query to confirm the new content is findable
3. **Commit and push** to `main` — the pipeline handles the rest
4. **Verify** with a `kb-search` MCP search query to confirm the new content is findable
## Environment Variables (on manticore)
@@ -155,6 +176,7 @@ File: `~/docker/md-kb-rag/.env`
| `KB_PATH` | Path to knowledge base repo (./data/repo) |
| `MCP_PORT` | MCP server port (8001) |
| `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
| `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
| `RUST_LOG` | Log level (info) |
## Troubleshooting
@@ -176,3 +198,20 @@ File: `~/docker/md-kb-rag/.env`
### Stale index after deleting files
- Incremental indexing doesn't remove orphaned vectors for deleted files
- Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
### Webhook returns 500 "Git pull failed"
- Check container logs: `ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 20"`
- **"dubious ownership"**: The `.gitconfig` with `safe.directory = /data` isn't mounted or `GIT_CONFIG_GLOBAL` env var is missing
- **"Permission denied"**: Container must run as `user: "1000:1000"` to match repo file ownership
- **"FETCH_HEAD" error**: Same cause as "Permission denied": the container's uid does not match the owner of the repo files
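The uid mismatch behind the last two errors can be checked directly on manticore. A minimal sketch, assuming GNU `stat`; `check_repo_owner` is a hypothetical helper, and `1000:1000` is the uid:gid the container is expected to run as.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a directory's owner with the uid:gid
# the container runs as (defaults to 1000:1000).
check_repo_owner() {
  local dir=$1 expected=${2:-1000:1000} actual
  # stat -c is GNU coreutils syntax; return 2 if the path cannot be read.
  actual=$(stat -c '%u:%g' "$dir") || return 2
  if [ "$actual" = "$expected" ]; then
    echo "ok: $dir is owned by $actual"
  else
    echo "mismatch: $dir is owned by $actual, expected $expected" >&2
    return 1
  fi
}
```

On manticore this would be run as `check_repo_owner ~/docker/md-kb-rag/data/repo 1000:1000`. A mismatch points at either the file ownership or the `user:` setting in the compose file.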
### MCP session disconnects after container restart
- Run `/mcp` in Claude Code to reconnect
- This happens because the Streamable HTTP session is invalidated when the container restarts
## Docker Compose Notes
The kb-rag service has these non-obvious requirements:
- `user: "1000:1000"` — must match the uid/gid that owns `data/repo/` for git pull to work
- `config.yaml` mount — provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
- `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var — git needs `safe.directory = /data` since the volume owner differs from the container's default user


@@ -98,6 +98,16 @@ Config: `~/.config/claude-scheduled/` | Skill: `~/.claude/skills/create-schedule
Uses `claude-scheduled@.service` template unit. Add new tasks by creating a directory under `~/.config/claude-scheduled/tasks/` and a corresponding timer. See the skill for full instructions.
## Claude Code
See [claude-code-config.md](claude-code-config.md) for full details on config file locations, MCP server setup, hooks, and permissions.
Key files:
- `~/.claude.json` — MCP servers (cognitive-memory, n8n-mcp, gitea-mcp, tui-driver, kb-search)
- `~/.claude/settings.json` — permissions, hooks, env vars, plugins
- `~/.claude/skills/` — custom skill definitions
- `~/.claude/hooks/` — hook scripts (format-code.sh, notify-subagent-done.sh)
## Backups
Original files are backed up by `install.sh` to `~/.dotfiles-backup/<timestamp>/` before being replaced with symlinks. Multiple runs create separate timestamped backup dirs. Old backups can be cleaned up manually.


@@ -0,0 +1,101 @@
# Claude Code Configuration
## Config File Locations
Claude Code reads settings from multiple files in a specific precedence order:
| File | Scope | What goes here |
|------|-------|---------------|
| `~/.claude.json` | User-level (all projects) | MCP servers, startup state, git repo mappings |
| `~/.claude/settings.json` | User-level (all projects) | Permissions, hooks, env vars, plugins, status line |
| `~/.claude/projects/<sanitized-cwd>/settings.json` | Project-level | Project-specific permission overrides |
| `<repo>/.claude/settings.json` | Repo-level (checked in) | Shared team settings |
| `<repo>/.mcp.json` | Repo-level (checked in) | Project-scoped MCP servers |
**Important**: `~/.claude.json` (home directory root) is different from `~/.claude/settings.json` (inside the `.claude` directory). They serve different purposes.
## MCP Server Configuration
MCP servers are defined in the top-level `mcpServers` key of `~/.claude.json`:
```json
{
"mcpServers": {
"server-name": {
"command": "/path/to/binary",
"args": ["-t", "stdio"],
"env": {
"API_KEY": "value"
}
}
}
}
```
### Server Types
**stdio** — local process, communicates over stdin/stdout:
```json
{
"command": "/path/to/binary",
"args": ["--flag", "value"],
"env": { "KEY": "value" }
}
```
**url** — remote HTTP server (Streamable HTTP transport):
```json
{
"type": "url",
"url": "http://host:port/mcp",
"headers": {
"Authorization": "Bearer <token>"
}
}
```
### Current MCP Servers
All defined in `~/.claude.json` under `mcpServers`:
| Server | Type | Purpose |
|--------|------|---------|
| `cognitive-memory` | stdio | Persistent memory system (local Python) |
| `n8n-mcp` | stdio | n8n workflow automation API |
| `gitea-mcp` | stdio | Gitea API (issues, PRs, repos) |
| `tui-driver` | stdio | TUI automation for testing |
| `kb-search` | url | Knowledge base semantic search (manticore) |
### Managing MCP Servers
- **Add from the command line**: `claude mcp add <name>` followed by the server command or URL (stores in `~/.claude.json`)
- **Add manually**: Edit the `mcpServers` object in `~/.claude.json`
- **Reconnect**: `/mcp` → select server → Reconnect (useful after remote server restarts)
- **Permissions**: Auto-allow MCP tools by adding `"mcp__server-name__*"` to `permissions.allow` in `~/.claude/settings.json`
### Gotchas
- **Session drops**: Remote (url) MCP servers lose their session if the server container restarts. Run `/mcp` to reconnect.
- **Don't confuse config files**: `~/.claude.json` holds MCP servers. `~/.claude/settings.json` holds permissions/hooks. They are NOT the same file.
- **Stale configs**: `~/.claude/.mcp.json` and `~/.claude/.mcp-full.json` are NOT read by Claude Code despite looking like they should be. The canonical location is `~/.claude.json`.
## Hooks
Hooks are configured in `~/.claude/settings.json` under the `hooks` key. They run shell commands or HTTP requests in response to events.
### Current Hooks
| Event | Action |
|-------|--------|
| `PostToolUse` (Edit/Write/MultiEdit) | Auto-format code via `format-code.sh` |
| `SubagentStop` | Notify via `notify-subagent-done.sh` |
| `SessionEnd` | Save session memories via `cognitive-memory` |
## Permissions
Permission rules live in `~/.claude/settings.json` under `permissions.allow` and `permissions.deny`. Format: `ToolName(scope:pattern)`.
Common patterns:
- `"mcp__gitea-mcp__*"` — allow all gitea MCP tools
- `"WebFetch(domain:docs.example.com)"` — allow fetching from specific domain
- `"Bash(ssh:*)"` — allow SSH commands