diff --git a/development/kb-rag-system.md b/development/kb-rag-system.md
index f15e23b..50218fa 100644
--- a/development/kb-rag-system.md
+++ b/development/kb-rag-system.md
@@ -7,7 +7,8 @@ Semantic search over the entire `claude-home` documentation repo using vector em
 - **Host**: `manticore` (10.10.0.226)
 - **Stack location**: `~/docker/md-kb-rag/` on manticore
 - **MCP endpoint**: `http://10.10.0.226:8001/mcp`
-- **Current state**: 132 files indexed, ~1186 vector points
+- **Webhook endpoint**: `http://10.10.0.226:8001/hooks/reindex`
+- **Auto-sync**: Gitea Actions workflow triggers on push to `main` (`.md` files only)
 
 ## Architecture
 
@@ -27,20 +28,20 @@ Three containers in a single Docker Compose stack:
 
 ### Data Flow
 ```
-claude-home repo files → rsync to manticore → md-kb-rag index
-  → chunks markdown → nomic-embed generates vectors → stored in Qdrant
-  → MCP search tool queries Qdrant → returns ranked chunks to Claude
+push .md to main → Gitea Action → POST /hooks/reindex (HMAC-signed)
+  → kb-rag: git pull → incremental index → nomic-embed generates vectors
+  → stored in Qdrant → MCP search tool queries Qdrant → returns ranked chunks to Claude
 ```
 
 ## MCP Integration
 
 ### Claude Code Config
-Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV2/Development/claude-home` project:
+Registered as a user-level MCP server in `~/.claude.json` under the top-level `mcpServers` key:
 
 ```json
 {
   "kb-search": {
-    "type": "http",
+    "type": "url",
     "url": "http://10.10.0.226:8001/mcp",
     "headers": {
       "Authorization": "Bearer "
@@ -49,6 +50,8 @@ Registered as a project-scoped MCP server in `~/.claude.json` under the `/mnt/NV
 }
 ```
 
+See [workstation/claude-code-config.md](../workstation/claude-code-config.md) for details on MCP server configuration.
+
 ### Available MCP Tools
 
 #### `search`
@@ -67,27 +70,48 @@ Retrieve full raw content of a document by file path (as returned by search resu
 path: "/data/productivity/google-workspace-cli.md"
 ```
 
-## Data Sync
+## Auto-Sync Pipeline
 
-The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore. This is **not** a functional git clone — it's a directory with a broken `.git` that contains the repo files directly. Files must be synced manually.
+The KB data lives at `~/docker/md-kb-rag/data/repo/` on manticore as a proper git clone of `http://10.10.0.225:3000/cal/claude-home.git`. Syncing is fully automated.
 
-### Syncing New/Updated Files
-```bash
-# Sync a single file
-rsync -av /mnt/NV2/Development/claude-home/path/to/file.md \
-  manticore:~/docker/md-kb-rag/data/repo/path/to/
+### How It Works
+1. Push `.md` files to `main` branch on Gitea
+2. Gitea Actions workflow (`.gitea/workflows/kb-reindex.yml`) fires
+3. Workflow sends HMAC-SHA256 signed POST to `http://10.10.0.226:8001/hooks/reindex`
+4. md-kb-rag receives webhook → runs `git pull --ff-only` → runs incremental reindex
+5. Only changed files are re-embedded (content hash comparison via SQLite state DB)
 
-# Sync an entire directory
-rsync -av /mnt/NV2/Development/claude-home/dirname/ \
-  manticore:~/docker/md-kb-rag/data/repo/dirname/
+### Webhook Authentication
+- Provider: Gitea (native format)
+- Header: `x-gitea-signature` containing hex-encoded HMAC-SHA256 of the request body
+- Secret: stored as `WEBHOOK_SECRET` in `.env` on manticore and as `KB_WEBHOOK_SECRET` Gitea repo secret
+- Body must include `{"ref": "refs/heads/main"}` to match the configured branch
 
-# Sync everything (careful — includes tmp/, .claude/, etc.)
-rsync -av --exclude='.git' --exclude='.claude' --exclude='tmp' \
-  /mnt/NV2/Development/claude-home/ \
-  manticore:~/docker/md-kb-rag/data/repo/
+### Gitea Actions Workflow
+```yaml
+# .gitea/workflows/kb-reindex.yml
+name: Reindex Knowledge Base
+on:
+  push:
+    branches: [main]
+    paths: ['**/*.md']
+jobs:
+  reindex:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger KB re-index
+        env:
+          WEBHOOK_SECRET: ${{ secrets.KB_WEBHOOK_SECRET }}
+        run: |
+          BODY='{"ref":"refs/heads/main"}'
+          SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$WEBHOOK_SECRET" | awk '{print $2}')
+          curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
+            -H "Content-Type: application/json" \
+            -H "x-gitea-signature: $SIG" \
+            -d "$BODY"
 ```
 
-### Re-indexing After Sync
+### Manual Re-indexing
 ```bash
 # Incremental (only changed/new files — fast, use this normally)
 ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
@@ -96,7 +120,16 @@ ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
 ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"
 ```
 
-The incremental indexer compares content hashes in a SQLite state DB (`data/state/state.db`) and only re-embeds files whose content has changed.
+### Manual Webhook Test
+```bash
+BODY='{"ref":"refs/heads/main"}'
+SECRET=''
+SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')
+curl -sf -X POST http://10.10.0.226:8001/hooks/reindex \
+  -H "Content-Type: application/json" \
+  -H "x-gitea-signature: $SIG" \
+  -d "$BODY"
+```
 
 ## Operations
 
@@ -127,22 +160,10 @@ ssh manticore "cd ~/docker/md-kb-rag && docker compose restart"
 
 ## Adding New Documentation
 
-The standard workflow for adding docs to the KB:
-
 1. **Create/edit the markdown file** in `/mnt/NV2/Development/claude-home/`
 2. **Update the relevant CONTEXT.md** with a summary and link
-3. **Rsync to manticore**:
-   ```bash
-   rsync -av /mnt/NV2/Development/claude-home/path/to/newfile.md \
-     manticore:~/docker/md-kb-rag/data/repo/path/to/
-   rsync -av /mnt/NV2/Development/claude-home/path/to/CONTEXT.md \
-     manticore:~/docker/md-kb-rag/data/repo/path/to/
-   ```
-4. **Re-index**:
-   ```bash
-   ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index"
-   ```
-5. **Verify** with a search query to confirm the new content is findable
+3. **Commit and push** to `main` — the pipeline handles the rest
+4. **Verify** with a `kb-search` MCP search query to confirm the new content is findable
 
 ## Environment Variables (on manticore)
 
@@ -155,6 +176,7 @@ File: `~/docker/md-kb-rag/.env`
 | `KB_PATH` | Path to knowledge base repo (./data/repo) |
 | `MCP_PORT` | MCP server port (8001) |
 | `MCP_BEARER_TOKEN` | Auth token for MCP endpoint |
+| `WEBHOOK_SECRET` | HMAC secret for webhook auth (shared with Gitea repo secret) |
 | `RUST_LOG` | Log level (info) |
 
 ## Troubleshooting
@@ -176,3 +198,20 @@ File: `~/docker/md-kb-rag/.env`
 ### Stale index after deleting files
 - Incremental indexing doesn't remove orphaned vectors for deleted files
 - Run `--full` re-index to clean up: `ssh manticore "docker exec md-kb-rag-kb-rag-1 md-kb-rag index --full"`
+
+### Webhook returns 500 "Git pull failed"
+- Check container logs: `ssh manticore "docker logs md-kb-rag-kb-rag-1 --tail 20"`
+- **"dubious ownership"**: The `.gitconfig` with `safe.directory = /data` isn't mounted or `GIT_CONFIG_GLOBAL` env var is missing
+- **"Permission denied"**: Container must run as `user: "1000:1000"` to match repo file ownership
+- **"FETCH_HEAD" error**: Same as permission denied — uid mismatch
+
+### MCP session disconnects after container restart
+- Run `/mcp` in Claude Code to reconnect
+- This happens because the Streamable HTTP session is invalidated when the container restarts
+
+## Docker Compose Notes
+
+The kb-rag service has these non-obvious requirements:
+- `user: "1000:1000"` — must match the uid/gid that owns `data/repo/` for git pull to work
+- `config.yaml` mount — provides `source.git_url` and `branch` so the webhook handler knows to run `git pull`
+- `.gitconfig` mount + `GIT_CONFIG_GLOBAL` env var — git needs `safe.directory = /data` since the volume owner differs from the container's default user
diff --git a/workstation/CONTEXT.md b/workstation/CONTEXT.md
index d7bfd57..5410ab5 100644
--- a/workstation/CONTEXT.md
+++ b/workstation/CONTEXT.md
@@ -98,6 +98,16 @@ Config: `~/.config/claude-scheduled/` | Skill: `~/.claude/skills/create-schedule
 
 Uses `claude-scheduled@.service` template unit. Add new tasks by creating a directory under `~/.config/claude-scheduled/tasks/` and a corresponding timer. See the skill for full instructions.
 
+## Claude Code
+
+See [claude-code-config.md](claude-code-config.md) for full details on config file locations, MCP server setup, hooks, and permissions.
+
+Key files:
+- `~/.claude.json` — MCP servers (cognitive-memory, n8n-mcp, gitea-mcp, tui-driver, kb-search)
+- `~/.claude/settings.json` — permissions, hooks, env vars, plugins
+- `~/.claude/skills/` — custom skill definitions
+- `~/.claude/hooks/` — hook scripts (format-code.sh, notify-subagent-done.sh)
+
 ## Backups
 
 Original files are backed up by `install.sh` to `~/.dotfiles-backup//` before being replaced with symlinks. Multiple runs create separate timestamped backup dirs. Old backups can be cleaned up manually.
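The webhook authentication changes in `development/kb-rag-system.md` show only the sending side: the Gitea workflow and the manual webhook test. If a reindex request is rejected, it can help to reproduce the receiving side's check locally.

The sketch below assumes md-kb-rag accepts a request when `x-gitea-signature` equals the hex HMAC-SHA256 of the exact request bytes keyed with `WEBHOOK_SECRET`, which is how those bullets describe it. It does not reproduce md-kb-rag's actual code. The names `kb_sign` and `kb_verify` and the `example-secret` value are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the described check for /hooks/reindex
# (hex HMAC-SHA256 over the exact request body, keyed with the shared secret).
# kb_sign and kb_verify are hypothetical helpers, not md-kb-rag code.
set -euo pipefail

# kb_sign BODY SECRET -> prints the hex HMAC-SHA256 of BODY.
# printf '%s' (like echo -n) avoids signing a trailing newline.
# $NF picks the last field, i.e. the hex digest, from "...(stdin)= <hex>".
kb_sign() {
  printf '%s' "$1" | openssl dgst -sha256 -hmac "$2" | awk '{print $NF}'
}

# kb_verify BODY SECRET SIGNATURE -> exit status 0 if SIGNATURE matches.
kb_verify() {
  local expected
  expected=$(kb_sign "$1" "$2")
  # Plain string comparison is fine for a manual check; a real server
  # should use a constant-time comparison.
  [ "$expected" = "$3" ]
}

BODY='{"ref":"refs/heads/main"}'
SECRET='example-secret'   # placeholder, not the real WEBHOOK_SECRET
SIG=$(kb_sign "$BODY" "$SECRET")
echo "x-gitea-signature: $SIG"

# The same bytes verify; the same JSON with one extra space does not.
kb_verify "$BODY" "$SECRET" "$SIG" && echo "original body: valid"
kb_verify '{"ref": "refs/heads/main"}' "$SECRET" "$SIG" || echo "reformatted body: invalid"
```

The MAC covers raw bytes. Any re-serialization of the JSON body, such as different spacing, different key order, or a trailing newline from a plain `echo`, therefore produces a different signature. This is why both signing examples sign the literal `BODY` string with `echo -n`.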
diff --git a/workstation/claude-code-config.md b/workstation/claude-code-config.md
new file mode 100644
index 0000000..9a95fa6
--- /dev/null
+++ b/workstation/claude-code-config.md
@@ -0,0 +1,101 @@
+# Claude Code Configuration
+
+## Config File Locations
+
+Claude Code reads settings from multiple files in a specific precedence order:
+
+| File | Scope | What goes here |
+|------|-------|---------------|
+| `~/.claude.json` | User-level (all projects) | MCP servers, startup state, git repo mappings |
+| `~/.claude/settings.json` | User-level (all projects) | Permissions, hooks, env vars, plugins, status line |
+| `~/.claude/projects//settings.json` | Project-level | Project-specific permission overrides |
+| `/.claude/settings.json` | Repo-level (checked in) | Shared team settings |
+| `/.mcp.json` | Repo-level (checked in) | Project-scoped MCP servers |
+
+**Important**: `~/.claude.json` (home directory root) is different from `~/.claude/settings.json` (inside the `.claude` directory). They serve different purposes.
+
+## MCP Server Configuration
+
+MCP servers are defined in the top-level `mcpServers` key of `~/.claude.json`:
+
+```json
+{
+  "mcpServers": {
+    "server-name": {
+      "command": "/path/to/binary",
+      "args": ["-t", "stdio"],
+      "env": {
+        "API_KEY": "value"
+      }
+    }
+  }
+}
+```
+
+### Server Types
+
+**stdio** — local process, communicates over stdin/stdout:
+```json
+{
+  "command": "/path/to/binary",
+  "args": ["--flag", "value"],
+  "env": { "KEY": "value" }
+}
+```
+
+**url** — remote HTTP server (Streamable HTTP transport):
+```json
+{
+  "type": "url",
+  "url": "http://host:port/mcp",
+  "headers": {
+    "Authorization": "Bearer "
+  }
+}
+```
+
+### Current MCP Servers
+
+All defined in `~/.claude.json` under `mcpServers`:
+
+| Server | Type | Purpose |
+|--------|------|---------|
+| `cognitive-memory` | stdio | Persistent memory system (local Python) |
+| `n8n-mcp` | stdio | n8n workflow automation API |
+| `gitea-mcp` | stdio | Gitea API (issues, PRs, repos) |
+| `tui-driver` | stdio | TUI automation for testing |
+| `kb-search` | url | Knowledge base semantic search (manticore) |
+
+### Managing MCP Servers
+
+- **Add interactively**: `/mcp add ` in Claude Code (stores in `~/.claude.json`)
+- **Add manually**: Edit the `mcpServers` object in `~/.claude.json`
+- **Reconnect**: `/mcp` → select server → Reconnect (useful after remote server restarts)
+- **Permissions**: Auto-allow MCP tools via `"mcp__server-name__*"` in `~/.claude/settings.json` permissions.allow
+
+### Gotchas
+
+- **Session drops**: Remote (url) MCP servers lose their session if the server container restarts. Run `/mcp` to reconnect.
+- **Don't confuse config files**: `~/.claude.json` holds MCP servers. `~/.claude/settings.json` holds permissions/hooks. They are NOT the same file.
+- **Stale configs**: `~/.claude/.mcp.json` and `~/.claude/.mcp-full.json` are NOT read by Claude Code despite looking like they should be. The canonical location is `~/.claude.json`.
+
+## Hooks
+
+Hooks are configured in `~/.claude/settings.json` under the `hooks` key. They run shell commands or HTTP requests in response to events.
+
+### Current Hooks
+
+| Event | Action |
+|-------|--------|
+| `PostToolUse` (Edit/Write/MultiEdit) | Auto-format code via `format-code.sh` |
+| `SubagentStop` | Notify via `notify-subagent-done.sh` |
+| `SessionEnd` | Save session memories via `cognitive-memory` |
+
+## Permissions
+
+Permission rules live in `~/.claude/settings.json` under `permissions.allow` and `permissions.deny`. Format: `ToolName(scope:pattern)`.
+
+Common patterns:
+- `"mcp__gitea-mcp__*"` — allow all gitea MCP tools
+- `"WebFetch(domain:docs.example.com)"` — allow fetching from specific domain
+- `"Bash(ssh:*)"` — allow SSH commands
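Two parts of the `development/kb-rag-system.md` changes depend on how an incremental indexer decides which files to re-embed. One is step 5 of the auto-sync pipeline ("only changed files are re-embedded"). The other is the "Stale index after deleting files" troubleshooting entry.

md-kb-rag keeps its content hashes in a SQLite state DB. The sketch below swaps that for a plain text file of `sha256sum` lines so that it runs with only bash and coreutils. It illustrates the technique under that assumption and is not md-kb-rag's implementation. The `changed_files` function and the state-file layout are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: content-hash change detection for an incremental indexer.
# Assumption: a text file of `sha256sum` lines stands in for md-kb-rag's
# SQLite state DB. changed_files is a hypothetical helper.
set -euo pipefail

# changed_files REPO_DIR STATE_FILE
# Prints .md files that are new or whose content hash changed since the
# previous call, then saves the current hashes to STATE_FILE.
changed_files() {
  local repo=$1 state=$2 current
  current=$(mktemp)
  [ -f "$state" ] || : > "$state"

  # One "<sha256>  ./path" line per markdown file, sorted for comm.
  (cd "$repo" && find . -type f -name '*.md' -exec sha256sum {} +) \
    | LC_ALL=C sort > "$current"

  # Lines only in the current snapshot are new or modified files.
  # cut -c67- drops the 64-char hash and the two separator spaces.
  LC_ALL=C comm -13 <(LC_ALL=C sort "$state") "$current" | cut -c67-

  # Deleted files just vanish from the snapshot; nothing above reports
  # them, so their previously indexed data would be left orphaned.
  mv "$current" "$state"
}

# Usage: a throwaway repo with two markdown files.
demo=$(mktemp -d)
mkdir -p "$demo/repo/docs"
printf 'alpha\n' > "$demo/repo/docs/a.md"
printf 'beta\n' > "$demo/repo/docs/b.md"

changed_files "$demo/repo" "$demo/state.txt"   # first run: reports both files
printf 'beta v2\n' > "$demo/repo/docs/b.md"
changed_files "$demo/repo" "$demo/state.txt"   # reports only ./docs/b.md
```

The final comment in the function is the main design point. A hash comparison only sees files that still exist, so deletions go unnoticed until a full re-index (`md-kb-rag index --full`) rebuilds the index, which is the fix the troubleshooting entry gives.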
+ +## Hooks + +Hooks are configured in `~/.claude/settings.json` under the `hooks` key. They run shell commands or HTTP requests in response to events. + +### Current Hooks + +| Event | Action | +|-------|--------| +| `PostToolUse` (Edit/Write/MultiEdit) | Auto-format code via `format-code.sh` | +| `SubagentStop` | Notify via `notify-subagent-done.sh` | +| `SessionEnd` | Save session memories via `cognitive-memory` | + +## Permissions + +Permission rules live in `~/.claude/settings.json` under `permissions.allow` and `permissions.deny`. Format: `ToolName(scope:pattern)`. + +Common patterns: +- `"mcp__gitea-mcp__*"` — allow all gitea MCP tools +- `"WebFetch(domain:docs.example.com)"` — allow fetching from specific domain +- `"Bash(ssh:*)"` — allow SSH commands