---
title: "Agent SDK Evaluation: CLI vs Python/TypeScript SDK"
description: "Comparison of Claude Code CLI invocation (claude -p) vs the native Agent SDK for programmatic use in the headless-claude and claude-scheduled systems."
type: context
domain: scheduled-tasks
tags: [claude-code, sdk, agent-sdk, python, typescript, headless, automation, evaluation]
---

# Agent SDK Evaluation: CLI vs Python/TypeScript SDK

**Date:** 2026-04-03
**Status:** Evaluation complete; recommendation below
**Related:** Issue #3 (headless-claude: Additional Agent SDK improvements)

## 1. Current Approach: CLI via `claude -p`

All headless Claude invocations use the CLI subprocess pattern:

```bash
claude -p "<prompt>" \
  --model sonnet \
  --output-format json \
  --allowedTools "Read,Grep,Glob" \
  --append-system-prompt "..." \
  --max-budget-usd 2.00
```

**Pros:**
- Simple to invoke from any environment (bash, n8n SSH nodes, systemd units)
- Uses Claude Max OAuth: no API key needed, no per-token billing
- Mature and battle-tested in our scheduled-tasks framework
- CLAUDE.md and settings.json are loaded automatically
- No runtime dependencies beyond the CLI binary

**Cons:**
- Structured output requires parsing JSON from stdout
- Error handling is exit-code-based with stderr parsing
- Mid-stream observability requires parsing JSONL by hand (`--output-format stream-json`)
- Tool approval is allowlist-only, with no dynamic per-call decisions
- Session resumption requires manual `--resume` flag plumbing

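
To make the first two cons concrete, here is a minimal Python sketch of the stdout and exit-code handling that every caller of `claude -p` ends up writing. It is an illustration under assumptions, not code from our runners: the helper names (`build_command`, `parse_result`, `CliResult`) are hypothetical, and the result field names (`result`, `session_id`, `total_cost_usd`, `is_error`) are assumed from the CLI's JSON output and should be checked against the installed version.

```python
import json
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class CliResult:
    """Subset of the CLI's JSON result that a dispatcher typically needs."""
    ok: bool
    text: str
    session_id: Optional[str]
    cost_usd: Optional[float]


def build_command(prompt: str, model: str = "sonnet",
                  allowed_tools: tuple = ("Read", "Grep", "Glob")) -> list:
    """Build the argv list for a headless call, using the flags shown above."""
    return [
        "claude", "-p", prompt,
        "--model", model,
        "--output-format", "json",
        "--allowedTools", ",".join(allowed_tools),
    ]


def parse_result(stdout: str, exit_code: int) -> CliResult:
    """Combine stdout and exit code into one result; never raises on bad output."""
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Not the JSON object we expected: keep the raw text for the logs.
        return CliResult(ok=False, text=stdout.strip(), session_id=None, cost_usd=None)
    return CliResult(
        # Field names below are assumptions about the CLI's result schema.
        ok=(exit_code == 0 and not data.get("is_error", False)),
        text=str(data.get("result", "")),
        session_id=data.get("session_id"),
        cost_usd=data.get("total_cost_usd"),
    )


def run(prompt: str) -> CliResult:
    """Invoke the CLI once and parse whatever it printed."""
    proc = subprocess.run(build_command(prompt), capture_output=True, text=True)
    return parse_result(proc.stdout, proc.returncode)
```

Keeping `parse_result` separate from `run` means the parsing logic can be unit-tested without the CLI binary present.
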
## 2. Python Agent SDK

**Package:** `claude-agent-sdk` (renamed from `claude-code-sdk`)
**Install:** `pip install claude-agent-sdk`
**Requires:** Python 3.10+, `ANTHROPIC_API_KEY` env var

```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def main() -> None:
    # `async for` is only valid inside a coroutine, so wrap the call.
    async for message in query(
        prompt="Diagnose server health",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Grep", "Bash(python3 *)"],
            # Schema body is elided here, as in the original example.
            output_format={"type": "json_schema", "schema": {...}},
            max_budget_usd=2.00,
        ),
    ):
        # Only the final result message carries a `result` attribute.
        if hasattr(message, "result"):
            print(message.result)


asyncio.run(main())
```

**Key features:**
- Async generator with typed message objects (`UserMessage`, `AssistantMessage`, `ResultMessage`, `SystemMessage`)
- `ClaudeSDKClient` for stateful multi-turn conversations
- `can_use_tool` callback for dynamic per-call tool approval
- In-process hooks (`PreToolUse`, `PostToolUse`, `Stop`, etc.)
- `rewind_files()` to restore the filesystem to any prior message point
- Typed exception hierarchy (`CLINotFoundError`, `ProcessError`, etc.)

**Limitation:** Shells out to the Claude Code CLI binary; it is NOT a pure HTTP client. The binary must be installed.

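
As an illustration of what "dynamic per-call tool approval" could mean in practice, the sketch below is a pure-Python policy function of the kind a `can_use_tool` callback might delegate to. It deliberately does not import the SDK: the exact callback signature and return types should be taken from the SDK documentation. The function name `decide`, both allowlists, and the assumption that the Bash tool passes its command under a `command` key are all illustrative.

```python
import shlex

# Hypothetical policy: tools that are always allowed without inspecting input.
READ_ONLY_TOOLS = {"Read", "Grep", "Glob"}
# Hypothetical allowlist of executables the Bash tool may run.
ALLOWED_BASH_PROGRAMS = {"python3", "ls", "cat"}
# Characters that would let one command chain or redirect into another.
SHELL_METACHARACTERS = ";|&`$><"


def decide(tool_name: str, tool_input: dict) -> tuple:
    """Return (allowed, reason) for a single tool call."""
    if tool_name in READ_ONLY_TOOLS:
        return True, "read-only tool"
    if tool_name != "Bash":
        return False, f"tool {tool_name} is not allowlisted"

    # Assumption: the Bash tool's input carries the command line under "command".
    command = str(tool_input.get("command", ""))
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False, "shell metacharacters not allowed"
    try:
        argv = shlex.split(command)
    except ValueError:
        # Raised by shlex for unbalanced quotes and similar malformed input.
        return False, "unparseable command"
    if not argv:
        return False, "empty command"
    if argv[0] in ALLOWED_BASH_PROGRAMS:
        return True, f"{argv[0]} is allowlisted"
    return False, f"{argv[0]} is not allowlisted"
```

This is the kind of decision a static `--allowedTools` list cannot express: the same tool (`Bash`) is allowed or denied depending on the input of each call.
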
## 3. TypeScript Agent SDK

**Package:** `@anthropic-ai/claude-agent-sdk` (renamed from `@anthropic-ai/claude-code`)
**Install:** `npm install @anthropic-ai/claude-agent-sdk`
**Requires:** Node 18+, `ANTHROPIC_API_KEY` env var

```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Diagnose server health",
  options: {
    allowedTools: ["Read", "Grep", "Bash(python3 *)"],
    maxBudgetUsd: 2.00,
  }
})) {
  if ("result" in message) console.log(message.result);
}
```

**Key features (superset of Python):**
- Same async generator pattern
- `"auto"` permission mode (model classifier per tool call), TS-only
- `spawnClaudeCodeProcess` hook for remote/containerized execution
- `setMcpServers()` for dynamic MCP server swapping mid-session
- V2 preview: `send()` / `stream()` patterns for simpler multi-turn
- Bundles the Claude Code binary, so no separate install is needed

## 4. Comparison Matrix

| Capability | `claude -p` CLI | Python SDK | TypeScript SDK |
|---|---|---|---|
| **Auth** | OAuth (Claude Max) | API key only | API key only |
| **Invocation** | Shell subprocess | Async generator | Async generator |
| **Structured output** | `--json-schema` flag | Schema in options | Schema in options |
| **Streaming** | JSONL parsing | Typed messages | Typed messages |
| **Tool approval** | `--allowedTools` only | `can_use_tool` callback | `canUseTool` callback + auto mode |
| **Session resume** | `--resume` flag | `resume: sessionId` | `resume: sessionId` |
| **Cost tracking** | Parse result JSON | `ResultMessage.total_cost_usd` | Same + per-model breakdown |
| **Error handling** | Exit codes + stderr | Typed exceptions | Typed exceptions |
| **Hooks** | External shell scripts | In-process callbacks | In-process callbacks |
| **Custom tools** | Not available | `tool()` decorator | `tool()` + Zod schemas |
| **Subagents** | Not programmatic | `agents` option | `agents` option |
| **File rewind** | Not available | `rewind_files()` | `rewindFiles()` |
| **MCP servers** | `--mcp-config` file | Inline config object | Inline + dynamic swap |
| **CLAUDE.md loading** | Automatic | Must opt in (`settingSources`) | Must opt in |
| **Dependencies** | CLI binary | CLI binary + Python | Node 18+ (bundles CLI) |

## 5. Integration Paths

### A. n8n Code Nodes

The n8n Code node supports JavaScript (not TypeScript directly, but the SDK's JS output works). This would replace the current SSH → CLI pattern:

```
Schedule Trigger → Code Node (JS, uses SDK) → IF → Discord
```

**Trade-off:** Eliminates the SSH hop to CT 300, but requires `ANTHROPIC_API_KEY` and requires n8n to have the npm package installed. n8n currently runs in a Docker container on CT 210, so the image would need both the SDK and the CLI binary.

### B. Standalone Python Scripts

Replace `claude -p` subprocess calls in custom dispatchers with the Python SDK:

```python
# Instead of: subprocess.run(["claude", "-p", prompt, ...])
async for msg in query(prompt=prompt, options=opts):
    ...
```

**Trade-off:** Richer error handling and streaming, but our dispatchers are bash scripts, not Python. Would require rewriting `runner.sh` and dispatchers in Python.

### C. Systemd-triggered Tasks (Current Architecture)

Keep systemd timers → bash scripts, but optionally invoke a thin Python wrapper that uses the SDK instead of `claude -p` directly.

**Trade-off:** Adds Python as a dependency for scheduled tasks that currently only need bash + the CLI binary. Marginal benefit unless we need hooks or dynamic tool approval.

## 6. Recommendation

**Stay with CLI invocation for now. Revisit the Python SDK when we need dynamic tool approval or in-process hooks.**

### Rationale

1. **Auth is the blocker.** The SDK requires `ANTHROPIC_API_KEY` (API billing). Our entire scheduled-tasks framework runs on Claude Max OAuth at zero marginal cost. Switching to the SDK means paying per token for every scheduled task, issue-worker, and PR-reviewer invocation. This alone makes the SDK non-viable for our current architecture.

2. **The CLI covers our needs.** With `--append-system-prompt` (done), `--resume` (this PR), `--json-schema`, and `--allowedTools`, the CLI provides everything we currently need. Session resumption was the last missing piece.

3. **Bash scripts are the right abstraction.** Our runners are launched by systemd timers. Bash + CLI is the natural fit: no runtime dependencies, no async event loops, no package management.

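
For readers who want to see what the `--resume` plumbing amounts to, here is a rough Python sketch of the save-then-resume cycle. It is not the logic in `runner.sh`: the state file layout and the function names are hypothetical, and the `session_id` field in the CLI's JSON result is an assumption to verify against real output.

```python
import json
from pathlib import Path
from typing import Optional


def extract_session_id(result_json: str) -> Optional[str]:
    """Pull `session_id` out of a JSON document (field name is an assumption)."""
    try:
        data = json.loads(result_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    sid = data.get("session_id")
    return sid if isinstance(sid, str) and sid else None


def save_session(state_file: Path, result_json: str) -> Optional[str]:
    """Persist the session ID from a finished run; returns the ID that was saved."""
    sid = extract_session_id(result_json)
    if sid is not None:
        # Only overwrite the state file when the run actually produced an ID.
        state_file.write_text(json.dumps({"session_id": sid}))
    return sid


def resume_args(state_file: Path) -> list:
    """Extra argv for the next run: `--resume <id>` if one was saved, else nothing."""
    if not state_file.exists():
        return []
    sid = extract_session_id(state_file.read_text())
    return ["--resume", sid] if sid else []
```

A failed or malformed run leaves the previous session ID in place, so the next invocation can still resume the last known-good session.
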
### When to Revisit

- If Anthropic adds OAuth support to the SDK (eliminating the billing difference)
- If we need dynamic tool approval (e.g., "allow this Bash command but deny that one" at runtime)
- If we build a long-running Python service that orchestrates multiple Claude sessions (the `ClaudeSDKClient` stateful pattern would be valuable there)
- If we move to n8n custom nodes written in TypeScript (the TS SDK bundles the CLI binary)

### Migration Path (If Needed Later)

1. Start with the Python SDK in a single task (e.g., `backlog-triage`) as a proof of concept
2. Create a thin `sdk-runner.py` wrapper that reads the same `settings.json` and `prompt.md` files
3. Swap the systemd unit's `ExecStart` from `runner.sh` to `sdk-runner.py`
4. Expand to other tasks if the POC proves valuable

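
Step 2 of the migration path could start from something like the sketch below, which only covers loading a task directory and stops short of calling the SDK. The settings keys (`model`, `allowed_tools`, `max_budget_usd`) and the `TaskConfig` and `load_task` names are hypothetical; the real `settings.json` schema used by `runner.sh` may differ.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class TaskConfig:
    """What a hypothetical sdk-runner.py would need before invoking the SDK."""
    prompt: str
    model: str = "sonnet"
    allowed_tools: list = field(default_factory=list)
    max_budget_usd: Optional[float] = None


def load_task(task_dir: Path) -> TaskConfig:
    """Read prompt.md (required) and settings.json (optional) from a task directory."""
    prompt = (task_dir / "prompt.md").read_text().strip()
    if not prompt:
        raise ValueError(f"{task_dir / 'prompt.md'} is empty")

    settings_path = task_dir / "settings.json"
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}

    # Key names below are assumptions, not the runner's documented schema.
    return TaskConfig(
        prompt=prompt,
        model=str(settings.get("model", "sonnet")),
        allowed_tools=list(settings.get("allowed_tools", [])),
        max_budget_usd=settings.get("max_budget_usd"),
    )
```

Reading the same two files as the bash runner keeps the proof of concept reversible: switching `ExecStart` back to `runner.sh` requires no changes to the task directory.
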