claude-home/scheduled-tasks/agent-sdk-evaluation.md
Cal Corum 0dae52441a
feat: add session resumption and Agent SDK evaluation
- runner.sh: opt-in session persistence via session_resumable and
  resume_last_session settings; fix read_setting to normalize booleans
- issue-poller.sh: capture and log session_id from worker invocations,
  include in result JSON
- pr-reviewer-dispatcher.sh: capture and log session_id from reviews
- n8n workflow: add --append-system-prompt to initial SSH node, add
  Follow Up Diagnostics node using --resume for deeper investigation,
  update Discord Alert with remediation details
- Add Agent SDK evaluation doc (CLI vs Python/TS SDK comparison)
- Update CONTEXT.md with session resumption documentation

Closes #3

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:56:32 -05:00


title: Agent SDK Evaluation — CLI vs Python/TypeScript SDK
description: Comparison of Claude Code CLI invocation (claude -p) vs the native Agent SDK for programmatic use in the headless-claude and claude-scheduled systems.
type: context
domain: scheduled-tasks
tags: claude-code, sdk, agent-sdk, python, typescript, headless, automation, evaluation

Agent SDK Evaluation: CLI vs Python/TypeScript SDK

Date: 2026-04-03
Status: Evaluation complete; recommendation below
Related: Issue #3 (headless-claude: Additional Agent SDK improvements)

1. Current Approach — CLI via claude -p

All headless Claude invocations use the CLI subprocess pattern:

claude -p "<prompt>" \
  --model sonnet \
  --output-format json \
  --allowedTools "Read,Grep,Glob" \
  --append-system-prompt "..." \
  --max-budget-usd 2.00

Pros:

  • Simple to invoke from anything that can spawn a process (bash, n8n SSH nodes, systemd units)
  • Uses Claude Max OAuth — no API key needed, no per-token billing
  • Mature and battle-tested in our scheduled-tasks framework
  • CLAUDE.md and settings.json are loaded automatically
  • No runtime dependencies beyond the CLI binary

Cons:

  • Structured output requires parsing JSON from stdout
  • Error handling is exit-code-based with stderr parsing
  • No typed mid-stream observability: watching a run in progress means parsing JSONL from stdout (--output-format stream-json)
  • Tool approval is allowlist-only — no dynamic per-call decisions
  • Session resumption requires manual --resume flag plumbing
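
The stdout-parsing cost listed above is small in practice. The following is a minimal Python sketch, assuming the object printed by --output-format json carries is_error, result, session_id, and total_cost_usd fields. Those field names are assumptions, the sample payload is illustrative rather than captured output, and build_argv and parse_result are hypothetical helper names.

```python
import json
from typing import Any


def build_argv(prompt: str, model: str = "sonnet",
               allowed_tools: tuple = ("Read", "Grep", "Glob"),
               max_budget_usd: float = 2.00) -> list:
    """Assemble the argv for one headless invocation (flags as shown in section 1)."""
    return [
        "claude", "-p", prompt,
        "--model", model,
        "--output-format", "json",
        "--allowedTools", ",".join(allowed_tools),
        "--max-budget-usd", f"{max_budget_usd:.2f}",
    ]


def parse_result(stdout: str) -> dict:
    """Parse the single JSON object on stdout and keep the fields we log."""
    payload: dict[str, Any] = json.loads(stdout)
    return {
        "ok": not payload.get("is_error", False),
        "result": payload.get("result", ""),
        "session_id": payload.get("session_id"),    # assumed field name
        "cost_usd": payload.get("total_cost_usd"),  # assumed field name
    }


# Illustrative payload only; not real CLI output.
sample = ('{"type": "result", "is_error": false, "result": "All services healthy", '
          '"session_id": "abc-123", "total_cost_usd": 0.0421}')
summary = parse_result(sample)
print(summary["session_id"], summary["cost_usd"])
```

Passing the command as a list rather than a shell string keeps prompt text from being interpreted by the shell, which matters when prompts are built from issue or PR content.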

2. Python Agent SDK

Package: claude-agent-sdk (renamed from claude-code-sdk)
Install: pip install claude-agent-sdk
Requires: Python 3.10+, ANTHROPIC_API_KEY env var

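import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions


async def main() -> None:
    # query() is an async generator, so it must be consumed inside a coroutine.
    async for message in query(
        prompt="Diagnose server health",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Grep", "Bash(python3 *)"],
            output_format={"type": "json_schema", "schema": {...}},  # schema elided
            max_budget_usd=2.00,
        ),
    ):
        # Only the final result message carries a .result attribute.
        if hasattr(message, "result"):
            print(message.result)


asyncio.run(main())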

Key features:

  • Async generator yielding typed message objects (UserMessage, AssistantMessage, ResultMessage, SystemMessage)
  • ClaudeSDKClient for stateful multi-turn conversations
  • can_use_tool callback for dynamic per-call tool approval
  • In-process hooks (PreToolUse, PostToolUse, Stop, etc.)
  • rewind_files() to restore the filesystem to any prior message point
  • Typed exception hierarchy (CLINotFoundError, ProcessError, etc.)

Limitation: Shells out to the Claude Code CLI binary — it is NOT a pure HTTP client. The binary must be installed.

3. TypeScript Agent SDK

Package: @anthropic-ai/claude-agent-sdk (renamed from @anthropic-ai/claude-code)
Install: npm install @anthropic-ai/claude-agent-sdk
Requires: Node 18+, ANTHROPIC_API_KEY env var

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Diagnose server health",
  options: {
    allowedTools: ["Read", "Grep", "Bash(python3 *)"],
    maxBudgetUsd: 2.00,
  }
})) {
  if ("result" in message) console.log(message.result);
}

Key features (superset of Python):

  • Same async generator pattern
  • "auto" permission mode (model classifier per tool call) — TS-only
  • spawnClaudeCodeProcess hook for remote/containerized execution
  • setMcpServers() for dynamic MCP server swapping mid-session
  • V2 preview: send() / stream() patterns for simpler multi-turn
  • Bundles the Claude Code binary — no separate install needed

4. Comparison Matrix

| Capability | claude -p CLI | Python SDK | TypeScript SDK |
| --- | --- | --- | --- |
| Auth | OAuth (Claude Max) | API key only | API key only |
| Invocation | Shell subprocess | Async generator | Async generator |
| Structured output | --json-schema flag | Schema in options | Schema in options |
| Streaming | JSONL parsing | Typed messages | Typed messages |
| Tool approval | --allowedTools only | can_use_tool callback | canUseTool callback + auto mode |
| Session resume | --resume flag | resume=session_id | resume: sessionId |
| Cost tracking | Parse result JSON | ResultMessage.total_cost_usd | Same + per-model breakdown |
| Error handling | Exit codes + stderr | Typed exceptions | Typed exceptions |
| Hooks | External shell scripts | In-process callbacks | In-process callbacks |
| Custom tools | Not available | tool() decorator | tool() + Zod schemas |
| Subagents | Not programmatic | agents option | agents option |
| File rewind | Not available | rewind_files() | rewindFiles() |
| MCP servers | --mcp-config file | Inline config object | Inline + dynamic swap |
| CLAUDE.md loading | Automatic | Must opt-in (setting_sources) | Must opt-in (settingSources) |
| Dependencies | CLI binary | CLI binary + Python | Node 18+ (bundles CLI) |
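
To make the "Streaming" row concrete, here is a rough Python sketch of the JSONL parsing the CLI column implies. It assumes each stdout line is one JSON object with a type field and that the final line has type "result". The sample lines are illustrative, not captured output, and iter_events and final_result are hypothetical helper names.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line, skipping blanks and lines that are not JSON objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. stray log output mixed into stdout
        if isinstance(event, dict):
            yield event


def final_result(lines: Iterable[str]) -> Optional[dict]:
    """Return the last event whose type is "result", or None if none arrived."""
    last = None
    for event in iter_events(lines):
        if event.get("type") == "result":
            last = event
    return last


# Illustrative stream only; the event shapes are assumptions.
sample_stream = [
    '{"type": "system", "subtype": "init", "session_id": "abc-123"}',
    "not json",
    '{"type": "assistant", "message": {"content": []}}',
    '{"type": "result", "is_error": false, "result": "done", "session_id": "abc-123"}',
]
print([event["type"] for event in iter_events(sample_stream)])
```

In a real runner the lines argument would be the child process's stdout, read line by line, so progress can be logged before the run finishes.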

5. Integration Paths

A. n8n Code Nodes

The n8n Code node supports JavaScript (not TypeScript directly, but the SDK's JS output works). This would replace the current SSH → CLI pattern:

Schedule Trigger → Code Node (JS, uses SDK) → IF → Discord

Trade-off: Eliminates the SSH hop to CT 300, but requires ANTHROPIC_API_KEY and the npm package available to n8n. n8n currently runs in a Docker container on CT 210, so the image would need to include the SDK package (which, per section 3, bundles the CLI binary).

B. Standalone Python Scripts

Replace claude -p subprocess calls in custom dispatchers with the Python SDK:

# Instead of: subprocess.run(["claude", "-p", prompt, ...])
async for msg in query(prompt=prompt, options=opts):
    ...

Trade-off: Richer error handling and streaming, but our dispatchers are bash scripts, not Python. Would require rewriting runner.sh and dispatchers in Python.
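
For comparison, part of the error-handling gap can be closed without the SDK. The sketch below wraps the subprocess pattern in a small typed exception. run_headless and HeadlessError are hypothetical names, and the two stand-in child processes use the Python interpreter so the example runs without the CLI installed.

```python
import json
import subprocess
import sys


class HeadlessError(RuntimeError):
    """Raised when the child process fails or prints something unparseable."""

    def __init__(self, message: str, returncode: int, stderr: str):
        super().__init__(message)
        self.returncode = returncode
        self.stderr = stderr


def run_headless(argv: list, timeout: float = 600.0) -> dict:
    """Run argv, expect one JSON object on stdout, raise HeadlessError otherwise."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        raise HeadlessError(f"exit code {proc.returncode}", proc.returncode, proc.stderr)
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise HeadlessError(f"stdout was not JSON: {exc}", proc.returncode, proc.stderr) from exc


# Stand-ins for the real CLI: one succeeds with JSON, one exits non-zero.
fake_ok = [sys.executable, "-c", "print('{\"result\": \"ok\"}')"]
fake_fail = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"]

print(run_headless(fake_ok))
try:
    run_headless(fake_fail)
except HeadlessError as err:
    print("failed:", err.returncode, err.stderr)
```

This still cannot offer per-call hooks or tool callbacks; it only turns exit codes and stderr into something a Python caller can catch.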

C. Systemd-triggered Tasks (Current Architecture)

Keep systemd timers → bash scripts, but optionally invoke a thin Python wrapper that uses the SDK instead of claude -p directly.

Trade-off: Adds Python as a dependency for scheduled tasks that currently only need bash + the CLI binary. Marginal benefit unless we need hooks or dynamic tool approval.

6. Recommendation

Stay with CLI invocation for now. Revisit the Python SDK when we need dynamic tool approval or in-process hooks.

Rationale

  1. Auth is the blocker. The SDK requires ANTHROPIC_API_KEY (API billing). Our entire scheduled-tasks framework runs on Claude Max OAuth at zero marginal cost. Switching to the SDK means paying per-token for every scheduled task, issue-worker, and PR-reviewer invocation. This alone makes the SDK non-viable for our current architecture.

  2. The CLI covers our needs. With --append-system-prompt (done), --resume (this PR), --json-schema, and --allowedTools, the CLI provides everything we currently need. Session resumption was the last missing piece.

  3. Bash scripts are the right abstraction. Our runners are launched by systemd timers. Bash + CLI is the natural fit — no runtime dependencies, no async event loops, no package management.
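
The --resume plumbing mentioned in point 2 amounts to remembering one identifier between runs. A minimal Python sketch of that idea follows, assuming the result JSON exposes a session_id field. The state-file name and format are hypothetical, resume_last_session mirrors the setting name from this PR, and the actual runner.sh logic may differ.

```python
import json
import tempfile
from pathlib import Path


def save_session(state_file: Path, result: dict) -> None:
    """Persist the session_id from a result payload, if it has one."""
    session_id = result.get("session_id")
    if session_id:
        state_file.write_text(json.dumps({"session_id": session_id}))


def resume_args(state_file: Path, resume_last_session: bool) -> list:
    """Return ["--resume", <id>] when resumption is enabled and an id is stored."""
    if not resume_last_session or not state_file.exists():
        return []
    try:
        session_id = json.loads(state_file.read_text()).get("session_id")
    except (json.JSONDecodeError, AttributeError):
        return []  # corrupt state file: fall back to a fresh session
    return ["--resume", session_id] if session_id else []


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "last-session.json"  # hypothetical file name
    first_run = resume_args(state, resume_last_session=True)
    save_session(state, {"session_id": "abc-123", "result": "done"})
    second_run = resume_args(state, resume_last_session=True)
    opted_out = resume_args(state, resume_last_session=False)
    print(first_run, second_run, opted_out)
```

Falling back to a fresh session on a missing or corrupt state file keeps a scheduled task from failing outright just because resumption was unavailable.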

When to Revisit

  • If Anthropic adds OAuth support to the SDK (eliminating the billing difference)
  • If we need dynamic tool approval (e.g., "allow this Bash command but deny that one" at runtime)
  • If we build a long-running Python service that orchestrates multiple Claude sessions (the ClaudeSDKClient stateful pattern would be valuable there)
  • If we move to n8n custom nodes written in TypeScript (the TS SDK bundles the CLI binary)
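
The dynamic tool approval case above ("allow this Bash command but deny that one") could look roughly like the policy function below. This is a standalone Python sketch with no SDK imports: decide is a hypothetical function that a can_use_tool callback might delegate to, the allowlists are invented for illustration, and it assumes Bash tool input carries the command under a "command" key.

```python
import shlex

# Hypothetical policy: programs a Bash call may start with.
ALLOWED_PROGRAMS = {"python3", "ls", "cat", "systemctl"}
# Hypothetical policy: systemctl verbs treated as read-only.
READ_ONLY_SYSTEMCTL = {"status", "is-active", "list-units"}


def decide(tool_name: str, tool_input: dict) -> tuple:
    """Return (allowed, reason) for one proposed tool call."""
    if tool_name in {"Read", "Grep", "Glob"}:
        return True, "read-only tool"
    if tool_name != "Bash":
        return False, f"tool {tool_name!r} is not on the allowlist"
    command = tool_input.get("command", "")
    # Reject chaining, redirection, and substitution before looking at the program.
    if any(ch in command for ch in ";&|<>`$\n"):
        return False, "shell metacharacters are not allowed"
    try:
        words = shlex.split(command)
    except ValueError:
        return False, "command could not be parsed"
    if not words:
        return False, "empty command"
    program = words[0]
    if program not in ALLOWED_PROGRAMS:
        return False, f"program {program!r} is not allowed"
    if program == "systemctl" and (len(words) < 2 or words[1] not in READ_ONLY_SYSTEMCTL):
        return False, "only read-only systemctl verbs are allowed"
    return True, "allowed by policy"


print(decide("Bash", {"command": "systemctl status nginx"}))
print(decide("Bash", {"command": "systemctl restart nginx"}))
```

A static --allowedTools pattern cannot express the systemctl distinction, which is the kind of rule that would justify revisiting the SDK.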

Migration Path (If Needed Later)

  1. Start with the Python SDK in a single task (e.g., backlog-triage) as a proof of concept
  2. Create a thin sdk-runner.py wrapper that reads the same settings.json and prompt.md files
  3. Swap the systemd unit's ExecStart from runner.sh to sdk-runner.py
  4. Expand to other tasks if the POC proves valuable
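
Step 2 could start from something like the loader below, a sketch of the file-reading half of a hypothetical sdk-runner.py. It assumes each task directory holds a prompt.md and a settings.json. The model and allowed_tools keys are assumptions about the settings layout, session_resumable and resume_last_session are the setting names from this PR, and as_bool is a hypothetical helper for boolean normalization.

```python
import json
import tempfile
from pathlib import Path


def as_bool(value) -> bool:
    """Normalize JSON booleans and common string spellings to a Python bool."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


def load_task(task_dir: Path) -> dict:
    """Read prompt.md and settings.json from a task directory into one config dict."""
    prompt = (task_dir / "prompt.md").read_text().strip()
    settings = json.loads((task_dir / "settings.json").read_text())
    return {
        "prompt": prompt,
        "model": settings.get("model", "sonnet"),                   # assumed key
        "allowed_tools": list(settings.get("allowed_tools", [])),  # assumed key
        "session_resumable": as_bool(settings.get("session_resumable", False)),
        "resume_last_session": as_bool(settings.get("resume_last_session", False)),
    }


# Throwaway task directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp)
    (task_dir / "prompt.md").write_text("Triage the backlog.\n")
    (task_dir / "settings.json").write_text(
        json.dumps({"model": "sonnet", "allowed_tools": ["Read", "Grep"],
                    "session_resumable": "true"})
    )
    task = load_task(task_dir)
    print(task)
```

Keeping the loader independent of how Claude is invoked means the same config dict could feed either the CLI or the SDK during a proof of concept.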