From 7cdb6fcc685a1fbb4315c8fca7e59f04ae75e725 Mon Sep 17 00:00:00 2001
From: Cal Corum
Date: Wed, 18 Feb 2026 16:56:43 -0600
Subject: [PATCH] store: Agent Swarm Orchestrator: architecture decisions and lessons learned

---
 ...chitecture-decisions-and-lessons-99f446.md | 46 +++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 graph/solutions/agent-swarm-orchestrator-architecture-decisions-and-lessons-99f446.md

diff --git a/graph/solutions/agent-swarm-orchestrator-architecture-decisions-and-lessons-99f446.md b/graph/solutions/agent-swarm-orchestrator-architecture-decisions-and-lessons-99f446.md
new file mode 100644
index 00000000000..d8b62b68ae1
--- /dev/null
+++ b/graph/solutions/agent-swarm-orchestrator-architecture-decisions-and-lessons-99f446.md
@@ -0,0 +1,46 @@
+---
+id: 99f4462b-91eb-4243-99cf-7a74a3afadbf
+type: solution
+title: "Agent Swarm Orchestrator: architecture decisions and lessons learned"
+tags: [orchestrator, swarm, claude-code, agents, skills]
+importance: 0.9
+confidence: 0.8
+created: "2026-02-18T22:56:43.745740+00:00"
+updated: "2026-02-18T22:56:43.745740+00:00"
+---
+
+## Agent Swarm Orchestrator System
+
+Built a lightweight orchestration system using Claude Code native primitives. Iterative development over ~6 test runs revealed key architectural constraints.
+
+### Final Architecture
+- **Orchestrator**: Skill at ~/.claude/skills/orchestrator/ (NOT an agent — agents lack the Task spawning tool)
+- **Workers**: Agent definitions at ~/.claude/agents/swarm-{coder,reviewer,validator}.md
+- **Coordination**: Plain blocking Task calls (subagents), NOT agent teams (TeamCreate/SendMessage)
+
+### Critical Findings
+
+1. **Agents (--agent) cannot spawn subagents** — the Task tool is not provisioned in agent mode. The orchestrator MUST be a skill running in a normal session.
+
+2. **Agent Teams (TeamCreate) use async messaging** — team members run in the background, requiring sleep polling. Plain Task calls block until completion. Never mix teams with orchestration that needs synchronous results.
+
+3. **tools allowlist in agent frontmatter** — blocks the Task spawning tool even when listed. Use the disallowedTools denylist instead for restricting agents.
+
+4. **Prompt-only constraints don't stick** — telling the orchestrator "don't use Edit/Write/Bash" in prose was ignored repeatedly. Technical enforcement (disallowedTools in agent frontmatter, or moving to a skill and relying on role design) is necessary for hard constraints.
+
+5. **Review timing matters** — reviewers must only run AFTER the coders' Task call returns. Spawning reviewers for all tasks simultaneously (before later coders finish) causes false REJECTs on incomplete work.
+
+6. **Wave-based execution order**: For each wave: spawn coders (blocking) -> coders return -> spawn reviewers (blocking) -> reviewers return -> handle fixes -> next wave. After all waves: spawn validator -> report.
+
+7. **Parallel Task calls**: Multiple Task calls in ONE message run concurrently and all block until complete. No polling, no sleep, no run_in_background needed.
+
+### File Inventory
+- ~/.claude/skills/orchestrator/SKILL.md — orchestrator skill
+- ~/.claude/agents/swarm-coder.md — implementation agent (sonnet, bypassPermissions)
+- ~/.claude/agents/swarm-reviewer.md — read-only reviewer (sonnet, default)
+- ~/.claude/agents/swarm-validator.md — read-only validator (sonnet, default)
+
+### Remaining Issues
+- Orchestrator sometimes edits files itself for cosmetic fixes despite instructions not to
+- No PROJECT_PLAN.json integration yet (planned)
+- No resume/checkpoint capability if orchestrator context fills up
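
The wave ordering described in findings 5 through 7 can be sketched as a small simulation. This is only an illustration of the ordering constraint, not the orchestrator skill itself: `run_task`, `run_wave`, and `orchestrate` are hypothetical names, `asyncio.gather` stands in for "multiple Task calls in one message", and the reviewer verdict is hard-coded.

```python
import asyncio


async def run_task(role: str, task_id: str, log: list[str]) -> str:
    """Hypothetical stand-in for one blocking Task call (no real subagent is spawned)."""
    log.append(f"start {role}:{task_id}")
    await asyncio.sleep(0)  # yield so sibling tasks in the same batch can interleave
    log.append(f"done {role}:{task_id}")
    # Assumption for the sketch: reviewers always approve.
    return "APPROVE" if role == "reviewer" else "ok"


async def run_wave(task_ids: list[str], log: list[str]) -> dict[str, str]:
    # Coders in a wave run concurrently; gather returns only once all have finished.
    await asyncio.gather(*(run_task("coder", t, log) for t in task_ids))
    # Reviewers are spawned only after every coder in this wave has returned.
    verdicts = await asyncio.gather(*(run_task("reviewer", t, log) for t in task_ids))
    return dict(zip(task_ids, verdicts))


async def orchestrate(waves: list[list[str]]) -> list[str]:
    log: list[str] = []
    for wave in waves:
        verdicts = await run_wave(wave, log)
        rejected = [t for t, v in verdicts.items() if v != "APPROVE"]
        if rejected:
            # Fix handling is elided here; one option is a second coder/reviewer pass.
            await run_wave(rejected, log)
    # The validator runs once, after all waves have completed.
    await run_task("validator", "all", log)
    return log


if __name__ == "__main__":
    for entry in asyncio.run(orchestrate([["t1", "t2"], ["t3"]])):
        print(entry)
```

The point of the sketch is that no polling or sleep loop is needed: each batch blocks until every member returns, so a reviewer can never observe a coder's half-finished work from the same wave.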