claude-memory/graph/solutions/agent-swarm-orchestrator-architecture-decisions-and-lessons-99f446.md
2026-03-01 16:02:58 -06:00

2.8 KiB

id type title tags importance confidence created updated
99f4462b-91eb-4243-99cf-7a74a3afadbf solution Agent Swarm Orchestrator: architecture decisions and lessons learned
orchestrator
swarm
claude-code
agents
skills
0.9 0.8 2026-02-18T22:56:43.745740+00:00 2026-03-01T22:02:57.280383+00:00

Agent Swarm Orchestrator System

Built a lightweight orchestration system using Claude Code native primitives. Iterative development over ~6 test runs revealed key architectural constraints.

Final Architecture

  • Orchestrator: Skill at ~/.claude/skills/orchestrator/ (NOT an agent — agents lack Task spawning tool)
  • Workers: Agent definitions at ~/.claude/agents/swarm-{coder,reviewer,validator}.md
  • Coordination: Plain blocking Task calls (subagents), NOT agent teams (TeamCreate/SendMessage)

Critical Findings

  1. Agents (--agent) cannot spawn subagents — the Task tool is not provisioned in agent mode. Orchestrator MUST be a skill running in a normal session.

  2. Agent Teams (TeamCreate) use async messaging — team members run in background, requiring sleep polling. Plain Task calls block until completion. Never mix teams with orchestration that needs synchronous results.

  3. tools allowlist in agent frontmatter — blocks the Task spawning tool even when listed. Use disallowedTools denylist instead for restricting agents.

  4. Prompt-only constraints don't stick — telling the orchestrator don't use Edit/Write/Bash in prose was ignored repeatedly. Technical enforcement (disallowedTools in agent frontmatter, or moving to skill + relying on role design) is necessary for hard constraints.

  5. Review timing matters — reviewers must only run AFTER the coders Task call returns. Spawning reviewers for all tasks simultaneously (before later coders finish) causes false REJECTs on incomplete work.

  6. Wave-based execution order: For each wave: spawn coders (blocking) -> coders return -> spawn reviewers (blocking) -> reviewers return -> handle fixes -> next wave. After all waves: spawn validator -> report.

  7. Parallel Task calls: Multiple Task calls in ONE message run concurrently and all block until complete. No polling, no sleep, no run_in_background needed.

File Inventory

  • ~/.claude/skills/orchestrator/SKILL.md — orchestrator skill
  • ~/.claude/agents/swarm-coder.md — implementation agent (sonnet, bypassPermissions)
  • ~/.claude/agents/swarm-reviewer.md — read-only reviewer (sonnet, default)
  • ~/.claude/agents/swarm-validator.md — read-only validator (sonnet, default)

Remaining Issues

  • Orchestrator sometimes edits files itself for cosmetic fixes despite instructions not to
  • No PROJECT_PLAN.json integration yet (planned)
  • No resume/checkpoint capability if orchestrator context fills up