Multi-Agent
Supervisor-worker delegation across multiple autonomous agents.
Multi-Agent (Delegation / Supervision) — Overview
The multi-agent pattern uses multiple specialized agents coordinated by a supervisor. Each agent has its own tools, prompts, and domain expertise. The supervisor decides which agent to delegate to, interprets results, and orchestrates the overall task.
Evolves from: Orchestrator-Worker + Routing — adds agent-to-agent communication, shared state, and supervisor oversight.
Architecture
Figure: A supervisor agent routes subtasks to specialized worker agents, each with their own tools. Shared state enables agents to read each other's results. The supervisor synthesizes the final output.
How It Works
- Receive — The supervisor agent receives a complex task from the user.
- Analyze — The supervisor reasons about the task and identifies which worker agent(s) are needed.
- Delegate — The supervisor sends focused subtasks to worker agents via tool calls (e.g., `delegate_to("research_agent", "Find data on X")`).
- Execute — Worker agents run autonomously, using their specialized tools to complete their subtasks. Each worker is a full agent (typically a ReAct loop).
- Return — Worker results are returned to the supervisor.
- Iterate — The supervisor may delegate additional tasks, refine previous results, or request corrections from workers.
- Synthesize — Once all needed work is done, the supervisor combines results into a final output.
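The steps above can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the repository's `MultiAgentSystem`: `run_supervisor`, `RunResult`, and `scripted_decide` are hypothetical names, and the supervisor's LLM call is replaced by a scripted function so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration only.
Worker = Callable[[str, Dict[str, str]], str]
Decision = Optional[Tuple[str, str]]  # (agent_name, subtask), or None when done


@dataclass
class RunResult:
    delegations: List[Tuple[str, str]] = field(default_factory=list)
    agent_outputs: Dict[str, str] = field(default_factory=dict)
    final_output: str = ""


def run_supervisor(
    task: str,
    decide: Callable[[str, Dict[str, str]], Decision],
    workers: Dict[str, Worker],
    synthesize: Callable[[str, Dict[str, str]], str],
    max_rounds: int = 4,
) -> RunResult:
    result = RunResult()
    for _ in range(max_rounds):                      # Iterate, bounded by max_rounds
        decision = decide(task, result.agent_outputs)  # Analyze
        if decision is None:                         # Supervisor has what it needs
            break
        name, subtask = decision
        if name not in workers:
            raise ValueError(f"unknown agent: {name}")
        # Delegate + Execute: the worker sees a copy of the shared results so far.
        output = workers[name](subtask, dict(result.agent_outputs))
        result.delegations.append((name, subtask))
        result.agent_outputs[name] = output          # Return
    result.final_output = synthesize(task, result.agent_outputs)  # Synthesize
    return result


def scripted_decide(task: str, outputs: Dict[str, str]) -> Decision:
    # Stand-in for an LLM decision: researcher first, then writer, then stop.
    for name in ("researcher", "writer"):
        if name not in outputs:
            return (name, f"{name} subtask for: {task}")
    return None


workers: Dict[str, Worker] = {
    "researcher": lambda subtask, ctx: "notes on KV caching",
    "writer": lambda subtask, ctx: f"draft based on {ctx['researcher']}",
}

result = run_supervisor(
    "Explain KV caching",
    scripted_decide,
    workers,
    synthesize=lambda task, outputs: outputs["writer"],
)
print(result.final_output)  # draft based on notes on KV caching
```

The `max_rounds` bound keeps a confused supervisor from delegating indefinitely. In a real system the `decide` and `synthesize` callables would wrap model calls.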
Minimal Example
Produce a technical deep-dive — the supervisor delegates research, writing, and review to three specialized agents.
from patterns.multi_agent.code.python.multi_agent import MultiAgentSystem, SubAgent
# Each sub-agent can itself be a ReActAgent, RAGPipeline, etc.
system = MultiAgentSystem(
    supervisor=your_llm,
    agents=[
        SubAgent(
            name="researcher",
            description="Finds papers, benchmarks, and technical details",
            run=lambda task, ctx: research_agent.run(task).answer,
        ),
        SubAgent(
            name="engineer",
            description="Writes technical content with code examples",
            run=lambda task, ctx: engineer_agent.run(task).answer,
        ),
        SubAgent(
            name="editor",
            description="Polishes, restructures, and ensures consistency",
            run=lambda task, ctx: editor_agent.run(task).answer,
        ),
    ],
    max_rounds=4,
)
result = system.run(
    "Produce a technical deep-dive on LLM inference optimization for a developer audience"
)
# result.delegations → which agents were called and in what order (decided by supervisor)
# result.agent_outputs → each agent's contribution
# result.final_output → synthesized deliverable
Code variants
| Implementation | Language | Path |
|---|---|---|
| Framework-agnostic supervisor + sub-agents (MockLLM) | Python | `code/python/multi_agent.py` |
| LangGraph (StateGraph supervisor + conditional edges to role nodes) | Python | `code/python/langgraph/multi_agent.py` |
| CrewAI (Crew + Process.sequential chain of role Agents) | Python | `code/python/crewai/multi_agent.py` |
| Vercel AI SDK (generateObject supervisor decisions, plain sub-agent functions) | TypeScript | `code/typescript/vercel-ai-sdk/multi-agent.ts` |
| Mastra (one Agent per role + supervisor Agent.generate({ output })) | TypeScript | `code/typescript/mastra/multi-agent.ts` |
All variants run the same researcher → writer → reviewer delegation against the same enterprise-overview task, so they're diff-friendly across stacks. The Mastra variant treats every role as a first-class Agent; the Vercel AI SDK variant leaves sub-agents as plain `(task, context) => Promise<string>` functions for lower ceremony.
Examples
- Ops crew — concrete domain overlay anchored to the `ops-crew` recipe. Worked schemas for `IncidentSignal` / `TriageDecision` / `IncidentReport`, mock PagerDuty / runbook / Slack adapters, role prompts for triage / runbook_executor / incident_writer, and an end-to-end walkthrough with offline tests in `examples/ops_crew/`.
Input / Output
- Input: A complex task requiring multiple specialized capabilities
- Output: A synthesized result combining work from multiple agents
- Delegation: `{agent: string, task: string, context?: object}`
- Shared state: Accumulated results accessible to all agents
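The delegation shape and shared state listed above could be modeled as follows. This is a sketch under assumptions: `Delegation`, `SharedState`, and `parse_delegation` are hypothetical names chosen for illustration, and validating the supervisor's raw output before dispatch is a design suggestion rather than something the source prescribes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Delegation:
    # Mirrors {agent: string, task: string, context?: object}.
    agent: str
    task: str
    context: Optional[Dict[str, Any]] = None


@dataclass
class SharedState:
    # Accumulated results, keyed by the agent that produced them.
    results: Dict[str, str] = field(default_factory=dict)

    def record(self, agent: str, output: str) -> None:
        self.results[agent] = output

    def snapshot(self) -> Dict[str, str]:
        # Hand out a copy so readers cannot mutate the shared results.
        return dict(self.results)


def parse_delegation(raw: Dict[str, Any]) -> Delegation:
    """Validate a raw supervisor decision before dispatching it."""
    agent = raw.get("agent")
    task = raw.get("task")
    context = raw.get("context")
    if not isinstance(agent, str) or not agent:
        raise ValueError("delegation needs a non-empty string 'agent'")
    if not isinstance(task, str) or not task:
        raise ValueError("delegation needs a non-empty string 'task'")
    if context is not None and not isinstance(context, dict):
        raise ValueError("'context' must be an object when present")
    return Delegation(agent=agent, task=task, context=context)


state = SharedState()
delegation = parse_delegation({"agent": "researcher", "task": "Find data on X"})
state.record(delegation.agent, "placeholder research output")
view = state.snapshot()
```

Rejecting malformed delegations at this boundary keeps a supervisor formatting error from surfacing later as a confusing worker failure.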
Key Tradeoffs
| Strength | Limitation |
|---|---|
| Each agent is specialized and focused | High complexity — multiple agents to design, prompt, and debug |
| Naturally handles multi-domain tasks | Cost scales with number of agents and delegation rounds |
| New agents can be added without changing others | Inter-agent communication design is critical and hard |
| Supervisor provides oversight and quality control | Supervisor is a single point of failure |
| Parallelizable when worker tasks are independent | Shared state management adds coordination overhead |
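The last row of the table notes that independent worker tasks can run in parallel. A minimal sketch of that fan-out follows, assuming each worker is a blocking callable and each agent appears at most once per batch. `run_parallel` is a hypothetical helper, and a thread pool is an assumption that suits I/O-bound model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_parallel(
    delegations: List[Tuple[str, str]],
    workers: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Run independent (agent, subtask) pairs concurrently and collect outputs."""
    with ThreadPoolExecutor(max_workers=max(1, len(delegations))) as pool:
        # Submit everything first so the tasks overlap, then wait for each result.
        futures = {
            name: pool.submit(workers[name], subtask)
            for name, subtask in delegations
        }
        return {name: future.result() for name, future in futures.items()}


# Stub workers standing in for real agents.
workers = {
    "researcher": lambda subtask: f"research: {subtask}",
    "engineer": lambda subtask: f"code: {subtask}",
}

outputs = run_parallel(
    [("researcher", "find benchmarks"), ("engineer", "write a batching example")],
    workers,
)
```

This only applies when no subtask needs another subtask's output. Dependent steps still have to run in sequence through the supervisor.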
When to Use
- Tasks spanning multiple domains (research + code + writing)
- When different subtasks need different tool sets
- When the system needs distinct "expertise areas"
- Large-scale tasks that benefit from divide-and-conquer
- When you want clear separation of concerns between capabilities
When NOT to Use
- When a single agent with multiple tools suffices — use ReAct
- When the task decomposition is static — use Orchestrator-Worker
- For simple routing without agent autonomy — use Routing
- When the overhead of multiple agents isn't justified by the task complexity
Related Patterns
- Evolves from: Orchestrator-Worker + Routing — see evolution.md
- Workers use: ReAct (each worker runs an agent loop), Tool Use
- Combines with: Memory (shared memory across agents), Plan & Execute (supervisor generates a plan, workers execute steps)
Deeper Dive
- Design — Agent registry, communication protocols, shared state, supervisor prompting, worker design
- Implementation — Pseudocode, delegation mechanics, state management, testing strategies
- Evolution — How multi-agent evolves from orchestrator-worker and routing
When NOT to use this pattern
- One specialized agent can handle the full scope — multi-agent multiplies cost without benefit.
- You haven't yet built and stabilized the single-agent version — multi-agent is harder to debug and tune.
- Worker agents would share most tools and prompts — they're not actually specialized; the topology adds nothing.
Next steps
- Production version: see Blueprints → Deployments for the deployment agents that use this pattern.
- Generate a starter project: see Blueprint → Spec → Scaffold.
- Combine with other patterns: see the Composition guide.
Anti-compositions to watch for
Documented pairings where this pattern often fails in production. Source: composition/anti-compositions.
Multi-Agent + Reflection on small tasks
Multi-agent already carries 3–5× the cost and latency of a single agent. Adding reflection at least doubles per-worker cost and serializes execution. On tasks the simpler patterns handle, you spend 6–10× the budget for marginal quality gain that does not survive eval.
Instead: Pick one. Reflection on a single ReAct agent often delivers 80% of the quality gain at a fraction of the cost. Add multi-agent + reflection only with an eval baseline that justifies the premium.
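The 6–10× figure follows from multiplying the two factors quoted above. A back-of-the-envelope sketch, assuming worker calls dominate total cost so that doubling per-worker cost roughly doubles the whole run (`combined_cost_multiplier` is a hypothetical helper):

```python
def combined_cost_multiplier(
    multi_agent_factor: float, reflection_factor: float = 2.0
) -> float:
    """Rough cost multiplier relative to a single-agent baseline."""
    return multi_agent_factor * reflection_factor


low = combined_cost_multiplier(3.0)   # 3x multi-agent overhead, reflection doubles it
high = combined_cost_multiplier(5.0)  # 5x multi-agent overhead, reflection doubles it
print(f"{low:.0f}x to {high:.0f}x a single-agent baseline")  # 6x to 10x a single-agent baseline
```

Because reflection "at least" doubles per-worker cost, these values are a floor, not a ceiling.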
Memory + Multi-Agent without scoped writes
Memory's failure mode is poisoning. Multi-agent's failure mode is propagation. Composed without per-agent scopes, every worker can poison every other worker, and debugging requires reconstructing a multi-actor write history.
Instead: Give each agent its own memory scope (read-mostly cross-scope), or designate one memory-writer agent that owns all writes through an approval surface.
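One way to realize per-agent scopes is to hand each worker a handle that can write only to its own scope but read from others. This is an illustrative sketch: `ScopedMemory` and `MemoryHandle` are hypothetical names, and the in-memory dict stands in for whatever store the system actually uses.

```python
from typing import Dict, Optional


class MemoryHandle:
    """Per-agent view: writes go to the owner's scope, reads may cross scopes."""

    def __init__(self, scopes: Dict[str, Dict[str, str]], owner: str) -> None:
        self._scopes = scopes
        self._owner = owner

    def write(self, key: str, value: str) -> None:
        # No scope argument here, so an agent cannot write into another scope.
        self._scopes[self._owner][key] = value

    def read(self, key: str, scope: Optional[str] = None) -> Optional[str]:
        # Default to the agent's own scope; cross-scope reads are explicit.
        return self._scopes.get(scope or self._owner, {}).get(key)


class ScopedMemory:
    def __init__(self) -> None:
        self._scopes: Dict[str, Dict[str, str]] = {}

    def handle_for(self, agent: str) -> MemoryHandle:
        self._scopes.setdefault(agent, {})
        return MemoryHandle(self._scopes, agent)


memory = ScopedMemory()
researcher = memory.handle_for("researcher")
editor = memory.handle_for("editor")

researcher.write("finding", "researcher note")
editor.write("finding", "editor note")

own = editor.read("finding")                        # editor's own scope
cross = editor.read("finding", scope="researcher")  # explicit cross-scope read
```

Because both agents used the same key without clobbering each other, a bad write stays confined to the scope of the agent that made it.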
Routing + Multi-Agent supervisor classifying on the same axis
Two classifiers in series. The router's output becomes the supervisor's input; the supervisor re-derives the same classification. Cost paid twice, failure surface doubled.
Instead: Collapse to one classifier — the router calls workers directly, or the supervisor handles routing as its first internal step. Compose only when the two classifiers operate on different dimensions.
Multi-Agent + Long-Term Memory without provenance tags
A poisoned worker poisons the long-term memory; all future agents inherit the poison. Without provenance there is no way to roll back selectively or down-weight a low-trust source.
Instead: Tag every memory entry with its source (`source: worker_X`). Apply per-source trust scoring and keep an audit log of writes.
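A minimal sketch of provenance-tagged memory with trust filtering, an audit log, and selective rollback. All names here (`ProvenanceMemory`, `MemoryEntry`, the trust scores, and the 0.5 threshold) are assumptions for illustration; a production store would persist entries and the log.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: str
    source: str  # provenance tag, e.g. "worker_a"


class ProvenanceMemory:
    def __init__(self, trust: Dict[str, float]) -> None:
        self._entries: List[MemoryEntry] = []
        self._trust = trust            # per-source trust score in [0, 1]
        self.audit_log: List[str] = []

    def write(self, key: str, value: str, source: str) -> None:
        self._entries.append(MemoryEntry(key, value, source))
        self.audit_log.append(f"write key={key} source={source}")

    def recall(self, key: str, min_trust: float = 0.5) -> List[str]:
        # Unknown sources get trust 0.0, so they are filtered out by default.
        return [
            entry.value
            for entry in self._entries
            if entry.key == key and self._trust.get(entry.source, 0.0) >= min_trust
        ]

    def rollback(self, source: str) -> int:
        """Remove every entry written by one source; return how many were removed."""
        before = len(self._entries)
        self._entries = [e for e in self._entries if e.source != source]
        removed = before - len(self._entries)
        self.audit_log.append(f"rollback source={source} removed={removed}")
        return removed


memory = ProvenanceMemory(trust={"worker_a": 0.9, "worker_b": 0.2})
memory.write("summary", "value from worker_a", source="worker_a")
memory.write("summary", "value from worker_b", source="worker_b")

trusted = memory.recall("summary")    # the low-trust source is filtered out
removed = memory.rollback("worker_b")  # selectively undo one worker's writes
remaining = memory.recall("summary", min_trust=0.0)
```

With the source recorded on every entry, rolling back a poisoned worker leaves the other agents' writes untouched.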