Workflow

Parallel Calls

Concurrent LLM calls on independent inputs, aggregated at the end.

Beginner · Evolves into: RAG, Routing

Parallel Calls (Fan-out / Fan-in) — Overview

Parallel calls execute multiple LLM requests simultaneously on independent inputs, then aggregate the results. This pattern trades sequential simplicity for throughput.

Architecture

```mermaid
graph TD
    Input([Input]) -->|"task or data"| Splitter[Splitter:<br/>Divide work]
    Splitter -->|"chunk_1"| LLM1[LLM Call 1]
    Splitter -->|"chunk_2"| LLM2[LLM Call 2]
    Splitter -->|"chunk_3"| LLM3[LLM Call 3]
    LLM1 -->|"result_1"| Aggregator[Aggregator:<br/>Combine results]
    LLM2 -->|"result_2"| Aggregator
    LLM3 -->|"result_3"| Aggregator
    Aggregator -->|"merged output"| Output([Output])
    style Input fill:#e3f2fd
    style Splitter fill:#fff8e1
    style LLM1 fill:#fff3e0
    style LLM2 fill:#fff3e0
    style LLM3 fill:#fff3e0
    style Aggregator fill:#e8f5e9
    style Output fill:#e3f2fd
```

Figure: Fan-out / fan-in pattern. A splitter divides work into independent chunks, LLM calls process them concurrently, and an aggregator merges the results.

How It Works

  1. Split — Divide the input into independent units of work. This can be data-parallel (same prompt, different data) or task-parallel (different prompts, same data).
  2. Fan-out — Send all LLM calls concurrently. Since they're independent, order doesn't matter.
  3. Fan-in — Collect all results. Handle partial failures (some calls may fail while others succeed).
  4. Aggregate — Combine results into the final output. The aggregation step may itself be an LLM call (summarize, synthesize) or code-based (concatenate, merge, vote).
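The four steps above can be sketched with Python's standard `concurrent.futures` module. This is a minimal, illustrative fan-out/fan-in loop, not the library's actual implementation; `fake_llm` is a stand-in for a real LLM call, and the code-based join in the aggregate step could equally be another LLM call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption for this sketch).
    return f"analysis of: {prompt}"

def fan_out_fan_in(chunks, prompt_template, max_workers=4):
    """Run one LLM call per chunk concurrently; collect results and errors."""
    results = [None] * len(chunks)  # slot per branch, preserving input order
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan-out: submit every branch at once; map each future to its index.
        futures = {
            pool.submit(fake_llm, prompt_template.format(input=chunk)): i
            for i, chunk in enumerate(chunks)
        }
        # Fan-in: gather as branches finish, isolating partial failures.
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:  # one failed branch doesn't block the rest
                errors[i] = exc
    # Aggregate: a simple code-based merge of the successful branches.
    merged = "\n".join(r for r in results if r is not None)
    return merged, errors
```

Note that results are written back by input index, so the fan-in order (completion order) never leaks into the output order.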

Minimal Example

Evaluate four candidate resumes concurrently, then aggregate the scores into a ranked recommendation.

```python
from workflows.parallel_calls.code.python.parallel_calls import ParallelCalls

runner = ParallelCalls(llm=your_llm, max_workers=4)

result = runner.run(
    chunks=resume_texts,          # one string per candidate resume
    branch_prompt=(
        "Score this resume for a senior Python engineer role (0–10) "
        "with a one-paragraph justification:\n\n{input}"
    ),
    aggregate_prompt=(
        "Rank these candidates from best to worst and recommend the top 2:\n\n{input}"
    ),
)

# result.outputs     → individual scores, ordered by input index
# result.aggregated  → final ranked recommendation
# result.errors      → any branches that failed
```

Full implementation: [`code/python/parallel_calls.py`](code/python/parallel_calls.py)

Input / Output

  • Input: Data or task that can be divided into independent parts
  • Output: Aggregated result combining all parallel outputs
  • Fan-out: N independent LLM calls (N determined by the splitter)
  • Fan-in: N results collected, potentially with failures
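A result container for this pattern needs to carry successes, the merged output, and per-branch failures side by side. The shape below is a hypothetical sketch consistent with the `result.outputs` / `result.aggregated` / `result.errors` fields shown in the minimal example, not the library's actual class:

```python
from dataclasses import dataclass, field

@dataclass
class ParallelResult:
    """Hypothetical fan-in container: successes, merged output, failures."""
    outputs: list                 # successful branch outputs, by input index
    aggregated: str               # final merged output
    errors: dict = field(default_factory=dict)  # branch index -> exception

    @property
    def success_rate(self) -> float:
        total = len(self.outputs) + len(self.errors)
        return len(self.outputs) / total if total else 0.0
```

Keeping failures keyed by branch index (rather than discarding them) lets the caller decide whether to retry just the failed branches or accept a partial aggregate.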

Key Tradeoffs

| Strength | Limitation |
| --- | --- |
| Dramatically lower latency for parallelizable work | Only works for independent subtasks |
| Scales naturally with available concurrency | Aggregation can be complex (especially with partial failures) |
| Each call has a focused prompt | Higher peak token cost (all calls active simultaneously) |
| Partial failure isolation — one call failing doesn't block others | Results may be inconsistent across parallel calls |
| Simple to reason about — no inter-call dependencies | Splitting logic must ensure true independence |

When to Use

  • Processing multiple documents, chunks, or data points with the same analysis
  • Extracting different aspects of a single input in parallel (sentiment, entities, summary)
  • Generating multiple candidate outputs for downstream selection
  • Any task where subtasks don't depend on each other's results
  • Voting/consensus patterns where multiple LLM calls vote on an answer
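For the voting/consensus case in the last bullet, the fan-in step can be pure code rather than an LLM call. A minimal sketch, where the answers are whatever the parallel branches returned:

```python
from collections import Counter

def majority_vote(answers):
    """Code-based fan-in: pick the most common answer and its agreement ratio."""
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)
```

The agreement ratio doubles as a cheap confidence signal: a low ratio suggests the question is ambiguous and may warrant escalation (e.g. more branches, or an LLM-based tiebreaker).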

Pattern Relationships

  • Evolves into: RAG (parallel retrieval + context injection), Routing (add LLM-driven classification before fan-out)
  • Combines with: Prompt Chaining (parallelize independent steps within a chain), Evaluator-Optimizer (evaluate parallel outputs)
  • More sophisticated version: Orchestrator-Worker (when splitting requires LLM reasoning)

Deeper Dive

  • Design — Splitting strategies, aggregation patterns, partial failure handling
  • Implementation — Pseudocode, concurrency management, testing with stubs