
Fan-Out with Child Workflows

TLDR

Split your record set into fixed-size chunks and start one child Workflow per chunk so that each chunk's history stays within Temporal's limits. Use this when you want maximum concurrency with no rate control and you can pre-compute how many chunks you need before the job starts. Keep the total number of children per parent under 1,000; use Sliding Window or Batch Iterator for larger workloads.

Overview

The Fan-Out pattern distributes a large record set across multiple independent child Workflows, each responsible for processing a fixed-size chunk. The parent Workflow assigns work by offset and length so that no record IDs need to be passed over the wire — only two integers per child.

Problem

A single Workflow run can have at most 2,000 in-flight Activities (aim for 500) and roughly 50,000 history events (the hard limit is 51,200 events or 50 MB of history). Processing millions of records in a single Workflow run is therefore not possible.
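To make the arithmetic concrete, the following sketch estimates how many history events one Workflow run would accumulate. The figure of six events per record is an assumption for illustration only (roughly three Activity events plus three Workflow Task events when Activities are awaited one at a time); the real count depends on how the Workflow is written. `estimateHistoryEvents` and `fitsInOneRun` are hypothetical helper names, not part of any SDK.

```typescript
// Assumed for illustration: each sequentially awaited Activity adds about
// six events to the history. Measure your own Workflow for a real figure.
const ASSUMED_EVENTS_PER_RECORD = 6;

// Approximate history event limit quoted in this section.
const HISTORY_EVENT_LIMIT = 50000;

// Estimate the number of history events produced by processing `records`.
function estimateHistoryEvents(
  records: number,
  eventsPerRecord: number = ASSUMED_EVENTS_PER_RECORD
): number {
  return records * eventsPerRecord;
}

// True when the estimate stays within the quoted limit.
function fitsInOneRun(records: number): boolean {
  return estimateHistoryEvents(records) <= HISTORY_EVENT_LIMIT;
}

console.log(fitsInOneRun(500)); // true: about 3,000 events
console.log(fitsInOneRun(1000000)); // false: about 6,000,000 events
```

Under this assumption a 500-record chunk uses only a small fraction of the event budget, while a million records in one run overshoots it by two orders of magnitude.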

You need a way to partition a large record set, process each partition independently, and coordinate the overall job while keeping each Workflow's history within safe bounds.

Solution

You split the total record count into fixed-size chunks and start one child Workflow per chunk. Each child is given an offset and a length so it knows which slice of the record set to fetch and process independently.
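As a minimal sketch of the chunking step in isolation, the helper below turns a total record count and a chunk size into the list of (offset, length) pairs that would be handed to the children. `planChunks` and the `Chunk` type are hypothetical names introduced for this example.

```typescript
// One unit of work for a child: which slice of the record set to process.
interface Chunk {
  offset: number;
  length: number;
}

// Split `totalRecords` into consecutive chunks of at most `chunkSize`.
// The final chunk is shorter when the total is not an exact multiple.
function planChunks(totalRecords: number, chunkSize: number): Chunk[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error("chunkSize must be a positive integer");
  }
  const chunks: Chunk[] = [];
  for (let offset = 0; offset < totalRecords; offset += chunkSize) {
    chunks.push({ offset, length: Math.min(chunkSize, totalRecords - offset) });
  }
  return chunks;
}

console.log(planChunks(10, 4));
// [ { offset: 0, length: 4 }, { offset: 4, length: 4 }, { offset: 8, length: 2 } ]
```

Because the number of chunks is `Math.ceil(totalRecords / chunkSize)`, you can check it against the 1,000-children guideline before starting any child.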

The parent Workflow starts all children concurrently and waits for them all to complete. If a child fails, the parent can retry that child without re-processing the records handled by other children.

The following describes each step in the diagram:

  1. The parent Workflow receives the total record count and a configured chunk size.
  2. It divides the total into chunks and starts one child Workflow per chunk, passing only offset and length.
  3. Each child independently fetches its slice of records (using the offset and length) and calls processRecord for each one.
  4. Each child completes and returns its result to the parent.
  5. The parent blocks until all children have completed, then returns the aggregated result.

Implementation

The following TypeScript example implements the Fan-Out pattern.

// workflows.ts
import {
  executeChild,
  proxyActivities,
  workflowInfo,
} from "@temporalio/workflow";
import type * as activities from "./activities";
import { TASK_QUEUE, CHUNK_SIZE } from "./shared";

const { processRecord } = proxyActivities<typeof activities>({
  startToCloseTimeout: "10 seconds",
});

// Parent: splits the record range into chunks and starts one child per chunk.
export async function fanOutWorkflow(
  totalRecords: number,
  chunkSize: number = CHUNK_SIZE
): Promise<number> {
  const children: Promise<number>[] = [];

  for (let offset = 0; offset < totalRecords; offset += chunkSize) {
    // The last chunk may be shorter than chunkSize.
    const length = Math.min(chunkSize, totalRecords - offset);
    children.push(
      executeChild(recordBatchWorkflow, {
        args: [offset, length],
        taskQueue: TASK_QUEUE,
        // Deterministic child ID derived from the parent ID and the offset.
        workflowId: `${workflowInfo().workflowId}/batch-${offset}`,
      })
    );
  }

  // Wait for every child, then sum the per-chunk counts.
  const results = await Promise.all(children);
  return results.reduce((sum, n) => sum + n, 0);
}

// Child: processes records [offset, offset + length) one Activity at a time.
export async function recordBatchWorkflow(
  offset: number,
  length: number
): Promise<number> {
  let processed = 0;
  for (let i = offset; i < offset + length; i++) {
    await processRecord(i);
    processed++;
  }
  return processed;
}

Best Practices

  • Use offset and length, not explicit IDs. Pass only two integers to each child rather than a full slice of IDs. The child fetches its own records. This keeps history events small.
  • Size chunks to stay under the Activity limit. Each child Workflow can have at most 2,000 in-flight Activities. Aim for chunks of 500 records or fewer if each record maps to one Activity.
  • Cap concurrent children in the parent. Starting thousands of child Workflows simultaneously puts pressure on the namespace. Consider batching child starts or using Sliding Window if you need tighter concurrency control.
  • Set PARENT_CLOSE_POLICY_ABANDON for fire-and-forget fan-outs where the parent does not need to collect results. With the default TERMINATE policy, cancelling or timing out the parent will terminate all in-flight children.
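  • Give each child a deterministic Workflow ID (parentId/batch-<offset>). Temporal will not start a second Workflow while one with the same ID is still running, so a chunk that is already in flight cannot be duplicated. Under the default Workflow ID reuse policy, however, the ID of a closed Workflow can be reused, so already-completed children are re-executed if the parent is re-run with the same ID. To skip them, set a stricter reuse policy on the child (one that rejects duplicates) and handle the resulting already-started error.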
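One way to batch child starts, as suggested in the list above, is to group the chunks into waves and start the next wave only after the current one finishes. The sketch below is plain TypeScript with no Temporal imports; `toWaves` and `runInWaves` are hypothetical helpers, and inside a Workflow the `start` callback would be assumed to wrap a call such as `executeChild`.

```typescript
// Group items into consecutive waves of at most `waveSize` items each.
function toWaves<T>(items: T[], waveSize: number): T[][] {
  if (!Number.isInteger(waveSize) || waveSize <= 0) {
    throw new Error("waveSize must be a positive integer");
  }
  const waves: T[][] = [];
  for (let i = 0; i < items.length; i += waveSize) {
    waves.push(items.slice(i, i + waveSize));
  }
  return waves;
}

// Start one wave at a time and wait for it to finish before starting the
// next, so at most `waveSize` tasks are in flight at once.
// Results are returned in the same order as `items`.
async function runInWaves<T, R>(
  items: T[],
  waveSize: number,
  start: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const wave of toWaves(items, waveSize)) {
    const waveResults = await Promise.all(wave.map((item) => start(item)));
    results.push(...waveResults);
  }
  return results;
}

console.log(toWaves([0, 500, 1000, 1500, 2000], 2));
// [ [ 0, 500 ], [ 1000, 1500 ], [ 2000 ] ]
```

The trade-off of this design is that each wave waits for its slowest member, so throughput dips at wave boundaries. Sliding Window avoids that by starting a new child as soon as any one finishes.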

Common Pitfalls

  • Starting too many children at once. Each child start adds to the parent's history. Keep total children per parent under 1,000 per Temporal guidance. If you need more children, switch to MapReduce Tree or Sliding Window.
  • Passing large lists of IDs. Workflow inputs are stored in event history. Passing millions of record IDs as a list will exceed the history size limit. Use offset + length instead.
  • Ignoring child failures. A failed child does not automatically fail the parent unless you await all results. Always await child handles and handle errors explicitly.
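To handle child failures explicitly, one option is to wait for every child to settle and then separate the successes from the failures, rather than stopping at the first rejection. The sketch below shows only the summarizing step in plain TypeScript. `summarizeChildren`, `Settled`, and `FanOutSummary` are hypothetical names; the `Settled` shape is written to match what the standard `Promise.allSettled` returns.

```typescript
// Same shape as the entries returned by Promise.allSettled.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

interface FanOutSummary {
  processed: number; // records processed by children that succeeded
  failedOffsets: number[]; // chunk offsets whose child failed
}

// Pair each chunk offset with its child's outcome (same index in both arrays).
function summarizeChildren(
  offsets: number[],
  settled: Settled<number>[]
): FanOutSummary {
  const summary: FanOutSummary = { processed: 0, failedOffsets: [] };
  offsets.forEach((offset, i) => {
    const result = settled[i];
    if (result === undefined) return; // no outcome recorded for this offset
    if (result.status === "fulfilled") {
      summary.processed += result.value;
    } else {
      summary.failedOffsets.push(offset);
    }
  });
  return summary;
}

const outcomes: Settled<number>[] = [
  { status: "fulfilled", value: 4 },
  { status: "rejected", reason: new Error("child failed") },
  { status: "fulfilled", value: 2 },
];
console.log(summarizeChildren([0, 4, 8], outcomes));
// { processed: 6, failedOffsets: [ 4 ] }
```

In a parent Workflow, the `settled` array would be assumed to come from `await Promise.allSettled(children)`. The parent can then retry only the chunks listed in `failedOffsets`, or fail the job with a message that names them.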