Independent tasks run sequentially when they could run in parallel
CrewAI's default task scheduler runs tasks in order, even when their dependencies don't require it. Teams leave multi-minute parallelism wins on the table without realizing it.
The symptom
Your crew has eight tasks. Three of them feed into each other; the other five are genuinely independent — different subjects, different tools, different outputs. You run the crew and notice the wall-clock time is the sum of all eight task latencies, not the length of the longest dependent chain you’d expect. The independent tasks are running one after another for no apparent reason.
Why this happens
CrewAI executes tasks in the order they appear in the tasks=[...] list by default. There’s an async_execution flag and a Process.hierarchical process type for more advanced cases, but the default is linear and naive — it doesn’t build a dependency DAG, doesn’t topologically sort, doesn’t detect independent subgraphs. If you want parallelism, you have to explicitly structure your crew to ask for it.
This is fine for small crews. It gets expensive for crews with many tasks, each of which makes an LLM call, because every serialized task adds its LLM latency to wall-clock time. A crew with eight 10-second LLM-bound tasks takes 80 seconds sequentially and could take 30 seconds with reasonable parallelism.
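The arithmetic is easy to check. A minimal sketch, using a hypothetical graph matching the scenario above (a three-task chain a→b→c plus five independent tasks, every task a 10-second LLM call):

```python
# Hypothetical dependency graph: task -> list of prerequisite tasks.
deps = {
    "a": [], "b": ["a"], "c": ["b"],
    "d": [], "e": [], "f": [], "g": [], "h": [],
}
durations = {t: 10 for t in deps}  # every task is a 10-second LLM call

def makespan(deps, durations):
    """Wall-clock time with unlimited parallelism: the critical path."""
    memo = {}
    def finish(t):
        if t not in memo:
            start = max((finish(p) for p in deps[t]), default=0)
            memo[t] = start + durations[t]
        return memo[t]
    return max(finish(t) for t in deps)

sequential = sum(durations.values())   # 80 s: every task serialized
parallel = makespan(deps, durations)   # 30 s: the a->b->c chain dominates
print(sequential, parallel)
```

The parallel makespan is just the longest dependent chain, which is why sparse DAGs benefit the most.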
Why this persists upstream
Automatic parallelism is hard to get right. A task that looks independent might have a hidden dependency through a shared tool, a shared memory reference, or an implicit assumption about ordering. Parallelizing carelessly would break more users than it helps. The safest default is sequential, and the upstream conversation about smarter defaults is ongoing.
How Fast-CrewAI addresses it
Fast-CrewAI’s task executor is backed by Tokio and builds an explicit dependency graph from your Task objects. It:
- Detects dependencies from the context field on each task (CrewAI’s existing mechanism for saying “this task depends on that one”).
- Topologically sorts the graph to determine an execution order.
- Detects cycles and fails loudly if any exist, instead of deadlocking or looping.
- Dispatches tasks in parallel where their dependencies allow, using Tokio’s async runtime inside the Rust extension.
You still write tasks the same way — the scheduler just gets smarter. Tasks with no declared dependencies run concurrently; tasks with dependencies wait for their prerequisites.
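The scheduling logic can be sketched in pure Python — a hypothetical illustration of the approach, not Fast-CrewAI’s actual Rust implementation. Kahn’s algorithm groups tasks into waves (each wave runs concurrently) and fails loudly on cycles:

```python
def schedule_waves(deps):
    """Group tasks into waves: every task in a wave can run concurrently,
    and each wave depends only on earlier waves. Raises on cycles."""
    indegree = {t: len(ps) for t, ps in deps.items()}
    dependents = {t: [] for t in deps}
    for t, ps in deps.items():
        for p in ps:
            dependents[p].append(t)
    wave = [t for t, d in indegree.items() if d == 0]
    waves, seen = [], 0
    while wave:
        waves.append(sorted(wave))
        seen += len(wave)
        nxt = []
        for t in wave:
            for d in dependents[t]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        wave = nxt
    if seen != len(deps):
        # Some tasks never became ready: there is a cycle. Fail loudly
        # instead of deadlocking.
        raise ValueError("dependency cycle detected")
    return waves
```

For the eight-task example above, this yields one wave containing the chain’s first task plus all five independent tasks, then two single-task waves for the rest of the chain.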
The honest caveat: Tokio cannot make an individual LLM call any faster. What it buys you is running multiple LLM-bound tasks in parallel instead of serially. That’s a real win on workflows with genuinely independent work and a no-op on workflows that are inherently sequential.
Workaround you can ship today
If you don’t want to adopt Fast-CrewAI for this, you can get most of the benefit by hand:
- Identify the independent task groups in your crew. Be honest — not every task that “feels” independent actually is.
- Split your crew into phases. Run each phase as a separate Crew.kickoff() wrapped in asyncio.gather or concurrent.futures.ThreadPoolExecutor.
- Pass the outputs of phase N as inputs to phase N+1.
It’s uglier than declarative dependencies but works with stock CrewAI.
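A minimal sketch of the phased approach. The crew runners here are hypothetical stand-ins — in real code each would be a blocking Crew(...).kickoff() call, which is why each is wrapped in asyncio.to_thread:

```python
import asyncio

# Hypothetical stand-ins for independent phase-1 crews and the
# dependent phase-2 crew; replace with real Crew(...).kickoff() calls.
def run_research(topic):
    return f"notes on {topic}"

def run_summary(notes):
    return " | ".join(notes)

async def main():
    # Phase 1: the independent crews run concurrently. kickoff() is
    # blocking, so each call goes to a worker thread.
    topics = ["pricing", "competitors", "reviews"]
    notes = await asyncio.gather(
        *(asyncio.to_thread(run_research, t) for t in topics)
    )
    # Phase 2: feed phase-1 outputs into the dependent crew.
    return run_summary(notes)

result = asyncio.run(main())
print(result)
```

asyncio.gather preserves input order, so phase-2 inputs line up with the topics list even though the phase-1 calls finish in any order.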
When it matters
Crews with 5+ tasks where the dependency structure has genuine independent subgraphs. If you’re running long tasks in sequence and the DAG is actually sparse, this is a straightforward multi-minute win per run. We check for it in every performance audit.