Fast-CrewAI
Issue commentary · database · severity: medium

Concurrent workers fight over a single SQLite connection

CrewAI's default memory backend uses a single SQLite connection per process. Under concurrent load, workers serialize through it and throughput collapses.

Neul Labs

The symptom

Your crew works beautifully at low concurrency. You scale up to a worker pool running multiple crews in parallel, and throughput barely improves — or even gets worse. You check CPU; it’s fine. You check memory; it’s fine. You check your SQLite file and notice that writes are backing up. Something is serializing access to the database.

Why this happens

SQLite supports concurrent reads through WAL mode, but writes are serialized at the file level. CrewAI’s default memory backend opens a single connection per process and reuses it for everything — reads, writes, schema operations. Under concurrent workloads, all the workers in a single process end up waiting on that one connection.
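The pattern looks roughly like this, a minimal sketch of a process-wide single connection (the filename and helper are illustrative, not CrewAI's actual internals): one sqlite3 connection shared by every worker, guarded by a lock, so even read-only queries queue behind it.

```python
import sqlite3
import threading

# One connection for the whole process -- the pattern under discussion.
# check_same_thread=False lets threads share it, but a lock is then required
# because a single sqlite3 connection is not safe for concurrent use.
conn = sqlite3.connect("crew_memory.db", check_same_thread=False)
lock = threading.Lock()

def query(sql, params=()):
    # Every caller serializes here, including read-only SELECTs that WAL
    # mode could have run in parallel on separate connections.
    with lock:
        return conn.execute(sql, params).fetchall()
```

Every worker funnels through `lock`, so adding workers adds contention rather than throughput.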

The cost of a single connection isn't fatal by itself. The real problem is that every operation, however short, pays the full cost of acquiring the connection lock, even though most operations are reads that WAL mode would happily run in parallel on separate connections. Under concurrent load the throughput floor drops well below what SQLite is actually capable of.

Why this persists upstream

Connection pooling in Python is a little awkward. There’s no great stdlib option — you either write your own pool or pull in a dependency. SQLAlchemy has pooling but it’s a heavy addition. aiosqlite exists but requires an async codebase. The safest default is a single connection, and that’s what most Python SQLite code ships with.

How Fast-CrewAI addresses it

Fast-CrewAI uses r2d2, a Rust connection pool crate, inside its database executor. The pool sizes itself based on CPU count by default and can be tuned via environment variables. Reads and writes go through the pool; WAL mode is enabled automatically on pool initialization.

On concurrent workloads, throughput scales much more linearly with worker count, and tail latencies stop being dominated by lock contention. The overall speedup depends heavily on concurrency — at 1 worker it’s essentially neutral, at 4 workers it’s meaningful, at 16 workers it’s dramatic.

The pooled connections also make FTS5 queries cheaper in practice, because the index maintenance triggers run on whatever connection happens to be free rather than blocking the main one.

Workaround you can ship today

Without Fast-CrewAI, you have two options:

  1. Enable WAL mode yourself. If you’re not on it already, PRAGMA journal_mode=WAL is a one-liner that dramatically improves concurrent read throughput.
  2. Implement a small connection pool by hand. Something like 4–8 connections per process, round-robined per query, with a lock on the writer subset. It’s a hundred lines of code and it moves the ceiling significantly.

For any serious concurrent workload, do at least option 1 immediately.
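A minimal sketch of option 2 (the class name and sizes are illustrative, not a CrewAI or Fast-CrewAI API): a fixed set of connections behind a queue, WAL and a busy timeout enabled on each, and a process-level lock so writers queue in-process instead of spinning on SQLITE_BUSY at the file level.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager

class SQLitePool:
    """Hand-rolled pool: readers round-robin, writers serialize in-process."""

    def __init__(self, path, size=8):
        self._pool = queue.Queue(maxsize=size)
        self._write_lock = threading.Lock()  # SQLite serializes writes anyway
        for _ in range(size):
            conn = sqlite3.connect(path, check_same_thread=False)
            conn.execute("PRAGMA journal_mode=WAL")   # option 1: concurrent reads
            conn.execute("PRAGMA busy_timeout=5000")  # wait instead of erroring
            self._pool.put(conn)

    @contextmanager
    def read(self):
        # Any free connection will do; readers run in parallel under WAL.
        conn = self._pool.get()
        try:
            yield conn
        finally:
            self._pool.put(conn)

    @contextmanager
    def write(self):
        # Take the write lock first so contending writers queue on a cheap
        # in-process mutex rather than on the SQLite file lock.
        with self._write_lock, self.read() as conn:
            yield conn
            conn.commit()
```

Readers call `pool.read()` concurrently; anything that mutates state goes through `pool.write()`. It won't match a tuned native pool, but it removes the single-connection ceiling.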

When it matters

Any CrewAI deployment running more than a handful of concurrent crews per process. If you’ve scaled your worker pool and throughput didn’t scale with it, this is one of the first things to check. Included in every performance audit we run.

Need help applying this to your codebase?

Neul Labs offers audits, full implementation, and retained CrewAI engineering. We built fast-crewai — we can build yours.