Fast-CrewAI
Issue commentary · serialization · severity: medium

JSON serialization dominates agent message passing

Python's json module is the hidden tax on every agent-to-agent handoff. For memory-heavy crews, it can account for 20–40% of CPU time.

Neul Labs ·

The symptom

You profile a long-running crew expecting to see LLM latency, tool execution, and memory retrieval at the top of the flamegraph. Instead, you see json.dumps and json.loads with stubbornly large percentages. They show up everywhere — memory writes, tool results, agent handoffs, structured output parsing — and collectively they add up to a shockingly large fraction of your wall-clock time.
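A quick way to confirm this on your own machine is to run a workload under cProfile and sort by cumulative time. This is a minimal sketch that substitutes a synthetic loop of CrewAI-style payload round-trips for a real crew run; the payload shape is an assumption for illustration:

```python
import cProfile
import io
import json
import pstats

# Synthetic stand-in for a crew run: many small serializations of a
# CrewAI-style message payload (role/content/metadata/tool_result).
payload = {
    "role": "agent",
    "content": "intermediate result " * 20,
    "metadata": {"task_id": "t-1", "step": 3},
    "tool_result": {"ok": True, "rows": list(range(50))},
}

def workload(n: int = 10_000) -> None:
    for _ in range(n):
        json.loads(json.dumps(payload))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time; json.dumps / json.loads should dominate.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
report = out.getvalue()
print(report)
```

On a real crew, replace `workload()` with your `kickoff` call; if `dumps`/`loads` don't appear in the top rows, this issue isn't your bottleneck.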

Why this happens

Python’s stdlib json is written in C, but it’s not fast in the way a Rust implementation is. It walks Python objects one at a time, checks types against the Python type hierarchy, handles encoding, and constantly allocates intermediate objects. For CrewAI’s payloads — nested dicts with role/content/metadata/tool_result structures — the per-operation cost is on the order of hundreds of microseconds.

That sounds small. It isn’t. CrewAI uses JSON for:

  • Persisting every memory write.
  • Serializing every tool call’s arguments and result.
  • Passing structured output between tasks.
  • Logging agent messages.
  • Caching keys for tool executor statistics.

In a memory-heavy workflow, serialization calls are in the tens of thousands per run. The cumulative cost gets big fast.
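The back-of-the-envelope arithmetic is easy to reproduce with stdlib `timeit`. The payload shape and the 50,000-calls-per-run figure below are assumptions for illustration, not measured CrewAI numbers:

```python
import json
import timeit

# Representative CrewAI-style message payload (hypothetical shape).
payload = {
    "role": "agent",
    "content": "intermediate result " * 50,
    "metadata": {"task_id": "t-1", "step": 3},
    "tool_result": {"ok": True, "rows": list(range(100))},
}

N = 5_000
per_op_s = timeit.timeit(lambda: json.dumps(payload), number=N) / N

# Assume 50,000 serialization calls in one memory-heavy run
# (an assumption, not a measured figure).
calls_per_run = 50_000
total_s = per_op_s * calls_per_run
print(f"~{per_op_s * 1e6:.1f} µs per dumps, ~{total_s:.2f} s per run")
```

Multiply your own per-op number by your own call count; even at 100 µs per call, 50,000 calls is five seconds of pure serialization.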

Why this persists upstream

json is in the stdlib, has zero install friction, and works everywhere. A faster serializer — orjson, rapidjson, simdjson — is a dependency decision, and dependencies are politics. CrewAI has historically prioritized a minimal dependency set, which is defensible. The cost of that decision is absorbed by individual teams as “CrewAI is slow at the margins”.

How Fast-CrewAI addresses it

Fast-CrewAI replaces the serialization paths inside CrewAI’s internal components with serde_json exposed via PyO3. Your application code still uses Python’s json if that’s what it calls — we don’t patch the stdlib. But every time CrewAI internally serializes a message, a memory entry, a tool result, or an agent payload, the work runs through serde.

On a representative agent-message payload:

  • Python json: ~2,333 ops/s
  • serde via PyO3: ~80,525 ops/s
  • Ratio: 34.5×

Peak memory drops 58% on the same payload, because serde can write directly to bytes without constructing intermediate Python dict or list objects. For a long-running worker processing thousands of serialization calls a minute, that memory delta is the difference between staying under a container limit and getting OOM-killed.
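You can measure the allocation side of this yourself with stdlib `tracemalloc`. This sketch shows how to capture peak allocation during a round-trip on a hypothetical payload; it won't reproduce the 58% figure, but it tells you what your own payloads cost:

```python
import json
import tracemalloc

# Hypothetical payload; substitute one captured from your own crew.
payload = {
    "role": "agent",
    "content": "x" * 10_000,
    "tool_result": {"rows": list(range(2_000))},
}

tracemalloc.start()
encoded = json.dumps(payload)   # allocates intermediate str fragments
decoded = json.loads(encoded)   # rebuilds the full Python object tree
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"peak allocation during round-trip: {peak / 1024:.1f} KiB")
```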

There’s a subtle second benefit: because serde-based validation is faster than Python’s jsonschema, the tool executor’s argument validation gets a free speedup too. That compounds with the explicit tool caching gains.
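If you stay on the Python side, the cheapest analogue of that validation win is to stop re-compiling schemas on every call. This is a hedged sketch — the schema, function names, and cache strategy are all hypothetical — of caching a compiled jsonschema validator per tool schema instead of calling `jsonschema.validate()` (which compiles the schema each time):

```python
import json
from functools import lru_cache

from jsonschema import Draft7Validator

@lru_cache(maxsize=None)
def _validator_for(schema_json: str) -> Draft7Validator:
    # Dicts aren't hashable, so key the cache on the serialized schema.
    return Draft7Validator(json.loads(schema_json))

# Hypothetical tool-argument schema for illustration.
TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}

def validate_tool_args(args: dict) -> bool:
    key = json.dumps(TOOL_SCHEMA, sort_keys=True)
    return _validator_for(key).is_valid(args)

print(validate_tool_args({"query": "status", "limit": 5}))
print(validate_tool_args({"limit": 5}))  # missing required "query"
```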

Workaround you can ship today

If you can’t adopt Fast-CrewAI, you can patch the stdlib yourself in application code:

import json as stdlib_json

import orjson

def _dumps(obj, **kwargs):
    # orjson only understands `default`; other kwargs
    # (indent, sort_keys, ...) are silently ignored here.
    return orjson.dumps(obj, default=kwargs.get("default")).decode()

def _loads(s, **kwargs):
    return orjson.loads(s)

stdlib_json.dumps = _dumps  # type: ignore[assignment]
stdlib_json.loads = _loads  # type: ignore[assignment]

This is blunt: it monkey-patches the stdlib json module process-wide, and it has edge cases around parameters orjson doesn’t support (sort_keys, indent, etc.). But for CrewAI’s internal serialization it’s usually safe, and it gets you a ~10× improvement on the hot path. Fast-CrewAI’s 34× is better because its serialization runs inside the Rust internals, avoiding a PyO3 boundary crossing on every call, but orjson monkey-patching is a reasonable halfway point.

When it matters

Memory-heavy or tool-heavy workflows where you see json in your profiler’s top 20. If you profile your crew and the serialization cost isn’t visible, don’t worry about it. If it is, this is one of the cheapest wins available.

Need help applying this to your codebase?

Neul Labs offers audits, full implementation, and retained CrewAI engineering. We built fast-crewai — we can build yours.