JSON serialization dominates agent message passing
Python's json module is the hidden tax on every agent-to-agent handoff. For memory-heavy crews, it can account for 20–40% of CPU time.
The symptom
You profile a long-running crew expecting to see LLM latency, tool execution, and memory retrieval at the top of the flamegraph. Instead, you see json.dumps and json.loads with stubbornly large percentages. They show up everywhere — memory writes, tool results, agent handoffs, structured output parsing — and collectively they add up to a shockingly large fraction of your wall-clock time.
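A minimal sketch of reproducing that flamegraph shape, using only the stdlib profiler. The crew step here is a stand-in that just serializes and re-parses a nested payload in a loop; the function and payload names are illustrative, not CrewAI APIs:

```python
import cProfile
import io
import json
import pstats

# Stand-in for one agent handoff: serialize and re-parse a nested payload.
# In a real crew this happens inside memory writes, tool calls, and logging.
PAYLOAD = {
    "role": "agent",
    "content": "intermediate result " * 50,
    "metadata": {"task_id": "t-1", "step": 3},
    "tool_result": {"rows": [{"id": i, "score": i * 0.5} for i in range(100)]},
}

def simulated_crew_step(iterations: int = 2000) -> None:
    for _ in range(iterations):
        blob = json.dumps(PAYLOAD)
        json.loads(blob)

profiler = cProfile.Profile()
profiler.enable()
simulated_crew_step()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(20)
report = stream.getvalue()
print(report)  # json.dumps and json.loads sit near the top of the table
```

In a real crew the loop body is spread across dozens of call sites, which is why the cost hides until you sort the profile by cumulative time.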
Why this happens
Python’s stdlib json has a C accelerator, but it’s not fast in the way a systems-language serializer is. It walks Python objects one at a time, checks each against the Python type hierarchy, handles string encoding, and allocates intermediate objects constantly. For CrewAI’s payloads (nested dicts with role/content/metadata/tool_result structures), the per-operation cost lands in the hundreds of microseconds.
That sounds small. It isn’t. CrewAI uses JSON for:
- Persisting every memory write.
- Serializing every tool call’s arguments and result.
- Passing structured output between tasks.
- Logging agent messages.
- Caching keys for tool executor statistics.
In a memory-heavy workflow, serialization calls are in the tens of thousands per run. The cumulative cost gets big fast.
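You can measure the per-call cost on your own payloads with a quick stdlib-only microbenchmark. The payload shape below is an assumption modeled on the role/content/metadata/tool_result structure described above:

```python
import json
import timeit

# Representative nested agent-message payload (illustrative shape, not a CrewAI type).
payload = {
    "role": "tool",
    "content": "search results",
    "metadata": {"agent": "researcher", "attempt": 1},
    "tool_result": [{"title": f"doc {i}", "snippet": "x" * 200} for i in range(50)],
}

n = 1000
dump_s = timeit.timeit(lambda: json.dumps(payload), number=n)
load_blob = json.dumps(payload)
load_s = timeit.timeit(lambda: json.loads(load_blob), number=n)

per_dump_us = dump_s / n * 1e6
per_load_us = load_s / n * 1e6
print(f"dumps: {per_dump_us:.1f} us/op  ({n / dump_s:,.0f} ops/s)")
print(f"loads: {per_load_us:.1f} us/op  ({n / load_s:,.0f} ops/s)")

# At tens of thousands of calls per run, even tens of microseconds
# per operation adds up to whole seconds of serialization time.
```

Multiply the per-op number by your run's call count and you get a concrete estimate of the serialization tax before touching any profiler.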
Why this persists upstream
json is in the stdlib, has zero install friction, and works everywhere. A faster serializer — orjson, rapidjson, simdjson — is a dependency decision, and dependencies are politics. CrewAI has historically prioritized a minimal dependency set, which is defensible. The cost of that decision is absorbed by individual teams as “CrewAI is slow at the margins”.
How Fast-CrewAI addresses it
Fast-CrewAI replaces the serialization paths inside CrewAI’s internal components with serde_json exposed via PyO3. Your application code still uses Python’s json if that’s what it calls — we don’t patch the stdlib. But every time CrewAI internally serializes a message, a memory entry, a tool result, or an agent payload, the work runs through serde.
On a representative agent-message payload:
- Python json: ~2,333 ops/s
- serde via PyO3: ~80,525 ops/s
- Ratio: ~34.5×
Peak memory drops 58% on the same payload, because serde can write directly to bytes without constructing intermediate Python dict or list objects. For a long-running worker processing thousands of serialization calls a minute, that memory delta is the difference between staying under a container limit and getting OOM-killed.
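The allocation half of that story is visible with tracemalloc: parsing a JSON blob with the stdlib materializes every intermediate dict, list, and str as a fresh Python object. This sketch demonstrates the mechanism only; the 58% figure above is Fast-CrewAI's measurement, not reproduced here:

```python
import json
import tracemalloc

# A few-hundred-KB JSON blob standing in for a batch of memory entries.
blob = json.dumps([
    {"role": "agent", "content": "m" * 256, "metadata": {"i": i}}
    for i in range(1000)
])

tracemalloc.start()
parsed = json.loads(blob)  # every dict, list, and str here is a new Python object
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"blob size: {len(blob):,} bytes")
print(f"peak allocation during parse: {peak:,} bytes")
# A serializer that reads and writes bytes directly inside native code
# skips most of this object churn, which is where the savings come from.
```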
There’s a subtle second benefit: because serde-based validation is faster than Python’s jsonschema, the tool executor’s argument validation gets a free speedup too. That compounds with the explicit tool caching gains.
Workaround you can ship today
If you can’t adopt Fast-CrewAI, you can patch the stdlib yourself in application code:
import orjson
import json as stdlib_json

def _dumps(obj, **kwargs):
    # orjson returns bytes; decode to match the stdlib's str return type.
    # Note: kwargs like indent= and sort_keys= are silently ignored here.
    return orjson.dumps(obj).decode()

def _loads(s, **kwargs):
    return orjson.loads(s)

stdlib_json.dumps = _dumps  # type: ignore
stdlib_json.loads = _loads  # type: ignore
This is blunt — it monkey-patches the whole stdlib — and it has edge cases around keyword arguments orjson doesn’t accept (sort_keys, indent, etc.). But for CrewAI’s internal serialization it’s usually safe, and it gets you roughly a 10× improvement on the hot path. Fast-CrewAI’s 34× is better because its internal serialization paths stay inside Rust end to end instead of bouncing through Python-level wrapper calls each time, but orjson monkey-patching is a reasonable halfway point.
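A slightly safer variant of the patch falls back to the stdlib whenever a caller passes keywords orjson doesn’t accept, so code using indent= or sort_keys= keeps working. This is a sketch; it also degrades gracefully to pure stdlib behavior if orjson isn’t installed:

```python
import json

# Capture the originals before patching.
_stdlib_dumps = json.dumps
_stdlib_loads = json.loads

try:
    import orjson  # optional fast path
except ImportError:
    orjson = None

def _dumps(obj, **kwargs):
    # orjson takes no stdlib-style keywords; any kwargs trigger the fallback.
    if orjson is None or kwargs:
        return _stdlib_dumps(obj, **kwargs)
    return orjson.dumps(obj).decode()

def _loads(s, **kwargs):
    if orjson is None or kwargs:
        return _stdlib_loads(s, **kwargs)
    return orjson.loads(s)

json.dumps = _dumps  # type: ignore[assignment]
json.loads = _loads  # type: ignore[assignment]

# Plain calls take the fast path; keyword callers keep stdlib semantics.
print(json.dumps({"b": 1, "a": 2}, sort_keys=True))  # → {"a": 2, "b": 1}
```

The fallback costs one truthiness check per call, which is negligible next to the serialization itself.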
When it matters
Memory-heavy or tool-heavy workflows where you see json in your profiler’s top 20. If you profile your crew and the serialization cost isn’t visible, don’t worry about it. If it is, this is one of the cheapest wins available.