Fast-CrewAI
Issue commentary · tools · severity: high

Tools get executed repeatedly with identical arguments

LLMs call the same tool with the same arguments over and over. CrewAI has no default caching layer, so every invocation pays the full cost — including the ones that are wasteful.

Neul Labs

The symptom

You watch your agent logs during a long task and notice the same tool being called three, four, eight times with the same arguments. Each call spins up a full execution — the HTTP request goes out, the database is queried, the filesystem is scanned — and each call returns the same result. Your bills go up. Your latency goes up. Your agents aren’t doing anything wrong; they’re just asking the same question multiple times because LLMs do that.

Why this happens

LLMs are not memoization-aware. Within a single agent turn, a model might decide to call get_weather("San Francisco") to pick an outfit recommendation, call it again three turns later to double-check, and call it a third time when the final answer is being written. There is no mechanism in CrewAI’s default BaseTool to notice that the arguments are identical and short-circuit the call.

The cost is twofold. The direct cost is whatever the tool does — an HTTP request, a database query, an API call that costs money. The indirect cost is the framework overhead: argument validation, schema parsing, result serialization, logging. Even for a tool that returns instantly from memory, the framework overhead on every call is non-trivial.
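That fixed per-call overhead is easy to observe in isolation. A stand-in sketch — JSON round-trips substituting for validation and serialization, not CrewAI's actual code path:

```python
import json
import timeit

def framework_overhead(args: dict) -> str:
    # Stand-ins for per-call framework work: argument validation (parse)
    # and result serialization (dump) run even when the tool body is free.
    validated = json.loads(json.dumps(args))
    return json.dumps({"result": "ok", "args": validated})

# Microseconds per call, multiplied by thousands of redundant tool calls.
per_call_s = timeit.timeit(
    lambda: framework_overhead({"city": "San Francisco"}), number=10_000
) / 10_000
```

Real validation (jsonschema, pydantic) is heavier than this stand-in, so the measured floor here is an underestimate.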

Why this persists upstream

Caching is a correctness hazard. If CrewAI shipped tool caching on by default, every tool with time-sensitive output — stock prices, weather, sensor data, anything querying “now” — would return stale results and surprise users. Shipping caching off by default is safe. Shipping caching that users have to opt into per tool is the right answer, and it’s what most production teams eventually build themselves. Upstream is waiting for the cleanest API surface to emerge before committing to one.

How Fast-CrewAI addresses it

Fast-CrewAI’s BaseTool patch wraps tool execution in a Rust-backed executor with three features:

  1. Result caching with configurable TTL. Opt-in per tool. Mark tools safe to cache, give them a TTL in seconds, and repeated calls with identical arguments return from cache for the duration. Cache keys are content-hashed from the serialized arguments, so order-independence and structural equality work out of the box.
  2. Serde-based JSON validation. Tool arguments are validated against the schema via serde_json instead of Python’s jsonschema. That’s where a big chunk of the framework overhead lives.
  3. Execution statistics. Every tool call is recorded with its arguments, result, cache hit/miss, latency, and error status. This is invaluable for tuning TTLs and spotting tools that would benefit from caching but don’t have it enabled yet.
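The content-hashed key scheme from point 1 can be sketched in a few lines of Python — illustrative only, not Fast-CrewAI's actual Rust implementation:

```python
import hashlib
import json

def cache_key(tool_name: str, args: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) makes the key
    # independent of dict insertion order, so structurally equal
    # argument sets always map to the same cache entry.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{tool_name}:{canonical}".encode()).hexdigest()
```

With this scheme, `cache_key("get_weather", {"city": "SF", "units": "C"})` and `cache_key("get_weather", {"units": "C", "city": "SF"})` collide by design — that is the order-independence the article describes.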

On synthetic benchmarks with repeated identical calls, the cached path hits 17.3× throughput (11,616 ops/s vs 670 ops/s). Memory usage drops 99% because cached results are returned as pre-serialized bytes instead of rebuilt Python objects.

The caching is opt-in per tool, because we agree with upstream that defaulting it on would be dangerous:

from fast_crewai.tools import cached

@cached(ttl_seconds=300)
class WeatherTool(BaseTool):
    ...

Tools that aren’t marked @cached run exactly as they did before — the Rust executor still handles argument validation and stats, but nothing is cached.

Workaround you can ship today

Even without Fast-CrewAI, you can add caching to individual tools cheaply:

from functools import lru_cache

# Cache at module level: decorating the instance method directly would
# include `self` in the cache key, and pydantic-based BaseTool instances
# are not hashable, so @lru_cache on a method raises TypeError at call time.
@lru_cache(maxsize=128)
def _fetch_weather_cached(city: str) -> str:
    return fetch_weather(city)  # your underlying fetch logic

class WeatherTool(BaseTool):
    def _run(self, city: str) -> str:
        return _fetch_weather_cached(city)

This doesn’t give you TTL or statistics, but it does eliminate the most obvious waste. For TTL-based caching without Fast-CrewAI, use cachetools: wrap the function with @cachetools.cached(TTLCache(maxsize=128, ttl=300)) in place of @lru_cache.
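If you would rather avoid the extra dependency, a minimal TTL decorator is a few lines of stdlib. A sketch — keys on positional arguments only, eviction is oldest-insertion rather than LRU:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds: float, maxsize: int = 128):
    """Minimal stdlib TTL cache; keys on hashable positional args only."""
    def decorator(fn):
        cache: dict = {}

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[1] < ttl_seconds:
                return hit[0]  # fresh entry: short-circuit the real call
            result = fn(*args)
            if len(cache) >= maxsize:
                cache.pop(next(iter(cache)))  # evict oldest insertion
            cache[args] = (result, now)
            return result

        return wrapper
    return decorator
```

Usage mirrors lru_cache: decorate the module-level fetch function with @ttl_cache(ttl_seconds=300) and repeated calls with the same arguments return the cached result until the TTL lapses.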

When it matters

Tool caching is the single biggest practical win for crews that make many tool calls per run. If your agents call the same tools repeatedly and your tools are deterministic or can tolerate briefly stale results, this is usually the first thing to fix. We flag it in almost every performance audit we run.

Need help applying this to your codebase?

Neul Labs offers audits, full implementation, and retained CrewAI engineering. We built fast-crewai — we can build yours.