Fast-CrewAI
Issue commentary · general · severity: medium

Observability is retrofittable but not first-class

CrewAI runs produce logs, but the structured spans you'd need for distributed tracing have to be added by hand. When things slow down in production, you're guessing.

Neul Labs

The symptom

A crew run takes longer than usual. You want to know which agent, which task, which tool call, which memory lookup, which LLM call caused the regression. You open your logs and find a sequential stream of text that’s readable but not queryable. There’s no trace tree, no span durations, no way to ask “what changed between yesterday’s P95 and today’s P95?”.

Why this happens

CrewAI’s default logging is text-based. It’s good for development, where you can print-debug your way through a tricky crew, but it is not structured enough for production observability. There are no built-in OpenTelemetry spans around agent turns, tool invocations, or memory operations. If you want those, you add them yourself.

The problem isn’t that it’s impossible. It’s that the hooks aren’t obvious and the correct spans aren’t standardized, so every team ends up writing its own slightly different instrumentation and none of them are portable.

Why this persists upstream

OpenTelemetry is a heavy dependency to make default, and opinionated span naming would annoy teams that already have their own conventions. The upstream answer is “yes, please instrument it yourself” — which is correct, and also means most teams don’t.

How Fast-CrewAI addresses it

Fast-CrewAI doesn’t ship OpenTelemetry integration, but it’s designed to be observability-friendly:

  1. Span context propagates through PyO3. When your Python code creates an OTel span and then calls into Fast-CrewAI’s Rust path, the context survives the boundary crossing. You can wrap the Rust-accelerated calls in spans and they’ll nest correctly in your trace.
  2. Tool execution statistics are first-class. Every tool call is recorded with arguments, result, cache hit/miss, latency, and error status. You can attach this data to your spans as attributes without adding extra instrumentation.
  3. The task executor exposes dependency graph metadata. You can tag each task span with its position in the DAG, its dependencies, and whether it ran in parallel or sequentially.

The practical upshot is that the Rust extension doesn’t erase the observability signal — it preserves it — and you can build clean OTel instrumentation on top of it with far less ceremony.

Workaround you can ship today

The workaround is the same whether you adopt Fast-CrewAI or not: write your own OpenTelemetry instrumentation. The shape we recommend:

from crewai.tools import BaseTool  # adjust this import to your CrewAI version
from opentelemetry import trace
from opentelemetry.trace import StatusCode

tracer = trace.get_tracer("crewai")

class InstrumentedTool(BaseTool):
    def _run(self, *args, **kwargs):
        # One span per tool invocation, named after the tool.
        with tracer.start_as_current_span(
            f"tool.{self.name}",
            attributes={"args": str(args), "kwargs": str(kwargs)},
        ) as span:
            try:
                result = super()._run(*args, **kwargs)
                span.set_attribute("result.length", len(str(result)))
                return result
            except Exception as e:
                # Mark the span as failed and keep the original traceback.
                span.record_exception(e)
                span.set_status(StatusCode.ERROR)
                raise

Apply the same pattern to agents, tasks, memory lookups, and LLM calls. Once you have it, production debugging becomes a different job.

When it matters

Any CrewAI deployment that has to stay up under business pressure. Observability is the single biggest lever for diagnosing production slowness fast — and therefore the biggest lever for knowing whether Fast-CrewAI (or any optimization) is helping. We include tracing setup as part of every implementation sprint.

Need help applying this to your codebase?

Neul Labs offers audits, full implementation, and retained CrewAI engineering. We built fast-crewai — we can build yours.