Tracing Overview
CubePi emits OpenTelemetry spans that follow the GenAI Semantic Conventions v1.41 so any OTel-compatible backend (Jaeger, Tempo, Honeycomb, Datadog, AWS X-Ray, Azure Monitor, …) can ingest agent runs without custom instrumentation.
Attach a Tracer to an Agent and every prompt produces a tree of spans you
can pivot, query, and join with the rest of your service traces:
trace
└── invoke_agent 14425.8ms [0x1cd97cdb] ← one per agent.prompt()
├── cubepi.turn 1283.1ms [0x5cfda93e] ← one per LLM round-trip
│ ├── chat deepseek-v4-flash 1208.7ms tok 6845/68 [0x0d130229]
│ └── execute_tool subagent 9610.2ms subagent [0x38bdd10a]
│ └── invoke_agent 9601.0ms [0x8094f99b] ← subagent run, nested
│ └── cubepi.turn 9598.4ms [0x57c5cfc7]
│ ├── chat deepseek-v4-flash 1190.3ms [0x8205ca6b]
│ └── execute_tool web_search 6500.2ms web_search [0xca4e59fc]
└── cubepi.turn 491.9ms ERROR [0xce25f242]
└── chat deepseek-v4-flash 427.2ms ERROR [0x0bff68ec]
└── error: Error code: 400 - ... `tool_use` ids were found without
`tool_result` blocks immediately after: call_01_...
Each layer carries standard gen_ai.* attributes — gen_ai.operation.name,
gen_ai.request.model, gen_ai.provider.name, gen_ai.usage.input_tokens,
gen_ai.usage.output_tokens, gen_ai.response.finish_reasons, …
What ships out of the box
- Tracer — builds an SDK
TracerProvider, attaches oneBatchSpanProcessorper exporter, wires the CubePi event stream into spans. - Meter — sibling for OTel histograms:
gen_ai.client.operation.duration,gen_ai.client.operation.time_to_first_chunk,gen_ai.client.token.usage. - JsonlSpanExporter — write one JSON line per span to
./cubepi-traces/<date>/<trace_id>.jsonl. Files are sharded bytrace_id, so one file holds a whole trace — the run plus any nested subagent runs (which inherit the trace). Useful for local dev and offline debugging; works with any OTel viewer that reads JSONL, and with thecubepi traceCLI. - OTLP — bring your own exporter via
opentelemetry-exporter-otlp-proto-http(HTTP) or…-grpcand hand it toTracer(exporters=[…]). - W3C trace context propagation — outgoing MCP calls inject the active
traceparentas an HTTP header so an instrumented MCP server can continue the trace. tracer.attached(agent)/meter.attached(agent)— async context managers that RAII-wrap attach/detach, so cleanup is oneasync withblock instead of an explicittry/finally.atexitflush hook —Tracer(atexit_flush=True)(default) registers a process-exit handler that sync-flushes any buffered spans, so callers who forgetawait tracer.shutdown()still get their spans exported on normal exit / Ctrl-C / unhandled exception.tracing_context()— set per-run tags and metadata (cubepi.tags = ("beta-arm",),cubepi.metadata.user_id = "u-42") via a contextvar-scoped block. Concurrent agents each see their own values.- Middleware-owned providers traced automatically — middleware that
exposes extra
BaseProviderinstances viaMiddleware.providers()has its listener registry wired byRecorder.attach()so those providers'chatspans land in the same trace as the agent's main call.CompactionMiddlewareuses this to surface its summarizer LLM call as achat <summary-model>span nested undercubepi.compaction.summarize— see the compaction guide.
What it costs
- One pure-Python recorder per agent run subscribing to the agent's event stream and the provider's listener registry — no monkey-patching, no extra threads.
- One OTel SDK span per layer above.
BatchSpanProcessorbatches export off the hot path. - No payloads are recorded by default.
gen_ai.input.messages,gen_ai.output.messages, raw request/response, and tool args/results all require explicit opt-in viarecord_content=Trueso you don't accidentally ship PII to your backend. See Content & Redaction.
When to use each piece
| You want | Use |
|---|---|
| Trace one local agent run and inspect a JSONL file | Tracer + JsonlSpanExporter |
| Ship to Jaeger / Tempo / Honeycomb / Datadog | Tracer + OTLP exporter |
| Latency + token histograms next to the spans | Meter alongside Tracer |
| Record prompts / model outputs for evaluation | Tracer(record_content=True) |
| Redact PII before it leaves the process | Tracer(redact=…) |
Tag runs with user_id / session_id / A-B arm | tracing_context(tags=…, metadata=…) |
| One-liner cleanup, no try/finally | async with tracer.attached(agent): … |
Forget to call shutdown() and not lose spans | Tracer(atexit_flush=True) (default) |
| Continue a trace from an upstream service | Tracer(resource=…) + W3C traceparent (auto for MCP, manual for HTTP) |
Where to go next
- Getting Started — install the extra and emit your first spans
- OTLP & Backends — point CubePi at Jaeger, Tempo, Honeycomb, …
- Content Recording & Redaction — record prompts and responses safely
- Metrics — histograms via
Meter