Metrics with Meter
Spans tell you the shape of one run; histograms tell you the shape of the
fleet. cubepi.tracing.Meter mirrors Tracer and emits the OTel GenAI
metric set so dashboards work out of the box.
What it emits
| Instrument | Description |
|---|---|
| gen_ai.client.operation.duration | Histogram (seconds), recorded for chat, execute_tool, and invoke_agent on close |
| gen_ai.client.operation.time_to_first_chunk | Histogram (seconds), recorded for chat when at least one content chunk arrived |
| gen_ai.client.token.usage | Histogram ({token}), one observation per gen_ai.token.type (input, output) per chat response |
Each point carries the operation, provider, and request model attributes, so failed or cancelled requests (which have no response body and therefore no response model) can still be grouped by what was asked for:
- gen_ai.operation.name: chat / execute_tool / invoke_agent
- gen_ai.provider.name: anthropic, openai, openai_responses, …
- gen_ai.request.model: e.g. claude-sonnet-4-5-20250929
- gen_ai.response.model: e.g. claude-sonnet-4-5-20250929 (success only)
- gen_ai.token.type: input or output (token usage only)
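As a rough illustration of the attribute rules above, the sketch below builds the attribute dict for a single data point. The helper name point_attributes is hypothetical and not part of cubepi; only the attribute keys come from the list above.

```python
from typing import Optional


def point_attributes(
    operation: str,
    provider: str,
    request_model: str,
    response_model: Optional[str] = None,
    token_type: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: attributes for one metric data point."""
    attrs = {
        "gen_ai.operation.name": operation,
        "gen_ai.provider.name": provider,
        "gen_ai.request.model": request_model,
    }
    # The response model is only known when a response actually arrived.
    if response_model is not None:
        attrs["gen_ai.response.model"] = response_model
    # The token type only applies to gen_ai.client.token.usage points.
    if token_type is not None:
        attrs["gen_ai.token.type"] = token_type
    return attrs


ok = point_attributes(
    "chat", "anthropic", "claude-sonnet-4-5-20250929",
    response_model="claude-sonnet-4-5-20250929", token_type="input",
)
cancelled = point_attributes("chat", "anthropic", "claude-sonnet-4-5-20250929")
```

The cancelled point has no gen_ai.response.model key but can still be grouped by gen_ai.request.model.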
Attaching a Meter
The idiomatic RAII form — async with everything, no manual cleanup:
from opentelemetry.exporter.otlp.proto.http.metric_exporter import (
    OTLPMetricExporter,
)

from cubepi.tracing import Tracer, Meter
from cubepi.tracing.exporters import JsonlSpanExporter

async with (
    Tracer(
        service_name="my-bot",
        agent_name="assistant",
        exporters=[JsonlSpanExporter(directory="./cubepi-traces")],
    ) as tracer,
    Meter(
        resource=tracer.resource,  # share Resource so service.* matches spans
        exporters=[
            OTLPMetricExporter(endpoint="http://otel-collector:4318/v1/metrics"),
        ],
    ) as meter,
    tracer.attached(agent),
    meter.attached(agent),
):
    await agent.prompt("...")
# Exit order is automatic: detach both (closes any cancelled-run spans, flushes
# the trace pipeline), then shut down both (flush + close exporters).
Meter.attach() is independent from Tracer.attach(). You can run
either on its own; the recommended setup is both, sharing one
Resource so the backend treats them as the same service.
If you need the explicit, non-RAII form (e.g. attaching agents dynamically over the lifetime of a long-running server):
tracer_detach = tracer.attach(agent)
meter_detach = meter.attach(agent)
try:
    ...
finally:
    tracer_detach()  # closes any cancelled-run spans
    meter_detach()   # unsubscribes the meter's listeners
    await tracer.shutdown()
    await meter.shutdown()
Bucket boundaries
The duration histograms ship with the OTel GenAI semconv's recommended boundaries (in seconds):
0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300
OTel exposes these as advisory boundaries; backends are free to use
them as-is or override them.
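To make the boundary list concrete, here is a minimal stdlib sketch of how a duration maps to a bucket. It assumes the usual OTel explicit-bucket convention that each boundary is an inclusive upper bound; bucket_index and bucket_counts are illustrative names, not cubepi or OTel APIs.

```python
from bisect import bisect_left

# Boundaries listed above, in seconds.
BOUNDARIES = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1, 2, 5, 10, 30, 60, 120, 300]


def bucket_index(seconds: float) -> int:
    """Index of the bucket a duration falls into.

    Bucket i covers (BOUNDARIES[i-1], BOUNDARIES[i]]; index len(BOUNDARIES)
    is the overflow bucket for anything above the last boundary.
    """
    return bisect_left(BOUNDARIES, seconds)


def bucket_counts(durations: list[float]) -> list[int]:
    """Per-bucket counts, one more bucket than there are boundaries."""
    counts = [0] * (len(BOUNDARIES) + 1)
    for d in durations:
        counts[bucket_index(d)] += 1
    return counts
```

With 14 boundaries there are 15 buckets, so a 1.5 s chat call lands in the (1, 2] bucket and anything slower than 300 s lands in the overflow bucket.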
Concurrent agents on one Meter
Like Tracer, one Meter instance is safe to attach to multiple agents
in the same process. Each attach() call gets its own internal
_MeterState holding the open-ns timestamps and attribute dicts, so
overlapping runs from two agents never share or overwrite each other's
metric state.
meter = Meter(resource=tracer.resource, exporters=[exporter])
meter.attach(agent_a)
meter.attach(agent_b)
Both agents emit independent duration / token / TTFC observations,
filterable by gen_ai.agent.name (set at the Resource level when
Tracer(agent_name=…) is used) or by gen_ai.request.model.
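The per-attach isolation described above can be pictured with a toy model. This is only a sketch of the idea, not the cubepi implementation: ToyMeter, ToyState, and the on_start / on_end callbacks are hypothetical names standing in for Meter, _MeterState, and its listeners.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyState:
    """Per-attach bookkeeping; a hypothetical stand-in for _MeterState."""
    agent_name: str
    open_ns: dict[str, int] = field(default_factory=dict)


class ToyMeter:
    """Illustrative only; not the cubepi Meter."""

    def __init__(self) -> None:
        self.recorded: list[tuple[str, float]] = []

    def attach(
        self, agent_name: str
    ) -> tuple[Callable[[str, int], None], Callable[[str, int], None]]:
        state = ToyState(agent_name)  # fresh state for every attach() call

        def on_start(op_id: str, now_ns: int) -> None:
            state.open_ns[op_id] = now_ns

        def on_end(op_id: str, now_ns: int) -> None:
            started_ns = state.open_ns.pop(op_id)
            self.recorded.append((agent_name, (now_ns - started_ns) / 1e9))

        return on_start, on_end


toy = ToyMeter()
a_start, a_end = toy.attach("agent_a")
b_start, b_end = toy.attach("agent_b")

# Two overlapping runs that happen to use the same operation id.
a_start("run-1", 0)
b_start("run-1", 1_000_000_000)
b_end("run-1", 2_000_000_000)
a_end("run-1", 3_000_000_000)
```

Because each attach() closes over its own state, agent_b starting "run-1" does not overwrite the start timestamp agent_a stored under the same id.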
Shutting down
The RAII form (the single async with block shown above) handles the shutdown ordering for you:
the inner attached(agent) contexts exit first and detach, then the outer Tracer/Meter __aexit__ runs shutdown().
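The ordering follows from how Python unwinds a multi-item async with: context managers exit in the reverse of the order they were entered. The stdlib-only sketch below demonstrates that with placeholder contexts; step is a hypothetical helper, and the labels merely mirror the names used above.

```python
import asyncio
from contextlib import asynccontextmanager

events: list[str] = []


@asynccontextmanager
async def step(name: str):
    """Placeholder context manager that logs when it is entered and exited."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


async def main() -> list[str]:
    events.clear()
    async with (
        step("tracer"),
        step("meter"),
        step("tracer.attached"),
        step("meter.attached"),
    ):
        events.append("run agent")
    return list(events)
```

Running asyncio.run(main()) shows both attached contexts exiting before either outer context, which is why the detach step always precedes shutdown.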
For the manual pattern, order matters — tracer_detach() must run before
tracer.shutdown() so any spans an in-flight cancellation left open get
closed and exported in the same flush:
finally:
    tracer_detach()
    meter_detach()
    await tracer.shutdown()
    await meter.shutdown()
Meter.shutdown() awaits a flush of the metric reader, then closes it.
PeriodicExportingMetricReader exports on a fixed interval (60 s by
default) — shutdown is the only way to guarantee the final window lands
before the process exits.
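The final-window problem is easy to reproduce with a toy reader. The class below is a deliberately simplified model written for this page, not OTel's PeriodicExportingMetricReader: it exports pending points on a timer and flushes whatever is left when shutdown() is awaited.

```python
import asyncio
from typing import Optional


class ToyPeriodicReader:
    """Illustrative model of a reader that exports on a fixed interval."""

    def __init__(self, interval_s: float) -> None:
        self.interval_s = interval_s
        self.pending: list[float] = []
        self.exported: list[list[float]] = []
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self.interval_s)
            self.flush()

    def record(self, value: float) -> None:
        self.pending.append(value)

    def flush(self) -> None:
        if self.pending:
            self.exported.append(self.pending)
            self.pending = []

    async def shutdown(self) -> None:
        # Stop the timer, then export whatever the last window collected.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
        self.flush()


async def demo() -> tuple[list[list[float]], list[list[float]]]:
    reader = ToyPeriodicReader(interval_s=60)
    reader.start()
    reader.record(0.42)  # the run finishes long before the first 60 s tick
    before = list(reader.exported)
    await reader.shutdown()
    return before, reader.exported
```

In demo(), nothing has been exported at the point where a short-lived process would normally exit; the observation only reaches the exported list once shutdown() runs.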
Querying example (Honeycomb)
p95 chat latency by provider over the last hour:
VISUALIZE P95(duration_s)
GROUP BY gen_ai.provider.name
WHERE gen_ai.operation.name = "chat"
TIME last 1 hour
Token usage by model:
VISUALIZE SUM(token_count)
GROUP BY gen_ai.request.model, gen_ai.token.type
WHERE gen_ai.operation.name = "chat"
Substitute your backend's query DSL — same attributes, same aggregations.