Recording Prompts, Responses, and Tool Payloads
By default, CubePi's tracing emits structural attributes only: operation names, models, token counts, finish reasons, durations. No prompt content, no model output, no tool arguments or results leave the process. This is deliberate: many agent setups handle PII, customer data, or trade-secret prompts that don't belong in a third-party observability backend.
When you do want content captured (for offline evaluation, debugging a
flaky tool call, or building a labelled dataset), opt in explicitly with
record_content=True, and combine it with a redact callback to strip the
sensitive parts before they leave the process.
Turning content recording on
tracer = Tracer(
    service_name="my-bot",
    agent_name="assistant",
    record_content=True,  # opt-in
    exporters=[JsonlSpanExporter(directory="./cubepi-traces")],
)
With record_content=True, each span layer carries the relevant content
attributes per the OTel GenAI semconv:
| Span | Content attributes added |
|---|---|
| invoke_agent | gen_ai.system_instructions, gen_ai.input.messages, gen_ai.output.messages |
| cubepi.turn | gen_ai.input.messages (per-turn slice), gen_ai.output.messages (per-turn slice) |
| chat <model> | gen_ai.system_instructions, gen_ai.input.messages, gen_ai.tool.definitions, cubepi.llm.raw_request, cubepi.llm.raw_response |
| execute_tool <tool_name> | gen_ai.tool.call.arguments, gen_ai.tool.call.result |
The chat span's gen_ai.input.messages contains the full chronological
context the provider request actually carried, including prior assistant
turns and tool results, not just the new user prompt. This matters for
multi-turn tool-using runs: trace consumers can reconstruct exactly what
the model saw at each call.
Redacting before export
redact is a (key, value) -> value | None callback invoked at every
content-attribute set site. Return:
- The original value unchanged, to keep the attribute as-is
- A modified value of the same shape, to substitute it
- None, to drop the attribute entirely
def redact(key: str, value):
    # Strip secrets from prompts before they leave the process.
    if key in ("gen_ai.input.messages", "gen_ai.output.messages"):
        return _scrub_messages(value)
    # Don't ship raw bodies at all in prod; keep only the normalised shape.
    if key in ("cubepi.llm.raw_request", "cubepi.llm.raw_response"):
        return None
    return value

tracer = Tracer(
    service_name="my-bot",
    record_content=True,
    redact=redact,
    exporters=[...],
)
redact is the single chokepoint for content: the recorder calls it once
per attribute before serializing into the OTel attribute, so whatever the
function returns is what hits the wire. Exceptions inside redact are
swallowed (the attribute is dropped in that case), so a buggy redactor
fails closed rather than leaking.
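The fail-closed behaviour described above can be pictured with a small wrapper. This is an illustrative sketch, not CubePi's source: apply_redact is a hypothetical name, and the real recorder may differ in detail. It can still be handy in unit tests for checking how your own redactor behaves on unexpected input.

```python
def apply_redact(redact, key, value):
    """Return the value to record, or None when the attribute should be dropped."""
    if redact is None:
        return value
    try:
        return redact(key, value)
    except Exception:
        # Fail closed: a crashing redactor drops the attribute instead of leaking it.
        return None
```

One consequence worth keeping in mind: a redactor that raises on every call silently removes all content from your traces, so missing attributes are a hint to test the callback in isolation.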
Common patterns
Drop everything but per-message length so dashboards still work without shipping content:
def redact(key, value):
    if key in ("gen_ai.input.messages", "gen_ai.output.messages"):
        return [{"role": m["role"], "parts": [{"type": "text", "content": "<redacted>",
                 "length": sum(len(p.get("content", "")) for p in m["parts"])}]}
                for m in value]
    return value
Tag-based selective recording, which strips everything unless a thread is opted in:
import contextvars

RECORD = contextvars.ContextVar("trace.record_content", default=False)

def redact(key, value):
    return value if RECORD.get() else None

Then call RECORD.set(True) for the runs you want captured.
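One way to scope the opt-in to a single run is to reset the context variable afterwards. The sketch below repeats the RECORD and redact definitions so it runs on its own; run_recorded is a hypothetical helper name, not part of CubePi.

```python
import contextvars

RECORD = contextvars.ContextVar("trace.record_content", default=False)


def redact(key, value):
    return value if RECORD.get() else None


def run_recorded(fn, *args, **kwargs):
    """Call fn with content recording enabled, then restore the previous setting."""
    token = RECORD.set(True)
    try:
        return fn(*args, **kwargs)
    finally:
        RECORD.reset(token)  # restore whatever was set before this run
```

Using the token returned by set() keeps the opt-in from leaking into later runs on the same thread. asyncio tasks copy the current context when they are created, so a task started inside run_recorded sees RECORD as True.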
Size budgets
OTel attribute values are JSON-serialized inside the recorder. Most backends
truncate or reject attributes over a few hundred KB. If you're recording
the raw provider response on every chat span, multi-turn agentic runs can
get large fast. Drop or summarise via redact for any field over your
budget.
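A size budget can be enforced inside redact itself. The following is a minimal sketch under stated assumptions: make_budget_redact and the 64 KiB default are hypothetical, and the size is estimated with json.dumps, which may not match the recorder's own serialization byte for byte. Lists (such as message lists) are trimmed to their most recent items so the value keeps its shape; anything else over budget is dropped.

```python
import json


def _json_bytes(value) -> int:
    """Approximate the serialized size of a value in bytes."""
    return len(json.dumps(value, default=str).encode("utf-8"))


def make_budget_redact(max_bytes: int = 64 * 1024):
    """Build a redact callback that keeps each attribute under max_bytes."""

    def redact(key, value):
        if _json_bytes(value) <= max_bytes:
            return value
        if isinstance(value, list):
            # Keep the newest items that still fit within the budget.
            kept = []
            for item in reversed(value):
                if _json_bytes([item] + kept) > max_bytes:
                    break
                kept.insert(0, item)
            return kept or None
        return None  # oversized non-list value: drop the attribute

    return redact
```

Pass the result as redact=make_budget_redact(...) when constructing the tracer, or call it from inside a larger redact function after scrubbing.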
Stream-level recording
record_content captures the final assembled request and response on each chat
span, but not the individual streaming chunks. For post-mortem debugging of
streaming failures (empty tool call arguments, duplicate events, truncated
output), enable record_stream to write a chunk-by-chunk event log alongside the
main trace:
tracer = Tracer(
    record_content=True,  # needed for trace convert
    record_stream=True,  # per-chunk event log
    stream_dir="./cubepi-traces",  # where to write <run_id>.stream.jsonl
    exporters=[JsonlSpanExporter(directory="./cubepi-traces")],
)
record_stream writes <stream_dir>/<run_id>.stream.jsonl (one JSON line per
StreamEvent). Every line carries t (elapsed seconds from run start) and
type. Tool-call lines also include ci (content index), id, name, delta
sizes, and argument previews:
{"t": 5.873, "type": "toolcall_start", "ci": 1, "id": "toolu_...", "name": "show_widget"}
{"t": 5.875, "type": "toolcall_delta", "ci": 1, "chars": 11, "accumulated": 11, "preview": "{\"title\": \""}
{"t": 33.177, "type": "toolcall_end", "ci": 1, "id": "toolu_...", "args_chars": 7465, "args_preview": "{\"title\": \"CubePi..."}
This makes it straightforward to confirm whether argument chunks ever arrived, or
whether the same event fired twice (e.g. a provider that sends finish_reason
twice produces two toolcall_end lines for the same ci).
record_stream is independent of record_content; turn it on only in debugging
sessions. Files can grow large for long-running, tool-heavy agents.
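The checks described above (did argument chunks ever arrive, did an end event fire twice) can be automated with a few lines of Python. This is a sketch that relies only on the type and ci fields shown in the sample lines; audit_tool_calls is a hypothetical helper, not a CubePi API. It keys on ci alone, so if ci restarts for each model call in your logs, key on id as well.

```python
import json
from collections import Counter


def audit_tool_calls(lines):
    """Summarise tool-call events from an iterable of JSONL lines.

    Returns (duplicate_ends, missing_deltas): content indices with more than
    one toolcall_end, and those that started but never received a delta.
    """
    ends = Counter()
    started, with_delta = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        kind, ci = event.get("type"), event.get("ci")
        if kind == "toolcall_start":
            started.add(ci)
        elif kind == "toolcall_delta":
            with_delta.add(ci)
        elif kind == "toolcall_end":
            ends[ci] += 1
    duplicate_ends = sorted(ci for ci, n in ends.items() if n > 1)
    missing_deltas = sorted(started - with_delta)
    return duplicate_ends, missing_deltas
```

To run it against a recorded file, open <stream_dir>/<run_id>.stream.jsonl and pass the file object directly, since iterating a text file yields one line at a time.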
Auditing what's recorded
The recorder always sets service.name, gen_ai.agent.name, and
cubepi.run_id on every span, regardless of record_content. Use these
to filter the trace backend to a single run and visually confirm what
landed.
For deeper audits, JsonlSpanExporter writes one line per span, so you can
grep / jq the local files before pointing the same exporter at a remote
backend:
jq -r 'select(.attributes["gen_ai.input.messages"]) | .attributes["gen_ai.input.messages"]' \
  cubepi-traces/2026-05-19/*.jsonl
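For a quick overview instead of a per-attribute dump, the same files can be tallied in Python. This is a sketch that assumes each line is a JSON object with an attributes mapping, as the jq filter above implies; count_content_attributes is a hypothetical helper, and CONTENT_KEYS simply lists the content attributes from the table on this page.

```python
import json
from collections import Counter

CONTENT_KEYS = (
    "gen_ai.system_instructions",
    "gen_ai.input.messages",
    "gen_ai.output.messages",
    "gen_ai.tool.definitions",
    "gen_ai.tool.call.arguments",
    "gen_ai.tool.call.result",
    "cubepi.llm.raw_request",
    "cubepi.llm.raw_response",
)


def count_content_attributes(lines):
    """Count how many spans carry each content attribute."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        attrs = json.loads(line).get("attributes") or {}
        counts.update(key for key in CONTENT_KEYS if key in attrs)
    return counts
```

A non-zero count for a key you meant to drop in redact (for example cubepi.llm.raw_request) is a sign the callback is not wired in or not matching that key.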