The 7 Hooks
Middleware is a class with up to seven optional async methods. Each
hook fires at a precise point in the agent loop. Implement only the
ones you need; CubePi only wires in the ones you override.
from cubepi import Middleware

class MyMiddleware(Middleware):
    async def transform_context(self, messages, *, signal=None):
        # Override only the hooks you need; this one returns an unchanged copy.
        return list(messages)
Pass instances to Agent(middleware=[MyMiddleware(), …]).
transform_context
async def transform_context(self, messages: list[Message], *, signal=None) -> list[Message]:
...
Fires before each model call, on the full message list. Use to:
- Truncate or summarise to fit context windows.
- Inject system reminders (better: use transform_system_prompt).
- Add or remove specific messages dynamically.
Return the resulting list. Don't mutate the input in place; build and return a new list so that other code holding a reference to the original isn't surprised.
Composition: chained; each middleware sees the previous one's output.
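A minimal sketch of two transform_context hooks and of chained composition. It runs without CubePi: Message is a hypothetical stand-in dataclass, SlidingWindow and TagLastUser are illustrative middlewares, and run_chain is an assumption about how chaining behaves, not CubePi's own composer. Both hooks build new lists, so the caller's history is never modified.

```python
import asyncio
from dataclasses import dataclass


# Stand-in message type for illustration only (an assumption, not CubePi's Message).
@dataclass(frozen=True)
class Message:
    role: str
    text: str


class SlidingWindow:
    """Hypothetical middleware: keep the first message plus the last `keep` messages."""

    def __init__(self, keep: int) -> None:
        if keep < 1:
            raise ValueError("keep must be at least 1")
        self.keep = keep

    async def transform_context(self, messages, *, signal=None):
        if len(messages) <= self.keep + 1:
            return list(messages)  # nothing to drop, but still return a copy
        # Build a fresh list; the caller's list is never modified.
        return [messages[0], *messages[-self.keep:]]


class TagLastUser:
    """Hypothetical middleware: append a reminder marker to the last user message."""

    async def transform_context(self, messages, *, signal=None):
        out = list(messages)
        for i in range(len(out) - 1, -1, -1):
            if out[i].role == "user":
                out[i] = Message("user", out[i].text + " [reminder]")
                break
        return out


async def run_chain(middlewares, messages):
    # Chained composition: each middleware receives the previous one's output.
    for mw in middlewares:
        messages = await mw.transform_context(messages)
    return messages


history = [Message("user", f"msg {i}") for i in range(6)]
result = asyncio.run(run_chain([SlidingWindow(keep=2), TagLastUser()], history))
print([m.text for m in result])  # ['msg 0', 'msg 4', 'msg 5 [reminder]']
```

Order matters in a chain: with TagLastUser first, the marker would still land on "msg 5" here, but a window that dropped the tagged message would silently discard it.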
convert_to_llm
async def convert_to_llm(self, messages: list[Message]) -> list[Message]:
...
Fires right before serialisation to the provider. This is the last chance to reshape what the LLM sees. Use for:
- Stripping tool results to text-only.
- Replacing image content with text descriptions for non-multimodal providers.
- Compacting long tool outputs.
Composition: last implementation wins (not chained). Use this hook when multiple middlewares would otherwise conflict and you want a single owner of the final shape.
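The compaction use case and the last-wins rule can be sketched as follows. Message, CompactToolOutputs and convert_last_wins are hypothetical stand-ins written for illustration; convert_last_wins models "last implementation wins" by calling only the final middleware that defines the hook, which is one reading of that rule rather than CubePi's actual composer.

```python
import asyncio
from dataclasses import dataclass, replace


# Stand-in message type (an assumption, not CubePi's real class).
@dataclass(frozen=True)
class Message:
    role: str
    text: str


class CompactToolOutputs:
    """Hypothetical middleware: cap tool-result text at `max_chars` characters."""

    def __init__(self, max_chars: int) -> None:
        self.max_chars = max_chars

    async def convert_to_llm(self, messages):
        out = []
        for m in messages:
            if m.role == "tool" and len(m.text) > self.max_chars:
                dropped = len(m.text) - self.max_chars
                # replace() returns a new frozen Message; the input is untouched.
                m = replace(m, text=m.text[: self.max_chars] + f"... [{dropped} chars truncated]")
            out.append(m)
        return out


async def convert_last_wins(middlewares, messages):
    # Stand-ins have no base class, so "defines the hook" is checked with hasattr.
    owners = [mw for mw in middlewares if hasattr(mw, "convert_to_llm")]
    if not owners:
        return list(messages)
    # Last implementation wins: only the final owner shapes what the LLM sees.
    return await owners[-1].convert_to_llm(messages)
```

Because only one implementation takes effect, the 100-character limit below is ignored and the 10-character one applies.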
transform_system_prompt
async def transform_system_prompt(self, system_prompt: str, *, signal=None) -> str:
...
Fires before each model call, on the system prompt string. Use to:
- Inject runtime info (current time, user role).
- Compose modular system-prompt fragments.
- A/B test prompt variants.
Composition: chained.
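A sketch of runtime-info injection through transform_system_prompt, with two hypothetical middlewares chained by a hypothetical run_prompt_chain helper. The clock parameter is an assumption added so the timestamp can be pinned in tests; nothing here is CubePi API beyond the hook name and signature shown above.

```python
import asyncio
from datetime import datetime, timezone


class InjectUserRole:
    """Hypothetical middleware: append the caller's role to the system prompt."""

    def __init__(self, role: str) -> None:
        self.role = role

    async def transform_system_prompt(self, system_prompt, *, signal=None):
        return f"{system_prompt}\n\nThe current user has the role: {self.role}."


class InjectCurrentTime:
    """Hypothetical middleware: append the current UTC time (clock is injectable)."""

    def __init__(self, clock=lambda: datetime.now(timezone.utc)) -> None:
        self.clock = clock

    async def transform_system_prompt(self, system_prompt, *, signal=None):
        stamp = self.clock().isoformat(timespec="seconds")
        return f"{system_prompt}\nCurrent time (UTC): {stamp}."


async def run_prompt_chain(middlewares, prompt):
    # Chained: each middleware extends the prompt produced by the previous one.
    for mw in middlewares:
        prompt = await mw.transform_system_prompt(prompt)
    return prompt
```

Keeping each fragment in its own middleware is what makes prompts modular: swapping one class in the list is enough to A/B test a variant.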
before_tool_call
async def before_tool_call(self, ctx: BeforeToolCallContext, *, signal=None) -> BeforeToolCallResult | None:
...
Fires per tool call, after argument validation, before
tool.execute. The context provides:
- ctx.assistant_message: the message that initiated the call.
- ctx.tool_call: the ToolCall block.
- ctx.args: the validated Pydantic instance.
- ctx.context: the full AgentContext.
Return BeforeToolCallResult(block=True, reason="…") to short-circuit
the call: CubePi feeds the reason back as the tool result with
is_error=True. Return None (or simply don't return) to proceed.
Use for: permissions, rate limiting, dry-run modes, sandboxing, human-in-the-loop confirmation (see the HITL guide).
Composition: first block=True short-circuits the chain.
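A sketch of a permission check and of first-block short-circuiting. ToolCall, Ctx and BeforeToolCallResult here are simplified stand-ins, not CubePi's classes (the real context carries the fields listed above); AllowList, CallCounter and run_before_chain are hypothetical. The sketch assumes that once a middleware blocks, later middlewares are not consulted.

```python
import asyncio
from dataclasses import dataclass
from typing import Iterable, Optional


# Stand-ins (assumptions), reduced to what this sketch needs.
@dataclass
class ToolCall:
    name: str


@dataclass
class Ctx:
    tool_call: ToolCall


@dataclass
class BeforeToolCallResult:
    block: bool = False
    reason: Optional[str] = None


class AllowList:
    """Hypothetical permission middleware: block any tool not in `allowed`."""

    def __init__(self, allowed: Iterable[str]) -> None:
        self.allowed = frozenset(allowed)

    async def before_tool_call(self, ctx, *, signal=None):
        name = ctx.tool_call.name
        if name not in self.allowed:
            return BeforeToolCallResult(block=True, reason=f"Tool {name!r} is not permitted.")
        return None  # proceed


class CallCounter:
    """Hypothetical observer, used only to show that blocking short-circuits."""

    def __init__(self) -> None:
        self.seen = 0

    async def before_tool_call(self, ctx, *, signal=None):
        self.seen += 1
        return None


async def run_before_chain(middlewares, ctx):
    # The first block=True result wins; later middlewares are skipped.
    for mw in middlewares:
        result = await mw.before_tool_call(ctx)
        if result is not None and result.block:
            return result
    return None
```

Placing the permission check first means cheaper or side-effecting middlewares later in the list never see disallowed calls.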
after_tool_call
async def after_tool_call(self, ctx: AfterToolCallContext, *, signal=None) -> AfterToolCallResult | None:
...
Fires per tool call, after tool.execute returns (or raises).
The context adds:
- ctx.result: the AgentToolResult from execute.
- ctx.is_error: whether the tool errored.
Return AfterToolCallResult(content=…, details=…, is_error=…, terminate=…)
to override individual fields of the result; any field left as None
keeps the original value. Return None to pass the result through unchanged.
Use for: redaction, retries, logging, transforming results.
Composition: later overrides earlier; each non-None field in a
middleware's return value overrides the value set before it.
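A sketch of redaction plus field-level overriding. ToolResult, AfterToolCallResult and Ctx are stand-ins reduced to the fields named above; RedactEmails, StopOnError and run_after_chain are hypothetical. The sketch assumes each later middleware sees the result as already overridden by earlier ones, and it merges only non-None fields, following the rule stated above. The e-mail regex is deliberately simple and not a complete address matcher.

```python
import asyncio
import re
from dataclasses import dataclass, replace
from typing import Any, Optional


# Stand-ins (assumptions), reduced to the four overridable fields.
@dataclass(frozen=True)
class ToolResult:
    content: Any
    details: Any = None
    is_error: bool = False
    terminate: bool = False


@dataclass
class AfterToolCallResult:
    content: Optional[Any] = None
    details: Optional[Any] = None
    is_error: Optional[bool] = None
    terminate: Optional[bool] = None


@dataclass
class Ctx:
    result: ToolResult
    is_error: bool


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


class RedactEmails:
    """Hypothetical middleware: mask e-mail addresses in string tool output."""

    async def after_tool_call(self, ctx, *, signal=None):
        text = ctx.result.content
        if isinstance(text, str) and EMAIL.search(text):
            return AfterToolCallResult(content=EMAIL.sub("[redacted]", text))
        return None  # pass through unchanged


class StopOnError:
    """Hypothetical middleware: ask the loop to terminate when a tool fails."""

    async def after_tool_call(self, ctx, *, signal=None):
        return AfterToolCallResult(terminate=True) if ctx.is_error else None


async def run_after_chain(middlewares, result):
    # Later overrides earlier: each non-None field replaces the current value.
    for mw in middlewares:
        override = await mw.after_tool_call(Ctx(result=result, is_error=result.is_error))
        if override is None:
            continue
        changes = {k: v for k, v in vars(override).items() if v is not None}
        result = replace(result, **changes)
    return result
```

Because unset fields stay None, the two middlewares compose cleanly: one rewrites content, the other only sets terminate.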
should_stop_after_turn
async def should_stop_after_turn(self, ctx: ShouldStopAfterTurnContext) -> bool:
...
Fires at each turn boundary (after any tool batch). Return True
to end the run without another model call.
Use for: max-turn limits, budget caps, application-defined stop conditions.
Composition: any True stops (logical OR across the chain).
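A sketch of a budget cap combined with an application-defined stop flag under OR composition. TurnCtx.tokens_used is a hypothetical field (the real ShouldStopAfterTurnContext may not expose token counts), and should_stop is an illustrative composer that calls every middleware before OR-ing the results so that stateful counters stay accurate; CubePi may short-circuit instead.

```python
import asyncio
from dataclasses import dataclass


# Stand-in context (an assumption): pretend it reports tokens spent this turn.
@dataclass
class TurnCtx:
    tokens_used: int


class TokenBudget:
    """Hypothetical budget cap: stop once cumulative tokens reach `limit`."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0

    async def should_stop_after_turn(self, ctx) -> bool:
        self.spent += ctx.tokens_used
        return self.spent >= self.limit


class StopFlag:
    """Hypothetical application-defined condition, e.g. a user clicking "stop"."""

    def __init__(self) -> None:
        self.requested = False

    async def should_stop_after_turn(self, ctx) -> bool:
        return self.requested


async def should_stop(middlewares, ctx) -> bool:
    # Logical OR; every middleware still runs so stateful ones keep accurate counts.
    votes = [await mw.should_stop_after_turn(ctx) for mw in middlewares]
    return any(votes)
```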
after_model_response
async def after_model_response(
    self,
    response: AssistantMessage,
    ctx: AgentContext,
    *,
    signal=None,
) -> TurnAction | None:
    ...
Fires immediately after the assistant message lands, before
message_end is emitted and before any tool calls are dispatched. The
hook returns a TurnAction or None:
from cubepi.middleware.base import TurnAction
from cubepi.providers.base import UserMessage, TextContent

TurnAction(
    response=modified_message,           # replace the message; None to keep
    inject_messages=[UserMessage(...)],  # extra messages to append before next turn
    decision="natural",                  # "natural" | "stop" | "loop_to_model"
)
The decision field offers three control-flow options:
- decision="natural" (default): proceed to tool execution / next turn as normal.
- decision="stop": end the run after emitting turn_end and agent_end. No tools run, and no more model calls are made.
- decision="loop_to_model": skip tool execution and re-invoke the model immediately (pair it with inject_messages to add context first).
Use for: response moderation, structured-output validation with re-prompts, conditional routing.
Composition: chained. Each middleware sees the previous
middleware's response, inject_messages concatenate across the
chain, and the last middleware's decision wins.
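A sketch of structured-output validation with re-prompts, plus the chain rules just described. AssistantMessage, UserMessage and TurnAction are simplified stand-ins (the real ones are imported from cubepi.providers.base and cubepi.middleware.base and have different constructors), and RequireJSON and run_model_chain are hypothetical. RequireJSON asks the loop to re-invoke the model with a corrective message, and gives up with decision="stop" after max_retries attempts so a stubborn model cannot loop forever.

```python
import asyncio
import json
from dataclasses import dataclass, field
from typing import Literal, Optional


# Stand-ins (assumptions) for AssistantMessage, UserMessage and TurnAction.
@dataclass
class AssistantMessage:
    text: str


@dataclass
class UserMessage:
    text: str


@dataclass
class TurnAction:
    response: Optional[AssistantMessage] = None
    inject_messages: list = field(default_factory=list)
    decision: Literal["natural", "stop", "loop_to_model"] = "natural"


class RequireJSON:
    """Hypothetical validator: re-prompt until the reply parses as JSON."""

    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self.retries = 0

    async def after_model_response(self, response, ctx, *, signal=None):
        try:
            json.loads(response.text)
            return None  # valid JSON: leave the turn alone
        except json.JSONDecodeError as exc:
            if self.retries >= self.max_retries:
                return TurnAction(decision="stop")  # give up instead of looping forever
            self.retries += 1
            note = UserMessage(f"Your reply was not valid JSON ({exc.msg}). Reply with JSON only.")
            return TurnAction(inject_messages=[note], decision="loop_to_model")


async def run_model_chain(middlewares, response, ctx=None):
    # Response threads through the chain, injections concatenate, and the last
    # returned TurnAction's decision wins; None results are skipped (an assumption).
    injected, decision = [], "natural"
    for mw in middlewares:
        action = await mw.after_model_response(response, ctx)
        if action is None:
            continue
        if action.response is not None:
            response = action.response
        injected.extend(action.inject_messages)
        decision = action.decision
    return response, injected, decision
```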
Anatomy of a middleware
A middleware doesn't have to implement every hook. Only override the
ones you need; the base class's unimplemented hooks raise
NotImplementedError, but compose_middleware skips them
automatically.
from cubepi import Agent, Middleware

class MaxTurnsMiddleware(Middleware):
    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns
        self.turns = 0  # counts turn boundaries seen by this instance

    async def should_stop_after_turn(self, ctx) -> bool:
        self.turns += 1
        return self.turns >= self.max_turns

agent = Agent(provider=…, model=…, middleware=[MaxTurnsMiddleware(5)])
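The skip-unimplemented behaviour described in this section can be approximated by checking whether a subclass overrides a hook. This sketch uses a stand-in BaseMiddleware rather than CubePi's Middleware, and overrides/implementers are hypothetical helpers; the real compose_middleware may detect unimplemented hooks differently, for example by catching NotImplementedError.

```python
import asyncio


class BaseMiddleware:
    """Stand-in base class (an assumption): unimplemented hooks raise NotImplementedError."""

    async def transform_system_prompt(self, system_prompt, *, signal=None):
        raise NotImplementedError

    async def should_stop_after_turn(self, ctx) -> bool:
        raise NotImplementedError


def overrides(mw, hook: str) -> bool:
    # Implemented means the subclass attribute is not the base-class function.
    return getattr(type(mw), hook) is not getattr(BaseMiddleware, hook)


def implementers(middlewares, hook: str):
    # Keep only the middlewares that actually override `hook`.
    return [mw for mw in middlewares if overrides(mw, hook)]


class PromptSuffix(BaseMiddleware):
    async def transform_system_prompt(self, system_prompt, *, signal=None):
        return system_prompt + " Be concise."


class StopImmediately(BaseMiddleware):
    async def should_stop_after_turn(self, ctx) -> bool:
        return True


mws = [PromptSuffix(), StopImmediately()]
print([type(mw).__name__ for mw in implementers(mws, "transform_system_prompt")])  # ['PromptSuffix']
```

Detecting overrides up front means a composer never has to call a hook just to discover it is missing.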
See also
- Composition Rules: exact semantics when multiple middlewares define the same hook.
- Examples: working middleware for rate limiting, logging, retries, sliding-window context.