Version: 0.4

The 7 Hooks

Middleware is a class with up to seven optional async methods. Each hook fires at a precise point in the agent loop. Implement only the ones you need — CubePi only wires in the ones you override.

from cubepi import Middleware

class MyMiddleware(Middleware):
    async def transform_context(self, messages, *, signal=None):
        ...

Pass instances to Agent(middleware=[MyMiddleware(), …]).

transform_context

async def transform_context(self, messages: list[Message], *, signal=None) -> list[Message]:
    ...

Fires before each model call, on the full message list. Use to:

  • Truncate or summarise to fit context windows.
  • Inject system reminders (better: use transform_system_prompt).
  • Add or remove specific messages dynamically.

Return the list the model should see. Don't mutate the input; build and return a new list so that other code holding the original isn't surprised.

Composition: chained — each middleware sees the previous one's output.
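
As a rough illustration of the truncation use case, here is a minimal sliding-window sketch. To stay runnable without CubePi installed it defines a stand-in Middleware base class and a hypothetical Msg type; in real code you would subclass cubepi.Middleware and receive CubePi's own Message objects.

```python
import asyncio
from dataclasses import dataclass


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


@dataclass
class Msg:
    """Hypothetical message type; CubePi's real Message class will differ."""
    role: str
    text: str


class SlidingWindowMiddleware(Middleware):
    def __init__(self, max_messages: int) -> None:
        if max_messages <= 0:
            raise ValueError("max_messages must be positive")
        self.max_messages = max_messages

    async def transform_context(self, messages, *, signal=None):
        # Slicing builds a new list, so the caller's list is left untouched.
        return messages[-self.max_messages:]


history = [Msg("user", f"message {i}") for i in range(5)]
trimmed = asyncio.run(
    SlidingWindowMiddleware(max_messages=2).transform_context(history)
)
print([m.text for m in trimmed])  # ['message 3', 'message 4']
```

A production version would likely also keep paired tool calls and tool results together; this sketch ignores that.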

convert_to_llm

async def convert_to_llm(self, messages: list[Message]) -> list[Message]:
    ...

Fires right before serialisation to the provider. This is the last chance to reshape what the LLM sees. Use for:

  • Stripping tool results to text-only.
  • Replacing image content with text descriptions for non-multimodal providers.
  • Compacting long tool outputs.

Composition: last implementation wins (not chained). Use this when multiple middlewares would conflict and you want a single owner.
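
The "compacting long tool outputs" case might look roughly like the sketch below. The Middleware base class and the Msg type (with role and text fields) are stand-ins invented for this example, not CubePi's real types, and the "tool" role name is likewise an assumption.

```python
import asyncio
from dataclasses import dataclass, replace


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


@dataclass(frozen=True)
class Msg:
    """Hypothetical message type; CubePi's real Message class will differ."""
    role: str
    text: str


class CompactToolOutputs(Middleware):
    def __init__(self, max_chars: int = 200) -> None:
        self.max_chars = max_chars

    async def convert_to_llm(self, messages):
        compacted = []
        for msg in messages:
            # Only shorten tool output; leave every other message as-is.
            if msg.role == "tool" and len(msg.text) > self.max_chars:
                msg = replace(msg, text=msg.text[: self.max_chars] + " [truncated]")
            compacted.append(msg)
        return compacted


messages = [Msg("user", "list files"), Msg("tool", "x" * 1000)]
llm_view = asyncio.run(CompactToolOutputs(max_chars=10).convert_to_llm(messages))
print(llm_view[1].text)  # xxxxxxxxxx [truncated]
```

Because dataclasses.replace returns a copy, the stored history keeps the full tool output and only the provider-facing view is shortened.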

transform_system_prompt

async def transform_system_prompt(self, system_prompt: str, *, signal=None) -> str:
    ...

Fires before each model call, on the system prompt string. Use to:

  • Inject runtime info (current time, user role).
  • Compose modular system-prompt fragments.
  • A/B test prompt variants.

Composition: chained.
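
A minimal sketch of injecting runtime info, assuming a stand-in Middleware base class so it runs without CubePi installed. The user_role and clock parameters are hypothetical names chosen for this example; the injectable clock only exists to make the output deterministic.

```python
import asyncio
from datetime import datetime, timezone


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


class RuntimeInfoMiddleware(Middleware):
    def __init__(self, user_role, clock=lambda: datetime.now(timezone.utc)):
        self.user_role = user_role
        self.clock = clock  # injectable so tests can pin the time

    async def transform_system_prompt(self, system_prompt, *, signal=None):
        now = self.clock().isoformat(timespec="seconds")
        # Append rather than replace, so earlier middlewares' edits survive.
        return (
            f"{system_prompt}\n\n"
            f"Current time (UTC): {now}\n"
            f"User role: {self.user_role}"
        )


def fixed_clock():
    return datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)


prompt = asyncio.run(
    RuntimeInfoMiddleware("admin", clock=fixed_clock).transform_system_prompt(
        "You are a helpful assistant."
    )
)
print(prompt)
```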

before_tool_call

async def before_tool_call(self, ctx: BeforeToolCallContext, *, signal=None) -> BeforeToolCallResult | None:
    ...

Fires per tool call, after argument validation, before tool.execute. The context provides:

  • ctx.assistant_message — the message that initiated the call.
  • ctx.tool_call — the ToolCall block.
  • ctx.args — the validated Pydantic instance.
  • ctx.context — the full AgentContext.

Return BeforeToolCallResult(block=True, reason="…") to short-circuit: CubePi feeds the reason back as the tool result with is_error=True. Return None (or return nothing) to proceed.

Use for: permissions, rate limiting, dry-run modes, sandboxing.

Composition: first block=True short-circuits the chain.
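
A permissions check could be sketched as an allowlist. The classes below are stand-ins so the example runs without CubePi: BeforeToolCallResult mirrors only the block and reason fields named above, and ctx.tool_call.name is an assumed attribute, since this page does not spell out the ToolCall fields.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


@dataclass
class BeforeToolCallResult:
    """Stand-in mirroring the block/reason fields described in the text."""
    block: bool = False
    reason: Optional[str] = None


class AllowlistMiddleware(Middleware):
    def __init__(self, allowed_tools) -> None:
        self.allowed_tools = frozenset(allowed_tools)

    async def before_tool_call(self, ctx, *, signal=None):
        name = ctx.tool_call.name  # assumed attribute on the ToolCall block
        if name not in self.allowed_tools:
            return BeforeToolCallResult(
                block=True, reason=f"Tool '{name}' is not permitted."
            )
        return None  # proceed to tool.execute


def fake_ctx(tool_name):
    # Hypothetical context carrying just the one field this sketch reads.
    return SimpleNamespace(tool_call=SimpleNamespace(name=tool_name))


guard = AllowlistMiddleware({"read_file"})
blocked = asyncio.run(guard.before_tool_call(fake_ctx("delete_file")))
allowed = asyncio.run(guard.before_tool_call(fake_ctx("read_file")))
print(blocked.reason)  # Tool 'delete_file' is not permitted.
```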

after_tool_call

async def after_tool_call(self, ctx: AfterToolCallContext, *, signal=None) -> AfterToolCallResult | None:
    ...

Fires per tool call, after tool.execute returns (or raises). The context adds:

  • ctx.result — the AgentToolResult from execute.
  • ctx.is_error — whether the tool errored.

Return AfterToolCallResult(content=…, details=…, is_error=…, terminate=…) to override individual fields of the result (any None field keeps the original). Return None to pass through unchanged.

Use for: redaction, retries, logging, transforming results.

Composition: later overrides earlier (each non-None field in a return value overrides the prior).
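
For the redaction use case, a sketch along these lines overrides only the content field and leaves the rest as None. It uses stand-in classes so it runs without CubePi, and it assumes ctx.result.content is a plain string, which may not match the real AgentToolResult. The secret pattern is purely illustrative.

```python
import asyncio
import re
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


@dataclass
class AfterToolCallResult:
    """Stand-in mirroring the four override fields described in the text."""
    content: Optional[Any] = None
    details: Optional[Any] = None
    is_error: Optional[bool] = None
    terminate: Optional[bool] = None


# Illustrative pattern only; real secret detection needs more care.
API_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


class RedactSecrets(Middleware):
    async def after_tool_call(self, ctx, *, signal=None):
        original = ctx.result.content  # assumed to be a str in this sketch
        redacted = API_KEY_PATTERN.sub("[REDACTED]", original)
        if redacted == original:
            return None  # nothing to change, pass through
        # Fields left as None keep the tool's original values.
        return AfterToolCallResult(content=redacted)


def fake_ctx(text):
    # Hypothetical context carrying just the fields this sketch reads.
    return SimpleNamespace(result=SimpleNamespace(content=text), is_error=False)


leaky = asyncio.run(RedactSecrets().after_tool_call(fake_ctx("token=sk-abcdef123456 ok")))
clean = asyncio.run(RedactSecrets().after_tool_call(fake_ctx("nothing secret here")))
print(leaky.content)  # token=[REDACTED] ok
```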

should_stop_after_turn

async def should_stop_after_turn(self, ctx: ShouldStopAfterTurnContext) -> bool:
    ...

Fires at each turn boundary (after any tool batch). Return True to end the run without another model call.

Use for: max-turn limits, budget caps, application-defined stop conditions.

Composition: any True stops (logical OR across the chain).
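
One application-defined stop condition is a wall-clock budget. The sketch below uses a stand-in Middleware base class and ignores ctx entirely, so it makes no assumptions about the context's fields. The max_seconds and clock parameters are hypothetical names for this example.

```python
import asyncio
import time


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


class DeadlineMiddleware(Middleware):
    def __init__(self, max_seconds: float, clock=time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.clock = clock  # injectable so tests don't have to sleep
        # The timer starts at construction, not when the run starts.
        self.started_at = clock()

    async def should_stop_after_turn(self, ctx) -> bool:
        return self.clock() - self.started_at >= self.max_seconds


# Fake clock: construction at t=0, first check at t=1, second at t=6.
ticks = iter([0.0, 1.0, 6.0])
deadline = DeadlineMiddleware(max_seconds=5.0, clock=lambda: next(ticks))
first_check = asyncio.run(deadline.should_stop_after_turn(None))
second_check = asyncio.run(deadline.should_stop_after_turn(None))
print(first_check, second_check)  # False True
```

Since the hook only fires at turn boundaries, a budget like this stops the run after the current turn finishes; it does not interrupt a model call or tool already in flight.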

after_model_response

async def after_model_response(
    self,
    response: AssistantMessage,
    ctx: AgentContext,
    *,
    signal=None,
) -> TurnAction | None:
    ...

Fires immediately after the assistant message lands, before message_end is emitted and before any tool calls dispatch. The hook returns a TurnAction:

from cubepi.middleware.base import TurnAction
from cubepi.providers.base import UserMessage, TextContent

TurnAction(
    response=modified_message,           # replace the message; None to keep
    inject_messages=[UserMessage(...)],  # extra messages to append before next turn
    decision="natural",                  # "natural" | "stop" | "loop_to_model"
)

The decision field controls what happens next and takes one of three values:

  • decision="natural" (default) — proceed to tool execution / next turn as normal.
  • decision="stop" — end the run after emitting turn_end and agent_end. No tools run, no more model calls.
  • decision="loop_to_model" — skip tool execution and re-invoke the model immediately (use with inject_messages to add context first).

Use for: response moderation, structured-output validation with re-prompts, conditional routing.

Composition: chain — each middleware sees the previous middleware's response; inject_messages concatenate across the chain; the last middleware's decision wins.
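
Structured-output validation with a re-prompt might be sketched as follows. TurnAction and UserMessage here are stand-ins that mirror only the fields shown above, and response.text is an assumed attribute: the real AssistantMessage probably exposes content blocks instead. The max_retries cap is a hypothetical safeguard against looping forever.

```python
import asyncio
import json
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Optional


class Middleware:
    """Stand-in for cubepi.Middleware so this sketch runs on its own."""


@dataclass
class UserMessage:
    """Stand-in; the real UserMessage constructor may differ."""
    content: str


@dataclass
class TurnAction:
    """Stand-in mirroring the three TurnAction fields shown in the text."""
    response: Optional[Any] = None
    inject_messages: list = field(default_factory=list)
    decision: str = "natural"


class RequireJson(Middleware):
    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self.retries = 0

    async def after_model_response(self, response, ctx, *, signal=None):
        try:
            json.loads(response.text)  # assumed attribute holding the reply text
        except json.JSONDecodeError as exc:
            if self.retries >= self.max_retries:
                return TurnAction(decision="stop")  # give up instead of looping
            self.retries += 1
            hint = f"Your reply was not valid JSON ({exc.msg}). Reply with JSON only."
            return TurnAction(
                inject_messages=[UserMessage(hint)],
                decision="loop_to_model",
            )
        return None  # valid JSON: continue as normal


validator = RequireJson(max_retries=1)
bad = SimpleNamespace(text="not json")
good = SimpleNamespace(text='{"ok": true}')
retry = asyncio.run(validator.after_model_response(bad, None))
gave_up = asyncio.run(validator.after_model_response(bad, None))
passed = asyncio.run(validator.after_model_response(good, None))
print(retry.decision, gave_up.decision, passed)  # loop_to_model stop None
```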

Anatomy of a middleware

A middleware doesn't have to implement every hook. Only override the ones you need; the base class's unimplemented hooks raise NotImplementedError, but compose_middleware skips them automatically.

from cubepi import Middleware

class MaxTurnsMiddleware(Middleware):
    def __init__(self, max_turns: int) -> None:
        self.max_turns = max_turns
        self.turns = 0

    async def should_stop_after_turn(self, ctx) -> bool:
        self.turns += 1
        return self.turns >= self.max_turns


agent = Agent(provider=…, model=…, middleware=[MaxTurnsMiddleware(5)])

See also

  • Composition Rules — exact semantics when multiple middlewares define the same hook.
  • Examples — working middleware for rate limiting, logging, retries, sliding-window context.