Skip to main content
Version: 0.10 (latest)

Deferred Tool Groups

When an agent connects to many MCP servers, their combined tool schemas can consume thousands of tokens of context on every turn — even if the model only needs one or two groups for the current task. DeferredToolGroup solves this by replacing full schemas with a compact catalog, letting the model expand groups on demand.

How it works

  1. At construction time, the agent's system prompt includes a short catalog — one line per group with a description and tool list.
  2. The model sees a built-in load_tools tool it can call to load a group (or specific tools within a group).
  3. On expansion, the loader runs once, the tools are injected into the live tool set, and their schemas are appended to the system prompt.
# Deferred tool groups

These tool groups are available but not yet loaded. Call `load_tools(group_id)`
to load a group's tools for the rest of this conversation.

- `mcp:github` — GitHub: Issues, PRs, repos, code search (4 tools)
create_issue, search_repos, create_pr, list_comments
- `mcp:linear` — Linear: Project management and issue tracking (6 tools)
create_issue, update_issue, list_projects, ...

Basic setup

Pass deferred_tool_groups to Agent. The middleware is created automatically — no manual wiring needed:

from cubepi import Agent
from cubepi.deferred import DeferredToolGroup

# load_github_tools / load_linear_tools are zero-arg async callables
# returning list[AgentTool]. See "Writing a loader" below for the two
# common shapes (MCP-backed and hand-written @tool functions).

github_group = DeferredToolGroup(
group_id="mcp:github",
display_name="GitHub",
description="Issues, PRs, repos, code search",
tool_names=["create_issue", "search_repos", "create_pr", "list_comments"],
loader=load_github_tools,
)

linear_group = DeferredToolGroup(
group_id="mcp:linear",
display_name="Linear",
description="Project management and issue tracking",
tool_names=["create_issue", "update_issue", "list_projects"],
loader=load_linear_tools,
)

agent = Agent(
model=provider.model("claude-sonnet-4-6"),
tools=[search_tool, calculator], # always-available tools
deferred_tool_groups=[github_group, linear_group],
)

DeferredToolGroup fields

FieldTypeDescription
group_idstrUnique identifier the model uses in load_tools calls (e.g. "mcp:github")
display_namestrHuman-readable label shown in the catalog
descriptionstrOne-line summary of the group's capabilities
tool_nameslist[str]Tool names shown in the catalog. Must match the AgentTool.name of each tool the loader returns — selective expansion (load_tools(group_id, tool_names=[…])) matches on this.
loaderasync () -> list[AgentTool]Callback that returns the full tool set for this group

Writing a loader

The loader is a zero-argument async callable that returns list[AgentTool]. CubePi only cares about its return type — where the AgentTool objects come from is up to you. Two common shapes:

From an MCP server. load_mcp_tools_stdio / load_mcp_tools_http return an MCPDiscoveryResult whose .tools is the list[AgentTool] you want. Wrap it:

from cubepi.deferred import DeferredToolGroup
from cubepi.mcp import load_mcp_tools_stdio

async def load_github_tools():
result = await load_mcp_tools_stdio(
command="npx",
args=["-y", "@modelcontextprotocol/server-github"],
env={"GITHUB_TOKEN": "ghp_…"},
)
return result.tools # list[AgentTool]

github_group = DeferredToolGroup(
group_id="mcp:github",
display_name="GitHub",
description="Issues, PRs, repos, code search",
tool_names=["create_issue", "search_repos", "create_pr"],
loader=load_github_tools,
)

The names in tool_names must match the MCP server's tool names — those become AgentTool.name after discovery. If the catalog lists create_issue but the server publishes it as github_create_issue, selective expansion misses.

From hand-written @tool functions. Any function decorated with @tool produces an AgentTool (its .name defaults to the function name, overridable via @tool(name="…")). A loader for hand-written tools is just async lambda over a list:

from cubepi import tool
from cubepi.deferred import DeferredToolGroup

@tool
async def create_issue(*, repo: str, title: str, body: str) -> str:
"Open a GitHub issue."
...

@tool
async def search_repos(*, query: str) -> str:
"Search public repos."
...

async def load_github_tools():
return [create_issue, search_repos] # already AgentTools

github_group = DeferredToolGroup(
group_id="mcp:github",
display_name="GitHub",
description="Issues, PRs, repos, code search",
tool_names=["create_issue", "search_repos"],
loader=load_github_tools,
)

You can mix the two — return MCP tools and hand-written tools in the same list — as long as every name in tool_names matches an AgentTool.name in the returned list. If the loader raises, the error is reported to the model as a tool error and the group stays unexpanded.

The load_tools tool

The model calls load_tools to load a group's tools. Two modes:

# Expand everything in the group
load_tools(group_id="mcp:github")

# Expand specific tools only
load_tools(group_id="mcp:github", tool_names=["create_issue", "search_repos"])

The tool returns a structured result:

{
"group_id": "mcp:github",
"expanded": true,
"tool_names": ["create_issue", "search_repos", "create_pr", "list_comments"],
"remaining": 0
}

After expansion, the tools are immediately available for the model to call in the same turn (via the after_tool_call hook).

Selective expansion

The model can expand a group incrementally — requesting one or two tools now and more later:

load_tools(group_id="mcp:github", tool_names=["create_issue"])
# → remaining: 3

# later...
load_tools(group_id="mcp:github", tool_names=["search_repos"])
# → remaining: 2

Already-expanded tools are idempotent — re-requesting them is a no-op.

Loader caching

The loader callback is invoked exactly once per group per run. The first load_tools call triggers it; subsequent selective expansions filter from the cached result. If the loader fails, the error is returned to the model and the group remains unexpanded.

Prompt-cache stability

The system prompt is designed for prompt-cache prefix stability:

  • Catalog is sorted by group_id alphabetically — input order doesn't matter, the rendered text is byte-stable.
  • Expanded schemas are appended in expansion order (the order the model called load_tools), never reordered. Each new expansion appends to the end, preserving the existing prefix.

This means the LLM API's prompt cache remains valid across turns: the system prompt only grows, and only at the end.

Expansion state

The middleware tracks which groups are expanded in ctx.extra:

ctx.extra["expanded_groups"] = {
"mcp:github": None, # fully expanded (None = all tools)
"mcp:linear": ["create_issue"], # partially expanded
# mcp:slack not present = unexpanded
}

This state survives checkpointing and can be used for cross-run replay (see below).

Cross-run replay

When resuming a conversation from a previous run, you need to restore the expansion state so the model has the same tools available. prepare_resumed_state handles this:

from cubepi.deferred import DeferredToolsMiddleware

# saved_extra is the persisted ctx.extra from the previous run
resumed = await DeferredToolsMiddleware.prepare_resumed_state(
groups=all_groups,
expanded=saved_extra["expanded_groups"],
)

agent = Agent(
model=model,
tools=[*builtin_tools, *resumed.pre_loaded_tools],
deferred_tool_groups=resumed.remaining_groups,
)

prepare_resumed_state returns a ResumedState with:

FieldDescription
pre_loaded_toolsTools from previously-expanded groups, ready to use
remaining_groupsGroups that were never expanded or only partially expanded
expanded_schemasSchema data for the system prompt (pass to resumed_schemas for advanced use)
loader_cachePre-loaded tool cache (pass to resumed_loader_cache to avoid redundant loader calls)

Fully expanded groups are loaded and removed from the deferred set. Partially expanded groups load the selected tools but stay deferrable (the model can still expand the rest).

Restoring schema text

The Agent(deferred_tool_groups=...) shorthand handles the common case. For full prompt-cache continuity — where the resumed run's system prompt must match the previous run's final state byte-for-byte — construct the middleware directly with resumed_schemas:

mw = DeferredToolsMiddleware(
groups=resumed.remaining_groups,
extra_ref=lambda: agent_extra,
resumed_schemas=resumed.expanded_schemas,
resumed_loader_cache=resumed.loader_cache,
)

agent = Agent(
model=model,
tools=[*builtin_tools, *resumed.pre_loaded_tools],
middleware=[mw],
)

Advanced: constructing the middleware directly

For full control over the catalog header, cross-run schema seeding, or other middleware options, construct DeferredToolsMiddleware yourself:

from cubepi.deferred import DeferredToolsMiddleware

mw = DeferredToolsMiddleware(
groups=[github_group, linear_group],
extra_ref=lambda: agent_extra,
catalog_header="# Available integrations\n\nExpand with load_tools().",
resumed_schemas=None, # or pass schemas from a previous run
)

agent = Agent(
model=model,
tools=[search_tool],
middleware=[mw],
)

Constructor parameters

ParameterTypeDefaultDescription
groupslist[DeferredToolGroup]requiredGroups to defer
extra_ref() -> dictrequiredReturns the live ctx.extra dict
catalog_headerstr(built-in)Header text for the catalog section
resumed_schemaslist[tuple[str, list[dict]]] | NoneNoneSchema data to seed from a previous run
resumed_loader_cachedict[str, list[AgentTool]] | NoneNonePre-loaded tool cache from a previous run (avoids re-calling loaders on resume)
on_tools_expanded(list[AgentTool]) -> None | NoneNoneCalled after new tools are expanded (used internally for cross-turn persistence)

When using the Agent(deferred_tool_groups=...) shorthand, extra_ref is automatically bound to self._extra.

When to use it

Good fit:

  • Agent has access to 5+ MCP servers but typically uses 1–2 per conversation.
  • Tool schemas are large (many parameters, long descriptions).
  • You want to keep prompt-cache hit rates high across turns.

Skip it when:

  • The agent has only a few tools — the overhead of the catalog and load_tools call isn't worth it.
  • All tools are needed on every turn — deferring just adds a round trip.
  • Tool schemas are small — the context savings are minimal.

See also

  • Loading MCP Tools — how to get AgentTool lists from MCP servers.
  • The 8 Hooks — the middleware hooks that power deferred tools (transform_system_prompt, after_tool_call).
  • Composition — how middleware composes when stacked with other middleware.