Anthropic Provider
AnthropicProvider wraps the official anthropic SDK against the
Messages API. It supports streaming, extended thinking, prompt
caching, and tool use.
Constructionâ
from cubepi.providers.anthropic import AnthropicProvider
provider = AnthropicProvider(
api_key="sk-ant-âĻ", # or read from os.environ["ANTHROPIC_API_KEY"]
base_url=None, # set to point at a proxy / Bedrock-compatible endpoint
cache_retention="short", # "short" (5 min, default) | "long" (1 h) | "none"
)
api_key=None lets the underlying SDK read ANTHROPIC_API_KEY from
the environment.
The Modelâ
from cubepi import Model
model = Model(
id="claude-sonnet-4-5-20250929",
provider="anthropic",
reasoning=True, # enables thinking levels (see below)
max_tokens=8192, # response cap
context_window=200_000, # hard model limit
temperature=0.7,
)
The id is the model name exactly as you'd pass it to the SDK. The
provider string is a free-form label used by CubePi internals â keep
it stable across your codebase but it doesn't have to match
"anthropic" literally.
Extended thinkingâ
CubePi maps a ThinkingLevel enum onto Anthropic's budget_tokens:
| Level | Default budget |
|---|---|
"off" | thinking disabled |
"minimal" | 1024 |
"low" | 2048 |
"medium" | 8192 |
"high" | 16384 |
"xhigh" | clamped to "high" |
Set it per-agent:
agent = Agent(provider=provider, model=model, thinking="medium")
To change the per-level budgets, supply a
CapabilityDescriptor with a
reasoning_level of kind="int_budget" at construction â its
level_budgets map is the single source of truth for budget values.
from cubepi import CapabilityDescriptor, ReasoningLevelSpec
from cubepi.providers.anthropic import AnthropicProvider
provider = AnthropicProvider(
api_key="sk-ant-âĻ",
capability=CapabilityDescriptor(
reasoning_off_payload={"thinking": {"type": "disabled"}},
reasoning_on_payload={"thinking": {"type": "enabled"}},
reasoning_level=ReasoningLevelSpec(
path="thinking.budget_tokens",
kind="int_budget",
level_budgets={"off": 0, "minimal": 1024, "low": 4096,
"medium": 12288, "high": 16384, "xhigh": 16384},
),
),
)
AnthropicProvider no longer honors
StreamOptions(thinking_budgets=âĻ) â the capability descriptor's
level_budgets is now the only way to override budgets.
When thinking is on, CubePi omits temperature because the
Anthropic API rejects non-default temperatures alongside extended
thinking (feature compatibility).
Set Model.temperature to the value you want when thinking is off;
CubePi handles the rest.
Thinking content streams as thinking_start / thinking_delta /
thinking_end events and ends up in AssistantMessage.content as
ThinkingContent blocks, preserved on subsequent turns so the model
keeps continuity.
Prompt cachingâ
By default the provider marks three cache breakpoints on each request:
- The system prompt (most stable).
- The last tool definition (changes rarely).
- The last message (the cache moves forward each turn so prior history stays warm).
Cache retention is "short" (5 minutes, free). Bump to "long" if
your turns are slower than that:
AnthropicProvider(api_key=âĻ, cache_retention="long") # 1-hour TTL
AnthropicProvider(api_key=âĻ, cache_retention="none") # disable entirely
The Usage object on each AssistantMessage reports
cache_read_tokens and cache_write_tokens so you can see your hit
rate.
For custom cache strategies (a different breakpoint policy), implement
the CacheMarkerPolicy Protocol and pass cache_policy=âĻ. The
default policy lives at cubepi.providers.anthropic.DefaultCacheMarkerPolicy.
Custom payloads with on_payloadâ
on_payload lets you inspect or replace the request dict right before
it's sent:
async def my_payload(payload, model):
payload.setdefault("metadata", {})["user_id"] = "u-42"
return payload # return None or no return to keep the original
agent = Agent(provider=provider, model=model, on_payload=my_payload)
Use this for: adding metadata.user_id (for billing), forcing
beta-header flags, or tracking the exact payloads you sent for a
debug pane.
Custom response handling with on_responseâ
on_response fires after the HTTP response is received (status,
headers), before streaming begins:
async def my_response(resp, model):
if resp.status >= 400:
logger.warning("bad status %s", resp.status)
rate = resp.headers.get("anthropic-ratelimit-requests-remaining")
if rate is not None:
metrics.gauge("rate_remaining", int(rate))
agent = Agent(provider=provider, model=model, on_response=my_response)
Both callbacks may be sync or async.
Pointing at Bedrock / Vertex / proxiesâ
The Anthropic SDK accepts a base_url; CubePi forwards it:
provider = AnthropicProvider(
api_key="âĻ",
base_url="https://my-litellm.internal/v1",
)
For Bedrock specifically, use the anthropic-bedrock adapter directly
and inject it via a custom provider.
Common pitfallsâ
temperatureignored â Expected. CubePi drops it when thinking is on; that's an API constraint, not a bug.xhighlooks the same ashighâ Anthropic doesn't expose a budget tier abovehigh, so CubePi clampsxhighdown. The token budget is the same.- Cache misses you didn't expect â Caches are keyed by content + ttl. Changing the system prompt invalidates everything; changing the tool list invalidates from the tools onward. Make those two stable across turns to maximise hits.
anthropic.RateLimitErrorâ Propagates as a stream error event with the SDK'sstr(exc). Catch inagent_endand decide whether to retry.
See alsoâ
- OpenAI Provider â same protocol, different shape.
- Capabilities & Preset Catalog â tune reasoning budgets and temperature handling as data.
- Custom Provider â wrap a non-built-in API.
- Recipes â Multi-Provider Failover â fall back to OpenAI when Anthropic is down.