Version: 0.13 (latest)

Goal

GoalMiddleware keeps an agent running autonomously until a completion condition is met. A separate evaluator model (e.g. Haiku) judges whether the condition has been achieved — the worker model is not grading its own homework.

Inspired by Claude Code's /goal command: set a condition, let the agent work, stop when a second model says the condition is satisfied.

Basic setup

from cubepi import Agent
from cubepi.providers.anthropic import AnthropicProvider
from cubepi.middleware.goal import GoalMiddleware

provider = AnthropicProvider(api_key="...")

goal = GoalMiddleware(
    evaluator=provider.model("claude-haiku-4-5-20251001"),
    max_evaluations=10,
)

agent = Agent(
    model=provider.model("claude-sonnet-4-6"),
    middleware=[goal],
    tools=[...],
)

# /goal prefix activates goal mode
await agent.prompt("/goal all tests in tests/auth pass and ruff check is clean")

# Check outcome
print(agent.state.extra["goal"])
# {"status": "achieved", "condition": "all tests in ...", "evaluations": 2, ...}

How it works

Activation

GoalMiddleware activates when the user message starts with /goal . Everything after the prefix is the completion condition.

# Goal mode — evaluator will judge this condition
await agent.prompt("/goal make the homepage load in under 2 seconds")

# Normal mode — middleware is fully transparent
await agent.prompt("fix the bug in auth.py")

The /goal prefix is stripped before the worker sees the message. The worker receives only the condition text as its work directive.

Evaluation loop

After the worker finishes a complete run (all tool calls exhausted):

The evaluator reads the condition + last 20 messages from the conversation.
It returns {achieved: bool, reason: str} via structured output.
If achieved — the loop ends, status is "achieved".
If not achieved and evaluations remain — feedback is injected ("Goal not yet met: {reason}. Continue working.") and the worker resumes.
If max_evaluations reached — the loop ends, status is "exhausted".

Worker run → on_run_end → evaluator judges
                              ├─ achieved=True  → stop (status: "achieved")
                              ├─ achieved=False → inject feedback, continue
                              └─ max evals hit  → stop (status: "exhausted")

Parameters

Parameter	Type	Default	Description
`evaluator`	`BoundModel`	required	Model that judges the condition
`max_evaluations`	`int`	`10`	Safety cap on evaluator calls

Use a fast, cheap model for the evaluator (e.g. Haiku). It only reads the conversation transcript — it cannot call tools.

Reading the outcome

After agent.prompt() returns, check agent.state.extra["goal"]:

goal_state = agent.state.extra["goal"]

match goal_state["status"]:
    case "achieved":
        print(f"Done in {goal_state['evaluations']} evaluations")
        print(f"Reason: {goal_state['last_reason']}")
    case "exhausted":
        print(f"Gave up after {goal_state['max_evaluations']} evaluations")
        print(f"Last reason: {goal_state['last_reason']}")

Full state shape:

{
    "status": "achieved" | "active" | "exhausted",
    "condition": "all tests pass...",
    "evaluations": 3,
    "max_evaluations": 10,
    "last_reason": "2 tests still failing in test_auth.py",
}

Tracing

GoalMiddleware declares its evaluator via extra_llm_calls(), so the tracing Recorder can subscribe to the evaluator's provider and attribute evaluation spans correctly — they show up as evaluator calls, not worker calls.

Tips

Keep conditions specific and verifiable. "All tests pass" is better than "code works well." The evaluator judges from transcript text, not by running tools.
Use a cheap evaluator. Haiku is usually sufficient — the judgment is binary (achieved/not) with a short reason.
Set max_evaluations conservatively. The default of 10 prevents runaway loops. Increase only when you expect the agent to need many iterations.
Combine with other middleware. GoalMiddleware composes with TodoListMiddleware, compaction, etc. via standard middleware composition.

Basic setup​

How it works​

Activation​

Evaluation loop​

Parameters​

Reading the outcome​

Tracing​

Tips​