Skip to main content
Build custom agents that interact with MCP tools to complete tasks. An agent is essentially a loop that calls your LLM, executes tools based on its decisions, and continues until the task is complete.

How Agents Work

An agent follows this lifecycle: The agent keeps calling your LLM and executing tools until the LLM stops requesting tools, indicating the task is complete.

The Four Required Methods

To create an agent, you implement four methods that bridge your LLM with MCP’s tool system:
from hud.agents import MCPAgent
from hud.types import AgentResponse, MCPToolCall, MCPToolResult

class MyAgent(MCPAgent):
    """Your custom agent implementation."""

    async def get_system_messages(self) -> list[Any]:
        """1. Called ONCE at start - returns your LLM's system prompt."""
        pass

    async def get_response(self, messages: list[Any]) -> AgentResponse:
        """2. Called EACH TURN - sends messages to your LLM, returns its response, optionally adds the assistant message to messages."""
        pass

    async def format_blocks(self, blocks: list[ContentBlock]) -> list[Any]:
        """3. Called at START - converts initial prompt/context to your LLM format."""
        pass

    async def format_tool_results(
        self, tool_calls: list[MCPToolCall],
        tool_results: list[MCPToolResult]
    ) -> list[Any]:
        """4. Called AFTER TOOLS - converts tool results to your LLM format."""
        pass

Understanding When Each Method is Called

The agent loop calls your methods in this sequence:
  1. get_system_messages() - Once at start
  2. format_blocks() - Converts initial task prompt
  3. get_response() - Gets LLM decision, adds assistant message to messages
  4. format_tool_results() - After each tool execution
  5. Back to step 3 until done

What MCPAgent Does For You

The Agent Loop

The base MCPAgent class handles the entire execution loop. When you call agent.run(task):
  1. Initialization Phase
    • Connects to MCP servers (auto-creates client from task.mcp_config if needed)
    • Discovers available tools from all connected servers
    • Applies tool filtering (allowed/disallowed lists)
    • Identifies lifecycle tools (setup, evaluate, response)
  2. Setup Phase (if task.setup_tool provided)
    • Executes setup tools (e.g., navigate to website, initialize environment)
    • Optionally appends setup output to initial context (controlled by append_setup_output)
    • Can include initial screenshots (controlled by initial_screenshot)
  3. Main Execution Loop
    while not done and step < max_steps:
        # Your get_response() is called here
        response = await agent.get_response(messages)
    
        if response.tool_calls:
            # MCPAgent executes tools for you
            results = await agent.call_tools(response.tool_calls)
    
            # Your format_tool_results() is called here
            messages.extend(await agent.format_tool_results(tool_calls, results))
        else:
            done = True
    
  4. Evaluation Phase (if task.evaluate_tool provided)
    • Runs evaluation tools to calculate reward
    • Extracts reward from result (looks for “reward”, “grade”, “score” keys)
    • Returns Trace object with full execution history

Tool Management

Tool Discovery & Filtering
agent = ClaudeAgent(
    allowed_tools=["anthropic_computer"],  # Only these tools
    disallowed_tools=["openai_computer"],  # Never these tools
)
  • Available Tools: Retrieved via self.get_available_tools() - already filtered
  • Lifecycle Tools: Automatically detected and hidden from your LLM
  • Response Tools: Auto-detected (tools with “response” in name) for task completion

Client Management

MCPAgent handles complex client lifecycle:
# Option 1: Provide your own client
from hud.clients import MCPClient
client = MCPClient(mcp_config={...})
agent = MyAgent(mcp_client=client)

# Option 2: Auto-create from task
task = Task(mcp_config={...})
agent = MyAgent()  # No client needed
await agent.run(task)  # Client created automatically
Auto-cleanup: Clients created automatically are properly shut down after execution.

Error Handling

MCPAgent provides robust error handling:
  • Connection Errors: Helpful messages about MCP server availability
  • Tool Errors: Captured and returned as MCPToolResult with isError=True
  • Timeout Handling: Graceful shutdown on tool execution timeouts
  • Trace Always Returns: Even on errors, you get a Trace object with details

Message Accumulation

Messages build up over the conversation:
[System] → [User Prompt] → [LLM Response] → [Tool Results] → [LLM Response] → ...
Your get_response() receives the full conversation history each time, allowing your LLM to maintain context.

Advanced Features

Response Agent Integration
from hud.agents.misc import ResponseAgent

agent = MyAgent(
    response_agent=ResponseAgent()  # Auto-decides when to stop/continue
)
The ResponseAgent can analyze ambiguous LLM responses like “Should I submit?” and decide whether to continue. Telemetry & Tracing
agent = MyAgent(
    auto_trace=True,  # Automatic span creation
    verbose=True  # Detailed logging
)
System Prompt Augmentation
task = Task(
    system_prompt="Additional instructions...",  # Appended to agent's system prompt
    ...
)

Testing Your Agent

Test your agent on a simple task:
import asyncio
import hud
import os
from hud.datasets import Task

async def test_agent():
    with hud.trace("test-custom-agent"):
        task = Task(
            prompt="Navigate to example.com",
            mcp_config={
                "hud": {
                    "url": "https://mcp.hud.so/v3/mcp",
                    "headers": {
                        "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                        "Mcp-Image": "hudpython/hud-remote-browser:latest"
                    }
                }
            },
            setup_tool={
                "name": "setup",
                "arguments": {
                    "name": "navigate",
                    "arguments": {"url": "https://example.com"}
                }
            },
            evaluate_tool={
                "name": "evaluate",
                "arguments": {
                    "name": "url_match",
                    "arguments": {"pattern": "example.com"}
                }
            }
        )
        
        # Use your custom agent
        agent = MyAgent()
        result = await agent.run(task)
        print(f"Reward: {result.reward}")

asyncio.run(test_agent())

Built-in Agents

HUD provides built-in agents for common LLM providers:
from hud.agents import ClaudeAgent, OperatorAgent

# Claude (Anthropic)
claude_agent = ClaudeAgent(
    model="claude-sonnet-4-20250514",
)

# Operator (OpenAI-based)
operator_agent = OperatorAgent()
Always test your agent with the actual MCP servers you’ll use in production.

Next Steps

See Also

I