The HUD SDK provides a base MCPAgent class and several pre-built agent implementations for interacting with MCP environments.

Base Class

MCPAgent

from hud.agents import MCPAgent
Abstract base class for all MCP-enabled agents. It handles the agent loop, the MCP client lifecycle, tool discovery and filtering (including lifecycle tools such as setup, evaluate, and response), and standardized telemetry/logging.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| mcp_client | AgentMCPClient | MCP client for server connections | None |
| allowed_tools | list[str] | List of tool names to allow | None (all) |
| disallowed_tools | list[str] | List of tool names to disallow | [] |
| system_prompt | str | System prompt to seed the conversation | Default prompt |
| append_setup_output | bool | Append setup tool output to the initial context | True |
| initial_screenshot | bool | Include an initial screenshot when supported | True |
| model_name | str | Model label for telemetry/logging | "mcp-agent" |
| response_agent | ResponseAgent | Optional auto-continue/stop helper | None |
| auto_trace | bool | Enable automatic tracing spans | True |
| verbose | bool | Verbose console logs for development | False |
Key Methods:
async def initialize(task: str | Task | None = None) -> None:
    """Initialize the agent with task-specific configuration."""

async def run(prompt_or_task: str | Task | dict[str, Any], max_steps: int = 10) -> Trace:
    """Run the agent with a prompt or task. Returns a Trace with results."""

async def call_tools(tool_call: MCPToolCall | list[MCPToolCall]) -> list[MCPToolResult]:
    """Execute tool calls through the MCP client."""

def get_available_tools() -> list[types.Tool]:
    """Get the filtered list of available tools (lifecycle tools excluded)."""

def get_tool_schemas() -> list[dict]:
    """Get tool schemas formatted for the model."""
Abstract Methods (subclasses must implement):
async def get_system_messages() -> list[Any]:
    """Get the system prompt formatted for the model."""

async def get_response(messages: list[Any]) -> AgentResponse:
    """Get the model response, including tool calls."""

async def format_blocks(blocks: list[ContentBlock]) -> list[Any]:
    """Format content blocks into model messages."""

async def format_tool_results(
    tool_calls: list[MCPToolCall],
    tool_results: list[MCPToolResult],
) -> list[Any]:
    """Format tool results for the model."""
Class Variables:
  • metadata: dict[str, Any] | None - Injected into MCP initialize requests
  • required_tools: list[str] - Tools that must be present or initialization fails
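For example, a subclass can set both (a minimal sketch; the metadata keys shown are illustrative, not a fixed schema):

from hud.agents import ClaudeAgent

class BrowserAgent(ClaudeAgent):
    # Injected into MCP initialize requests.
    metadata = {"display_width": 1280, "display_height": 720}
    # Initialization fails fast if the environment does not expose this tool.
    required_tools = ["computer"]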
Auto-Client Creation: If no mcp_client is provided but a Task with an mcp_config is passed to run(), an MCPClient is created automatically and cleaned up when the run finishes.

Lifecycle & Filtering:
  • Lifecycle tools (e.g., setup, evaluate, and auto-detected response) are hidden from the model but available to the framework
  • Use allowed_tools / disallowed_tools to control the tool surface visible to your model
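For example (a sketch; the tool names are illustrative and depend on what your environment exposes):

from hud.agents import ClaudeAgent

agent = ClaudeAgent(allowed_tools=["computer"])  # expose only this tool to the model
# Or block specific tools instead:
agent = ClaudeAgent(disallowed_tools=["dangerous_tool"])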

Pre-built Agents

ClaudeAgent

from hud.agents import ClaudeAgent
Claude-specific implementation using Anthropic’s API.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model_client | AsyncAnthropic | Anthropic client | Auto-created |
| model | str | Claude model to use | "claude-sonnet-4-20250514" |
| max_tokens | int | Maximum response tokens | 4096 |
| use_computer_beta | bool | Enable computer-use beta features | True |
| validate_api_key | bool | Validate the API key on init | True |
Features:
  • Native Claude tool calling
  • Automatic prompt caching
  • Computer-use beta support
  • Display metadata injection
Example:
agent = ClaudeAgent(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
)

result = await agent.run(
    Task(
        prompt="Navigate to example.com",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": "Bearer ${HUD_API_KEY}",
                    "Mcp-Image": "hudpython/hud-remote-browser:latest"
                }
            }
        },
        evaluate_tool={
            "name": "evaluate",
            "arguments": {"name": "url_match", "arguments": {"pattern": "example.com"}},
        }
    )
)

OperatorAgent

from hud.agents import OperatorAgent
OpenAI Operator-style agent built on the Responses API with computer-use.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model_client | AsyncOpenAI | OpenAI client | Auto-created |
| model | str | Responses model to use | "computer-use-preview" |
| environment | Literal["windows", "mac", "linux", "browser"] | Computer environment | "linux" |
| validate_api_key | bool | Validate the API key on init | True |
Features:
  • OpenAI Responses computer-use
  • Operator-style system prompt guidance
  • Display metadata injection
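Example (a minimal sketch; it assumes OPENAI_API_KEY is set and a task defined as in the Usage Examples below):

from hud.agents import OperatorAgent

agent = OperatorAgent(environment="browser")
result = await agent.run(task, max_steps=20)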

GenericOpenAIChatAgent

from hud.agents import GenericOpenAIChatAgent
OpenAI-compatible chat.completions agent that works with any endpoint implementing the OpenAI schema (OpenAI, vLLM, Ollama, Together, custom, etc.).

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| openai_client | AsyncOpenAI | OpenAI-compatible client instance | Required |
| model_name | str | Chat model name | "gpt-4o-mini" |
| completion_kwargs | dict[str, Any] | Extra arguments forwarded to chat.completions.create | {} |
Example (local or custom endpoint):
from openai import AsyncOpenAI

openai_client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",  # e.g., Ollama
    api_key="not-needed",
)

agent = GenericOpenAIChatAgent(
    openai_client=openai_client,
    model_name="llama3.1",
    completion_kwargs={"temperature": 0.2},
)
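The agent then runs like any other, e.g. with a Task carrying an mcp_config as in the Usage Examples below:

result = await agent.run(task, max_steps=10)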

LangChainAgent

from hud.agents import LangChainAgent
LangChain integration for using any LangChain-compatible model.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| llm | BaseLanguageModel | LangChain-compatible model | Required |
Example:
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-opus-20240229")
agent = LangChainAgent(llm=llm)

GroundedOpenAIChatAgent

from hud.agents.grounded_openai import GroundedOpenAIChatAgent
OpenAI chat agent with a separate grounding model for element detection. The planning model issues natural-language "computer" tool calls that are grounded to screen coordinates by a dedicated vision model.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| grounder_config | GrounderConfig | Configuration for the grounding model | Required |
| model_name | str | OpenAI chat model name | "gpt-4o-mini" |
| allowed_tools | list[str] | Exposed tool list (usually ["computer"]) | None |
| append_setup_output | bool | Append setup output to the first turn | False |
| system_prompt | str | System prompt to use | Opinionated default |
Note: The grounded agent is for advanced use-cases where you want to decouple planning from grounding. For most users, ClaudeAgent or OperatorAgent is sufficient.
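A rough construction sketch (the GrounderConfig import path and field names below are assumptions for illustration only; check the class definition for the actual schema):

from hud.agents.grounded_openai import GroundedOpenAIChatAgent
from hud.agents.grounded_openai import GrounderConfig  # assumed import path

grounder = GrounderConfig(
    api_base="http://localhost:8000/v1",  # assumed field: endpoint of the vision model
    model="my-grounding-model",           # assumed field: grounding model name
)
agent = GroundedOpenAIChatAgent(
    grounder_config=grounder,
    model_name="gpt-4o",
    allowed_tools=["computer"],
)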

Helper Classes

ResponseAgent

Base class for auto-response handlers that decide when to continue or stop.
from hud.agents.misc import ResponseAgent

class MyResponseAgent(ResponseAgent):
    async def determine_response(self, agent_output: str) -> str:
        if "task complete" in agent_output.lower():
            return "STOP"
        return "Continue with the next step"

Common Types

AgentResponse

Key fields:
  • tool_calls: list[MCPToolCall]
  • done: bool
  • content: str | None
  • reasoning: str | None
  • info: dict[str, Any]
  • isError: bool

MCPToolCall

Key fields:
  • id: str — unique identifier (auto-generated if not provided)
  • name: str
  • arguments: dict[str, Any]

MCPToolResult

Key fields:
  • content: list[ContentBlock]
  • structuredContent: dict[str, Any] | None
  • isError: bool

Trace

Key fields:
  • reward: float
  • done: bool
  • content: str | None
  • isError: bool
  • info: dict[str, Any]
  • task: Task | None
  • trace: list[TraceStep] — execution trace steps
  • messages: list[Any] — final conversation state
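For example, after a run you can inspect the outcome (assuming agent and task as in the Usage Examples below):

result = await agent.run(task, max_steps=10)
print(f"Reward: {result.reward}, steps recorded: {len(result.trace)}")
if result.isError:
    print("Run failed:", result.content)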

Usage Examples

Simple Prompt Execution

from hud.agents import ClaudeAgent
from hud.datasets import Task

agent = ClaudeAgent()
result = await agent.run(
    Task(
        prompt="Click the submit button",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": "Bearer ${HUD_API_KEY}",
                    "Mcp-Image": "hudpython/hud-remote-browser:latest"
                }
            }
        }
    ),
    max_steps=5,
)
print("Done:", result.done, "Error:", result.isError)

Task Execution with Auto-Client

from hud.agents import OperatorAgent
from hud.datasets import Task

# No client needed - auto-created from task
agent = OperatorAgent()

task = Task(
    prompt="Find the price of the product",
    mcp_config={
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                "Authorization": "Bearer ${HUD_API_KEY}",
                "Mcp-Image": "hudpython/hud-remote-browser:latest"
            }
        }
    },
    setup_tool={
        "name": "setup",
        "arguments": {"url": "https://example.com"}
    },
    evaluate_tool={
        "name": "evaluate", 
        "arguments": {"check": "price_found"}
    }
)

# Client created automatically
result = await agent.run(task, max_steps=20)
print(f"Reward: {result.reward}")

Custom Agent Implementation

from hud.agents import MCPAgent
from hud.types import AgentResponse, MCPToolCall, MCPToolResult
from mcp.types import ContentBlock  # content-block types ship with the MCP SDK
import hud

class MyCustomAgent(MCPAgent):
    metadata = {"custom": "metadata"}
    
    async def get_system_messages(self) -> list[dict]:
        return [{
            "role": "system",
            "content": self.system_prompt
        }]
    
    @hud.instrument(span_type="agent", record_args=False, record_result=True)
    async def get_response(self, messages: list[dict]) -> AgentResponse:
        # Call your model here; `self.llm` is a placeholder for whatever
        # client this agent wraps (set it up in __init__).
        response = await self.llm.chat(messages)

        return AgentResponse(
            content=response.content,
            tool_calls=[
                MCPToolCall(name=tc.name, arguments=tc.args)
                for tc in response.tool_calls
            ],
            done=response.stop_reason == "stop",
        )
    
    async def format_blocks(self, blocks: list[ContentBlock]) -> list[dict]:
        content = []
        for block in blocks:
            if block.type == "text":
                content.append({"type": "text", "text": block.text})
            elif block.type == "image":
                content.append({
                    "type": "image",
                    "image": {"data": block.data, "format": "png"}
                })
        
        return [{"role": "user", "content": content}]
    
    async def format_tool_results(
        self, tool_calls: list[MCPToolCall], tool_results: list[MCPToolResult]
    ) -> list[dict]:
        # Use the call id (not the name) so results map back to the right call.
        return [{
            "role": "tool",
            "content": result.content,
            "tool_call_id": call.id,
        } for call, result in zip(tool_calls, tool_results)]
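The custom agent then runs like any built-in one (a sketch: it assumes self.llm has been wired up in __init__ and a task with an mcp_config as in the examples above):

agent = MyCustomAgent()
result = await agent.run(task, max_steps=10)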
