The HUD SDK provides a base MCPAgent class and several pre-built agent implementations for interacting with MCP environments.

Base Class

MCPAgent

from hud.agents import MCPAgent
Abstract base class for all MCP-enabled agents. It handles the agent loop, the MCP client lifecycle, tool discovery and filtering (including lifecycle tools such as setup, evaluate, and response), and standardized telemetry/logging.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| mcp_client | AgentMCPClient | MCP client for server connections | None |
| allowed_tools | list[str] | List of tool names to allow | None (all) |
| disallowed_tools | list[str] | List of tool names to disallow | [] |
| system_prompt | str | System prompt to seed the conversation | Default prompt |
| append_setup_output | bool | Append setup tool output to the initial context | True |
| initial_screenshot | bool | Include an initial screenshot when supported | True |
| model_name | str | Model label for telemetry/logging | "mcp-agent" |
| response_agent | ResponseAgent | Optional auto-continue/stop helper | None |
| auto_trace | bool | Enable automatic tracing spans | True |
| verbose | bool | Verbose console logs for development | False |
Key Methods:
async def initialize(task: str | Task | None = None) -> None:
    """Initialize the agent with task-specific configuration."""

async def run(prompt_or_task: str | Task | dict[str, Any], max_steps: int = 10) -> Trace:
    """Run the agent with a prompt or task. Returns a Trace with results."""

async def call_tools(tool_call: MCPToolCall | list[MCPToolCall]) -> list[MCPToolResult]:
    """Execute tool calls through the MCP client."""

def get_available_tools() -> list[types.Tool]:
    """Get the filtered list of available tools (lifecycle tools excluded)."""

def get_tool_schemas() -> list[dict]:
    """Get tool schemas formatted for the model."""
Abstract Methods (subclasses must implement):
async def get_system_messages() -> list[Any]:
    """Get the system prompt formatted for the model."""

async def get_response(messages: list[Any]) -> AgentResponse:
    """Get the model response, including tool calls."""

async def format_blocks(blocks: list[ContentBlock]) -> list[Any]:
    """Format content blocks into model messages."""

async def format_tool_results(
    tool_calls: list[MCPToolCall],
    tool_results: list[MCPToolResult],
) -> list[Any]:
    """Format tool results for the model."""
Class Variables:
  • metadata: dict[str, Any] | None - Injected into MCP initialize requests
  • required_tools: list[str] - Tools that must be present or initialization fails
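For example, a subclass can set both (a minimal sketch; the metadata keys shown are illustrative, not a fixed schema):

from hud.agents import ClaudeAgent

class BrowserAgent(ClaudeAgent):
    # Injected into MCP initialize requests.
    metadata = {"display_width": 1280, "display_height": 720}
    # Initialization fails fast if the environment does not expose this tool.
    required_tools = ["computer"]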
Auto-Client Creation: If no mcp_client is provided but a Task with an mcp_config is passed to run(), an MCPClient is created automatically and cleaned up when the run finishes.

Lifecycle & Filtering:
  • Lifecycle tools (e.g., setup, evaluate, and auto-detected response) are hidden from the model but available to the framework
  • Use allowed_tools / disallowed_tools to control the tool surface visible to your model
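For example (a sketch; the tool names are illustrative and depend on what your environment exposes):

from hud.agents import ClaudeAgent

agent = ClaudeAgent(allowed_tools=["computer"])  # expose only this tool to the model
# Or block specific tools instead:
agent = ClaudeAgent(disallowed_tools=["dangerous_tool"])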

Pre-built Agents

ClaudeAgent

from hud.agents import ClaudeAgent
Claude-specific implementation using Anthropic’s API.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model_client | AsyncAnthropic | Anthropic client | Auto-created |
| model | str | Claude model to use | "claude-sonnet-4-20250514" |
| max_tokens | int | Maximum response tokens | 4096 |
| use_computer_beta | bool | Enable computer-use beta features | True |
| validate_api_key | bool | Validate the API key on init | True |
Features:
  • Native Claude tool calling
  • Automatic prompt caching
  • Computer-use beta support
  • Display metadata injection
Example:
agent = ClaudeAgent(
    model="claude-sonnet-4-20250514",
    max_tokens=8192,
)

result = await agent.run(
    Task(
        prompt="Navigate to example.com",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": "Bearer ${HUD_API_KEY}",
                    "Mcp-Image": "hudpython/hud-remote-browser:latest"
                }
            }
        },
        evaluate_tool={
            "name": "evaluate",
            "arguments": {"name": "url_match", "arguments": {"pattern": "example.com"}},
        }
    )
)

OperatorAgent

from hud.agents import OperatorAgent
OpenAI Operator-style agent built on the Responses API with computer-use.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| model_client | AsyncOpenAI | OpenAI client | Auto-created |
| model | str | Responses model to use | "computer-use-preview" |
| environment | Literal["windows", "mac", "linux", "browser"] | Computer environment | "linux" |
| validate_api_key | bool | Validate the API key on init | True |
Features:
  • OpenAI Responses computer-use
  • Operator-style system prompt guidance
  • Display metadata injection
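Example (a minimal sketch; it assumes OPENAI_API_KEY is set and a task defined as in the Usage Examples below):

from hud.agents import OperatorAgent

agent = OperatorAgent(environment="browser")
result = await agent.run(task, max_steps=20)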

GenericOpenAIChatAgent

from hud.agents import GenericOpenAIChatAgent
OpenAI-compatible chat.completions agent that works with any endpoint implementing the OpenAI schema (OpenAI, vLLM, Ollama, Together, custom, etc.).

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| openai_client | AsyncOpenAI | OpenAI-compatible client instance | Required |
| model_name | str | Chat model name | "gpt-4o-mini" |
| completion_kwargs | dict[str, Any] | Extra arguments forwarded to chat.completions.create | {} |
Example (local or custom endpoint):
from openai import AsyncOpenAI

openai_client = AsyncOpenAI(
    base_url="http://localhost:11434/v1",  # e.g., Ollama
    api_key="not-needed",
)

agent = GenericOpenAIChatAgent(
    openai_client=openai_client,
    model_name="llama3.1",
    completion_kwargs={"temperature": 0.2},
)
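The agent then runs like any other, e.g. with a Task carrying an mcp_config as in the Usage Examples below:

result = await agent.run(task, max_steps=10)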

LangChainAgent

from hud.agents import LangChainAgent
LangChain integration for using any LangChain-compatible model.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| llm | BaseLanguageModel | LangChain-compatible model | Required |
Example:
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-opus-20240229")
agent = LangChainAgent(llm=llm)

GroundedOpenAIChatAgent

from hud.agents.grounded_openai import GroundedOpenAIChatAgent
OpenAI chat agent with a separate grounding model for element detection. The planning model issues natural-language "computer" tool calls that are grounded to screen coordinates by a dedicated vision model.

Constructor Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| grounder_config | GrounderConfig | Configuration for the grounding model | Required |
| model_name | str | OpenAI chat model name | "gpt-4o-mini" |
| allowed_tools | list[str] | Exposed tool list (usually ["computer"]) | None |
| append_setup_output | bool | Append setup output to the first turn | False |
| system_prompt | str | System prompt to use | Opinionated default |
Note: The grounded agent is for advanced use-cases where you want to decouple planning from grounding. For most users, ClaudeAgent or OperatorAgent is sufficient.
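A rough construction sketch (the GrounderConfig import path and field names below are assumptions for illustration only; check the class definition for the actual schema):

from hud.agents.grounded_openai import GroundedOpenAIChatAgent
from hud.agents.grounded_openai import GrounderConfig  # assumed import path

grounder = GrounderConfig(
    api_base="http://localhost:8000/v1",  # assumed field: endpoint of the vision model
    model="my-grounding-model",           # assumed field: grounding model name
)
agent = GroundedOpenAIChatAgent(
    grounder_config=grounder,
    model_name="gpt-4o",
    allowed_tools=["computer"],
)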

Helper Classes

ResponseAgent

Base class for auto-response handlers that decide when to continue or stop.
from hud.agents.misc import ResponseAgent

class MyResponseAgent(ResponseAgent):
    async def determine_response(self, agent_output: str) -> str:
        if "task complete" in agent_output.lower():
            return "STOP"
        return "Continue with the next step"

Common Types

AgentResponse

Key fields:
  • tool_calls: list[MCPToolCall]
  • done: bool
  • content: str | None
  • reasoning: str | None
  • info: dict[str, Any]
  • isError: bool

MCPToolCall

Key fields:
  • id: str — unique identifier (auto-generated if not provided)
  • name: str
  • arguments: dict[str, Any]

MCPToolResult

Key fields:
  • content: list[ContentBlock]
  • structuredContent: dict[str, Any] | None
  • isError: bool

Trace

Key fields:
  • reward: float
  • done: bool
  • content: str | None
  • isError: bool
  • info: dict[str, Any]
  • task: Task | None
  • trace: list[TraceStep] — execution trace steps
  • messages: list[Any] — final conversation state
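For example, after a run you can inspect the outcome (assuming agent and task as in the Usage Examples below):

result = await agent.run(task, max_steps=10)
print(f"Reward: {result.reward}, steps recorded: {len(result.trace)}")
if result.isError:
    print("Run failed:", result.content)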

Usage Examples

Simple Prompt Execution

from hud.agents import ClaudeAgent
from hud.datasets import Task

agent = ClaudeAgent()
result = await agent.run(
    Task(
        prompt="Click the submit button",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": "Bearer ${HUD_API_KEY}",
                    "Mcp-Image": "hudpython/hud-remote-browser:latest"
                }
            }
        }
    ),
    max_steps=5,
)
print("Done:", result.done, "Error:", result.isError)

Task Execution with Auto-Client

from hud.agents import OperatorAgent
from hud.datasets import Task

# No client needed - auto-created from task
agent = OperatorAgent()

task = Task(
    prompt="Find the price of the product",
    mcp_config={
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                "Authorization": "Bearer ${HUD_API_KEY}",
                "Mcp-Image": "hudpython/hud-remote-browser:latest"
            }
        }
    },
    setup_tool={
        "name": "setup",
        "arguments": {"url": "https://example.com"}
    },
    evaluate_tool={
        "name": "evaluate", 
        "arguments": {"check": "price_found"}
    }
)

# Client created automatically
result = await agent.run(task, max_steps=20)
print(f"Reward: {result.reward}")

Custom Agent Implementation

from hud.agents import MCPAgent
from hud.types import AgentResponse, MCPToolCall, MCPToolResult
from mcp.types import ContentBlock  # content-block types ship with the MCP SDK
import hud

class MyCustomAgent(MCPAgent):
    metadata = {"custom": "metadata"}
    
    async def get_system_messages(self) -> list[dict]:
        return [{
            "role": "system",
            "content": self.system_prompt
        }]
    
    @hud.instrument(span_type="agent", record_args=False, record_result=True)
    async def get_response(self, messages: list[dict]) -> AgentResponse:
        # Call your model here; `self.llm` is a placeholder for whatever
        # client this agent wraps (set it up in __init__).
        response = await self.llm.chat(messages)

        return AgentResponse(
            content=response.content,
            tool_calls=[
                MCPToolCall(name=tc.name, arguments=tc.args)
                for tc in response.tool_calls
            ],
            done=response.stop_reason == "stop",
        )
    
    async def format_blocks(self, blocks: list[ContentBlock]) -> list[dict]:
        content = []
        for block in blocks:
            if block.type == "text":
                content.append({"type": "text", "text": block.text})
            elif block.type == "image":
                content.append({
                    "type": "image",
                    "image": {"data": block.data, "format": "png"}
                })
        
        return [{"role": "user", "content": content}]
    
    async def format_tool_results(
        self, tool_calls: list[MCPToolCall], tool_results: list[MCPToolResult]
    ) -> list[dict]:
        # Use the call id (not the name) so results map back to the right call.
        return [{
            "role": "tool",
            "content": result.content,
            "tool_call_id": call.id,
        } for call, result in zip(tool_calls, tool_results)]
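The custom agent then runs like any built-in one (a sketch: it assumes self.llm has been wired up in __init__ and a task with an mcp_config as in the examples above):

agent = MyCustomAgent()
result = await agent.run(task, max_steps=10)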
