HUD is built as a thin orchestration layer on top of MCP, providing structure for agent evaluation, environment management, and telemetry.

System Overview

Core Components

1. Agents (hud.agents)

Agents make decisions and call tools:
from hud.agents import ClaudeAgent, OperatorAgent

# Built-in agents
agent = ClaudeAgent()      # Uses Anthropic's Claude
agent = OperatorAgent()    # Uses OpenAI Operator

# All agents inherit from MCPAgent
class MCPAgent:
    async def initialize(task: str | Task | None)
    async def run(prompt_or_task: str | Task, max_steps: int = 10) -> Trace
    async def call_tools(tool_calls: list[MCPToolCall]) -> list[MCPToolResult]
Agents can auto-create MCP clients from task.mcp_config - no manual client setup needed

2. Tasks (hud.Task)

Tasks define what agents should accomplish:
from hud.datasets import Task

task = Task(
    prompt="Complete 3 items in the TODO list",
    mcp_config={
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                "Authorization": f"Bearer {api_key}",
                "Mcp-Image": "hudpython/hud-browser:latest"
            }
        }
    },
    # Tool names and arguments map directly to MCP server tools
    setup_tool={
        "name": "setup",  # Calls def setup(name, num_items) on the environment
        "arguments": {
            "name": "todo_seed",
            "num_items": 5
        }
    },
    evaluate_tool={
        "name": "evaluate",  # Calls def evaluate(name, expected_count) on the environment
        "arguments": {
            "name": "todo_completed",
            "expected_count": 3
        }
    }
)
The name and arguments in setup/evaluate tools correspond exactly to the tool names and parameters exposed by the MCP server

3. MCP Clients (hud.clients)

Clients handle the MCP protocol:
from hud.clients import MCPClient

# Auto-created by agents, or manual:
client = MCPClient(mcp_config)
await client.initialize()
tools = await client.list_tools()

4. Environments

Environments are MCP servers exposing tools:
from hud.server import MCPServer

server = MCPServer("my-env")

@server.tool()
def move(direction: str):
    # Environment logic
    return result

5. Telemetry (hud.trace)

Real-time observability:
import hud

with hud.trace("my-evaluation"):
    result = await agent.run(task)
    # View at app.hud.so/traces/{trace_id}

Execution Flow

1

Task Definition

Create a Task with prompt and MCP configuration
2

Agent Initialization

Agent creates MCP client (if needed) and connects to environment
3

Setup Phase

Execute setup_tool to initialize environment state
4

Execution Loop

Agent receives observations, makes decisions, calls tools
5

Evaluation

Execute evaluate_tool to score performance
6

Telemetry

All interactions streamed to HUD backend for analysis

Key Design Principles

  1. Protocol-First: Everything speaks MCP
  2. Composable: Mix and match agents, environments, evaluations
  3. Observable: Built-in telemetry for every interaction
  4. Testable: Reproducible evaluations with Docker
  5. Extensible: Easy to add new agents or environments
The MCPServer class wraps FastMCP with lifecycle management, making it easy to build Docker-based environments

Next Steps