HUD Documentation — Evaluations and RL Environments.

HUD uses the Model Context Protocol (MCP) to provide a standard way for AI agents to interact with any software environment through tool calls.

Why MCP?

Traditional agent frameworks couple agents tightly to specific environments. MCP decouples them:

Without MCP

Agent code hardcoded for each environment
No standardization across tools
Difficult to swap agents or environments

With MCP

Any agent works with any environment
Standard protocol for all interactions
Easy to swap components

How It Works

MCP standardizes agent-environment communication through JSON-RPC messages. Agents call tools exposed by environments and receive structured responses.
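Here is the JSON-RPC 2.0 envelope for one tool call, built with Python's standard `json` module. The `tools/call` method name and the `params`/`result` layout follow the MCP specification. The helper names `make_tool_call` and `parse_response` are hypothetical and not part of HUD's SDK.

```python
import json
from typing import Any


def make_tool_call(request_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,  # lets the client match the response to this request
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


def parse_response(raw: str, expected_id: int) -> dict[str, Any]:
    """Return the `result` of a JSON-RPC response, raising on protocol errors."""
    message = json.loads(raw)
    if message.get("id") != expected_id:
        raise ValueError(f"unexpected response id: {message.get('id')!r}")
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "unknown error"))
    return message["result"]


request = make_tool_call(1, "move", {"direction": "up"})
# A response an environment might send back (text abbreviated for illustration).
response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Moved up."}], "isError": False},
})
result = parse_response(response, expected_id=1)
print(result["content"][0]["text"])  # Moved up.
```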

Core Concepts

Tools

Tools are functions exposed by the environment:

{
  "name": "move",
  "description": "Move in a direction",
  "inputSchema": {
    "type": "object",
    "properties": {
      "direction": {
        "type": "string",
        "enum": ["up", "down", "left", "right"]
      }
    }
  }
}
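Because `inputSchema` is ordinary JSON Schema, a client can check arguments before it sends a call. The sketch below is a hypothetical helper. It understands only the keywords used in the `move` example (`type`, `properties`, `enum`). A real client would use a complete JSON Schema validator instead.

```python
from typing import Any

MOVE_TOOL = {
    "name": "move",
    "description": "Move in a direction",
    "inputSchema": {
        "type": "object",
        "properties": {
            "direction": {"type": "string", "enum": ["up", "down", "left", "right"]}
        },
    },
}

# Map the JSON Schema type names used here to Python types.
_JSON_TYPES = {"object": dict, "string": str}


def validate_arguments(schema: dict[str, Any], arguments: Any) -> list[str]:
    """Check arguments against a tiny subset of JSON Schema (type, properties, enum).

    Returns a list of error messages; an empty list means the arguments passed.
    """
    expected = _JSON_TYPES.get(schema.get("type", ""))
    if expected is not None and not isinstance(arguments, expected):
        return [f"expected {schema['type']}, got {type(arguments).__name__}"]
    errors: list[str] = []
    if "enum" in schema and arguments not in schema["enum"]:
        errors.append(f"{arguments!r} is not one of {schema['enum']}")
    # Recurse into each declared property that is actually present.
    for key, subschema in schema.get("properties", {}).items():
        if isinstance(arguments, dict) and key in arguments:
            errors.extend(f"{key}: {e}" for e in validate_arguments(subschema, arguments[key]))
    return errors


print(validate_arguments(MOVE_TOOL["inputSchema"], {"direction": "up"}))
print(validate_arguments(MOVE_TOOL["inputSchema"], {"direction": "diagonal"}))
```

This schema has no `required` list, so `{}` also passes. A server that insists on a direction would add `"required": ["direction"]` to the schema.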

Tool Calls & Results

Agents call tools and receive results:

# Agent makes a tool call (inside an async function, once the client is initialized)
result = await client.call_tool("move", {"direction": "up"})

# Environment returns a result like:
# {
#   "content": [{
#     "type": "text",
#     "text": "Moved up. New board state: ..."
#   }]
# }
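A tool result is plain structured data, so an agent can reduce it to text before it passes the result back to a model. The sketch below treats the result as a plain dictionary; client libraries typically wrap it in typed objects instead. It also checks the `isError` flag, which MCP uses to report tool-level failures as opposed to protocol errors. The helper names are hypothetical.

```python
from typing import Any


def result_text(result: dict[str, Any]) -> str:
    """Join the text items of an MCP tool result, skipping non-text content."""
    parts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    return "\n".join(parts)


def check_result(result: dict[str, Any]) -> str:
    """Return the result text, or raise if the tool reported a failure."""
    text = result_text(result)
    if result.get("isError", False):  # tool-level failure, not a JSON-RPC error
        raise RuntimeError(f"tool failed: {text}")
    return text


sample = {
    "content": [
        {"type": "text", "text": "Moved up."},
        {"type": "image", "data": "", "mimeType": "image/png"},  # base64 payload omitted
    ]
}
print(check_result(sample))  # Moved up.
```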

Lifecycle Management

MCP defines a rigorous lifecycle for connections:

Initialization: Client and server negotiate capabilities and the protocol version, via client.initialize()
Operation: Normal tool calling and message exchange
Shutdown: Clean termination of the connection

The protocol ensures both sides understand each other’s capabilities before proceeding.
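A client can enforce these ordering rules with a small state machine. The sketch below is hypothetical and is not HUD's `client.initialize()` implementation. It builds the `initialize` request and the `notifications/initialized` notification defined by the MCP specification, and it refuses tool calls until the handshake completes. It assumes protocol revision `2024-11-05`, which is one published revision; real clients usually accept several. The client name and version are placeholders.

```python
import json
from enum import Enum, auto
from typing import Any

SUPPORTED_VERSIONS = {"2024-11-05"}  # assumption: a client that speaks one revision


class Phase(Enum):
    CREATED = auto()
    INITIALIZING = auto()
    OPERATING = auto()
    CLOSED = auto()


class LifecycleTracker:
    """Hypothetical client-side guard that enforces MCP's lifecycle ordering."""

    def __init__(self) -> None:
        self.phase = Phase.CREATED
        self.server_capabilities: dict[str, Any] = {}

    def initialize_request(self, request_id: int) -> str:
        if self.phase is not Phase.CREATED:
            raise RuntimeError("initialize may only be sent once")
        self.phase = Phase.INITIALIZING
        return json.dumps({
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {"name": "example-agent", "version": "0.1.0"},  # placeholders
            },
        })

    def initialized_notification(self, server_result: dict[str, Any]) -> str:
        if self.phase is not Phase.INITIALIZING:
            raise RuntimeError("no initialize request in flight")
        version = server_result.get("protocolVersion")
        if version not in SUPPORTED_VERSIONS:
            # A client that cannot use the server's version should disconnect.
            self.phase = Phase.CLOSED
            raise RuntimeError(f"unsupported protocol version: {version!r}")
        self.server_capabilities = server_result.get("capabilities", {})
        self.phase = Phase.OPERATING
        # Notifications carry no "id" because no response is expected.
        return json.dumps({"jsonrpc": "2.0", "method": "notifications/initialized"})

    def require_operating(self) -> None:
        if self.phase is not Phase.OPERATING:
            raise RuntimeError(f"cannot call tools in phase {self.phase.name}")

    def shutdown(self) -> None:
        # Over stdio there is no shutdown message: the client closes the
        # server's stdin and waits for the process to exit.
        self.phase = Phase.CLOSED


tracker = LifecycleTracker()
init_msg = tracker.initialize_request(request_id=0)
ack = tracker.initialized_notification({"protocolVersion": "2024-11-05", "capabilities": {"tools": {}}})
tracker.require_operating()  # tool calls are now allowed
tracker.shutdown()
```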

HUD’s MCP Extensions

HUD adds conventions on top of MCP:

Setup Tools: Initialize environment state (setup_board, navigate_to_url)
Evaluate Tools: Score agent performance (evaluate_max_tile, contains_text)
Lifecycle Management: Clean initialization and shutdown with client.initialize() and proper cleanup

See the Tools Reference for implementation details.
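The setup/evaluate convention gives every episode the same three-phase shape. The sketch below runs that shape against a hypothetical in-memory environment. Only the tool names `setup_board`, `move`, and `evaluate_max_tile` come from this page. `FakeBoardEnv`, the `target` argument, the tile arithmetic, and returning the score as text are all illustrative assumptions, not HUD's actual tool signatures.

```python
import asyncio
from typing import Any, Protocol


class ToolClient(Protocol):
    async def call_tool(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]: ...


class FakeBoardEnv:
    """Hypothetical in-memory stand-in for an environment server."""

    def __init__(self) -> None:
        self.max_tile = 0
        self.calls: list[str] = []

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        self.calls.append(name)
        if name == "setup_board":
            self.max_tile = 2
            text = "board ready"
        elif name == "move":
            self.max_tile *= 2  # pretend every move merges the largest tile
            text = f"moved {arguments['direction']}"
        elif name == "evaluate_max_tile":
            text = str(self.max_tile / arguments["target"])  # score in [0, 1]
        else:
            return {"content": [{"type": "text", "text": f"unknown tool {name}"}], "isError": True}
        return {"content": [{"type": "text", "text": text}]}


async def run_episode(env: ToolClient, moves: list[str]) -> float:
    # 1. Setup tool: put the environment into a known starting state.
    await env.call_tool("setup_board", {})
    # 2. Agent phase: ordinary tool calls (a fixed script here instead of a model).
    for direction in moves:
        await env.call_tool("move", {"direction": direction})
    # 3. Evaluate tool: ask the environment to score the final state.
    result = await env.call_tool("evaluate_max_tile", {"target": 64})
    return float(result["content"][0]["text"])


env = FakeBoardEnv()
score = asyncio.run(run_episode(env, ["up", "left", "up"]))
print(env.calls, score)
```

`run_episode` depends only on a `call_tool` coroutine. The same function can therefore drive any client object that exposes a compatible method, which is the decoupling MCP is meant to provide.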

Transport Options

HUD environments are packaged as Docker images so that every run starts from the same, reproducible environment:

Local Docker Build
Remote Docker Launch

Run environments locally for development and debugging using stdio transport:

mcp_config = {
  "hud-text-2048": {
    "command": "docker",
    "args": ["run", "--rm", "-i", "hudpython/hud-text-2048:v1.2"]
  }
}

Transport: stdio - JSON-RPC over stdin/stdout
Pros: Full control, easy debugging, no network latency
Use case: Development, testing, single-agent runs
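With the stdio transport, each JSON-RPC message travels as one line of text. The MCP specification requires newline-delimited messages with no embedded newlines. The sketch below shows that framing with Python's standard library, using an in-memory buffer in place of the container's stdin/stdout. `write_message` and `read_messages` are hypothetical helpers.

```python
import io
import json
from collections.abc import Iterator
from typing import Any, TextIO


def write_message(stream: TextIO, message: dict[str, Any]) -> None:
    """Write one JSON-RPC message as a single line, as MCP's stdio transport requires."""
    line = json.dumps(message)  # compact form; newlines inside strings become "\\n"
    if "\n" in line:
        raise ValueError("message must not contain a raw newline")
    stream.write(line + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one decoded message per non-empty line until the stream ends."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# Simulate the pipe between client and server with an in-memory buffer.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                     "params": {"name": "move", "arguments": {"direction": "up"}}})
pipe.seek(0)
received = list(read_messages(pipe))
for msg in received:
    print(msg["id"], msg["method"])
```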

The -i flag enables Docker’s interactive mode, allowing stdio communication between the client and server process.
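The `command` and `args` fields map directly onto the process a stdio client spawns. The sketch below rebuilds that command line from the config above. `server_argv` is a hypothetical helper, and HUD's client may consume `mcp_config` differently.

```python
import shlex

mcp_config = {
    "hud-text-2048": {
        "command": "docker",
        "args": ["run", "--rm", "-i", "hudpython/hud-text-2048:v1.2"],
    }
}


def server_argv(config: dict, server_name: str) -> list[str]:
    """Return the argv a stdio client would spawn for one configured server."""
    entry = config[server_name]
    return [entry["command"], *entry.get("args", [])]


argv = server_argv(mcp_config, "hud-text-2048")
print(shlex.join(argv))  # docker run --rm -i hudpython/hud-text-2048:v1.2
```

A client would typically pass `argv` to something like `subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)`. Without `-i`, Docker does not attach the container's stdin, so the server inside would read end-of-file immediately and no requests could reach it.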

Both approaches use the same Docker image, so the environment behaves identically whether it runs locally or remotely.

Next Steps

Architecture

See how HUD builds on MCP for agent evaluation
