HUD Documentation — Evaluations and RL Environments.

Version 0.4.53 - Latest stable release

I want to evaluate agents

Test Claude, Operator, or custom agents on benchmarks like SheetBench and OSWorld

I want to build environments

Wrap any software in dockerized MCP for scalable and generalizable agent evaluation

I want to train agents

Use reinforcement learning and GRPO on evaluations to improve agent performance

What is HUD?

HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents, building new environments, or training models with RL, HUD provides the infrastructure.

Why HUD?

🔌 MCP-native: Any agent can connect to any environment
📡 Live telemetry: Debug every tool call at hud.so
🚀 Production-ready: From local Docker to cloud scale
🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
🔧 CLI tools: Create, develop, run, and train with hud init, hud dev, hud run, hud eval, hud rl

3-minute quickstart

Run your first agent evaluation with zero setup

Clone starter project

uvx hud-python quickstart

Quick Example

import asyncio
import os

import hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # Define evaluation task with remote MCP
    task = Task(
        prompt="Win a game of 2048 by reaching the 128 tile",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                    "Mcp-Image": "hudevals/hud-text-2048:0.1.3"
                }
            }
        },
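        # Setup: call the "board" setup function with a 4x4 board
        setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": {"board_size": 4}}},
        # Evaluate: call the "max_number" check; its target is 64, while the prompt asks for 128
        evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 64}}}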
    )
    
    # Run agent (auto-creates MCP client)
    agent = ClaudeAgent()
    result = await agent.run(task)
    print(f"Score: {result.reward}")

asyncio.run(main())
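
The Quick Example builds its `Authorization` header with `os.getenv`, which yields the string `Bearer None` when `HUD_API_KEY` is unset. The sketch below shows one way to fail early instead, and to build the nested setup and evaluate dicts without hand-writing the braces. The helper names `build_mcp_config` and `nested_call` are hypothetical and not part of the HUD SDK; only the dict shapes, URL, and header names are taken from the example above.

```python
import os
from typing import Optional


def build_mcp_config(image: str, api_key: Optional[str] = None) -> dict:
    """Build an mcp_config dict in the shape used by the Quick Example.

    Hypothetical helper, not part of the HUD SDK.
    """
    # Fall back to the environment when no key is passed explicitly.
    key = api_key or os.getenv("HUD_API_KEY")
    if not key:
        # Without this check the header would silently become "Bearer None".
        raise RuntimeError("HUD_API_KEY is not set")
    return {
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                "Authorization": f"Bearer {key}",
                "Mcp-Image": image,
            },
        }
    }


def nested_call(tool: str, function: str, **function_args) -> dict:
    """Build the two-level {"name", "arguments"} dict used by setup_tool
    and evaluate_tool in the Quick Example (hypothetical helper)."""
    return {
        "name": tool,
        "arguments": {"name": function, "arguments": function_args},
    }


# Same values as the Quick Example, built with the helpers.
setup_tool = nested_call("setup", "board", board_size=4)
evaluate_tool = nested_call("evaluate", "max_number", target=64)
```

The resulting dicts can be passed to `Task(...)` in place of the inline literals, for example `mcp_config=build_mcp_config("hudevals/hud-text-2048:0.1.3")`.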

Community

GitHub

Star the repo and contribute

Discord

Join our community

Are you a startup building agents?

📅 Hop on a call or 📧 founders@hud.so
