What is HUD?

HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents, building new environments, or training models with RL, HUD provides the infrastructure.

Why HUD?

  • 🔌 MCP-native: Any agent can connect to any environment
  • 📡 Live telemetry: Debug every tool call at app.hud.so
  • 🚀 Production-ready: From local Docker to cloud scale
  • 🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
  • 🔧 CLI tools: Debug, analyze, create environments with hud debug and hud analyze

3-minute quickstart

Run your first agent evaluation with zero setup

Clone starter project

uvx hud-python quickstart

Using an AI assistant?

Add HUD docs as an MCP server for better understanding:
claude mcp add docs-hud https://docs.hud.so/mcp

Quick Example

import asyncio, os, hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # Define evaluation task with remote MCP
    task = Task(
        prompt="Win a game of 2048 by reaching the 128 tile",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.so/v3/mcp",
                "headers": {
                    "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                    "Mcp-Image": "hudpython/hud-text-2048:v1.2"
                }
            }
        },
        setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": {"board_size": 4}}},
        # Target matches the tile named in the prompt
        evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 128}}}
    )
    
    # Run agent (auto-creates MCP client)
    agent = ClaudeAgent()
    result = await agent.run(task)
    print(f"Score: {result.reward}")

asyncio.run(main())
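
The example above builds its MCP configuration and tool specs inline, and `os.getenv` silently yields `Bearer None` when `HUD_API_KEY` is unset. A minimal sketch of two small helpers that produce the same dictionaries and fail early on a missing key might look like the following. The names `build_mcp_config` and `tool_call` are hypothetical and not part of the HUD SDK; the URL, header names, and nested shape are copied from the example above.

```python
import os
from typing import Any, Optional

# URL taken from the Quick Example; adjust if your deployment differs.
HUD_MCP_URL = "https://mcp.hud.so/v3/mcp"


def build_mcp_config(image: str, api_key: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: build the mcp_config dict shown in the example."""
    key = api_key or os.getenv("HUD_API_KEY")
    if not key:
        # Fail early instead of sending "Bearer None" to the server.
        raise RuntimeError("HUD_API_KEY is not set")
    return {
        "hud": {
            "url": HUD_MCP_URL,
            "headers": {
                "Authorization": f"Bearer {key}",
                "Mcp-Image": image,
            },
        }
    }


def tool_call(tool: str, name: str, **arguments: Any) -> dict[str, Any]:
    """Hypothetical helper: build the nested setup/evaluate tool spec."""
    return {"name": tool, "arguments": {"name": name, "arguments": arguments}}
```

With these in place, the task definition could pass `mcp_config=build_mcp_config("hudpython/hud-text-2048:v1.2")` and `setup_tool=tool_call("setup", "board", board_size=4)`, assuming `Task` accepts plain dictionaries as it does in the example.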

Community

Are you a startup building agents?

📅 Hop on a call or 📧 founders@hud.so