Get up and running with HUD in minutes. This guide walks you through installation, basic setup, and running your first agent evaluation.

Quick Clone

The fastest way to get started:
# Clone a complete example project with uv
uvx hud-python quickstart
This sets you up with a working agent evaluation example you can run immediately.

Installation

pip install hud-python

API Keys

Set your API keys as environment variables:
export HUD_API_KEY="sk-hud-..."      # Get from app.hud.so
export ANTHROPIC_API_KEY="sk-ant-..." # For Claude agents
export OPENAI_API_KEY="sk-..."        # For OpenAI agents
Create a .env file in your project root to manage API keys locally.
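If you keep keys in a .env file, many projects load it with the python-dotenv package. If you prefer zero dependencies, a minimal loader sketch (illustrative, not part of the hud-python SDK) could look like:

```python
import os

def load_dotenv_minimal(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (minimal sketch)."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and lines without a KEY=VALUE shape
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables win over .env values
            os.environ.setdefault(key.strip(), value.strip().strip('"'))

load_dotenv_minimal()
```

Real environment variables take precedence here because `setdefault` never overwrites an existing key.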

Your First Agent

Run an agent on the 2048 game environment:
import asyncio, os
import hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # The trace context captures ALL agent interactions within a "task run"
    # Everything inside this context shows up as one trace on app.hud.so
    with hud.trace("quickstart-2048"):
        # Define task with remote MCP environment
        task = Task(
            prompt="Win a game of 2048 by reaching the 128 tile",
            mcp_config={
                "hud": {
                    "url": "https://mcp.hud.so/v3/mcp",
                    "headers": {
                        "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                        "Mcp-Image": "hudpython/hud-text-2048:v1.2"
                    }
                }
            },
            setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": {"board_size": 4}}},
            evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 128}}},
        )
        
        # Run agent (auto-creates MCP client from task.mcp_config)
        agent = ClaudeAgent()
        result = await agent.run(task)
        print(f"Max tile reached: {result.reward}")

asyncio.run(main())
The trace context ensures that the entire task run - from setup through evaluation - appears as one coherent trace on the platform.
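Because agent.run is a coroutine, several evaluations can be run concurrently with asyncio.gather. A sketch using a stand-in agent (FakeAgent below is illustrative only, standing in for the real agent's async interface):

```python
import asyncio

class FakeAgent:
    """Stand-in with the same async run() shape as a real agent (illustrative only)."""
    async def run(self, prompt):
        await asyncio.sleep(0)  # pretend to interact with an environment
        return f"done: {prompt}"

async def main():
    agent = FakeAgent()
    prompts = ["task-a", "task-b", "task-c"]
    # gather schedules all runs concurrently and preserves input order
    return await asyncio.gather(*(agent.run(p) for p in prompts))

results = asyncio.run(main())
```

With a real agent you would typically wrap each run in its own trace context so each task shows up as a separate trace.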

What just happened?

  1. Task Definition: We created a Task with:
    • A prompt telling the agent what to do
    • An mcp_config pointing to a remote MCP environment
    • Setup and evaluation tools to initialize and score the game
  2. Auto Client: The agent automatically created an MCP client from task.mcp_config
  3. Telemetry: The trace context captured all interactions for debugging
  4. Evaluation: The evaluate tool (max_number) scored the run and returned the result as the reward
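Note that mcp_config is plain JSON-style data, so it is easy to generate programmatically. A hypothetical helper (not part of the SDK) that builds the same remote config for any environment image:

```python
import os

def remote_mcp_config(image, url="https://mcp.hud.so/v3/mcp"):
    """Build an mcp_config dict for a remote HUD MCP environment.

    `image` selects the environment's Docker image via the Mcp-Image header;
    auth is read from HUD_API_KEY. (Hypothetical convenience helper.)
    """
    return {
        "hud": {
            "url": url,
            "headers": {
                "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                "Mcp-Image": image,
            },
        }
    }

config = remote_mcp_config("hudpython/hud-text-2048:v1.2")
```

Swapping the image string is all it takes to point the same task at a different environment.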

Next Steps

CLI Quick Reference

# Analyze any MCP environment
hud analyze hudpython/hud-text-2048:v1.2

# Debug environment initialization
hud debug hudpython/hud-text-2048:v1.2

# Start hot-reload development server
hud dev . --build

# Run HUD tools as MCP server (for Cursor/Claude)
hud mcp
See the CLI Reference for detailed command documentation.