Get up and running with HUD in minutes. This guide walks you through installation, basic setup, and running your first agent evaluation.

Quick Clone

The fastest way to get started:
# Clone a complete example project with uv
uvx hud-python quickstart
This sets you up with a working agent evaluation example you can run immediately.

Installation

Choose the install that matches your use case: the basic package, the agent install, or the CLI tools. The basic install is:
pip install hud-python
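To confirm the install, import the package and print the installed distribution version. A minimal check using only the standard library:
# Verify the install: import the package and report its version
import importlib.metadata
import hud  # fails here if the install did not succeed
print(importlib.metadata.version("hud-python"))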

API Keys

Set your API keys as environment variables:
export HUD_API_KEY="sk-hud-..."      # Get from hud.so
export ANTHROPIC_API_KEY="sk-ant-..." # For Claude agents
export OPENAI_API_KEY="sk-..."        # For OpenAI agents
Create a .env file in your project root to manage API keys locally.
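If you use a .env file, load it into the environment before constructing agents. A minimal sketch, assuming the python-dotenv package is installed (exported shell variables work without this step):
# Load variables from .env into os.environ (assumes python-dotenv is installed)
import os
from dotenv import load_dotenv

load_dotenv()
if not os.getenv("HUD_API_KEY"):
    raise RuntimeError("HUD_API_KEY is not set; export it or add it to your .env file")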

Your First Agent

Run an agent on the 2048 game environment:
import asyncio, os
import hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # The trace context captures ALL agent interactions within a "task run"
    # Everything inside this context shows up as one trace on hud.so
    with hud.trace("quickstart-2048"):
        # Define task with remote MCP environment
        task = Task(
            prompt="Win a game of 2048 by reaching the 128 tile",
            mcp_config={
                "hud": {
                    "url": "https://mcp.hud.so/v3/mcp",
                    "headers": {
                        "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                        "Mcp-Image": "hudevals/hud-text-2048:0.1.3"
                    }
                }
            },
        setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": { "board_size": 4}}},
        evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 64}}}
        )
        
        # Run agent (auto-creates MCP client from task.mcp_config)
        agent = ClaudeAgent()
        result = await agent.run(task)
        print(f"Max tile reached: {result.reward}")

asyncio.run(main())
The trace context ensures that the entire task run, from setup through evaluation, appears as one coherent trace on the platform.

What just happened?

  1. Task Definition: We created a Task with:
    • A prompt telling the agent what to do
    • An mcp_config pointing to a remote MCP environment (factored into a reusable helper in the sketch after this list)
    • Setup and evaluation tools to initialize and score the game
  2. Auto Client: The agent automatically created an MCP client from task.mcp_config
  3. Telemetry: The trace context captured all interactions for debugging
  4. Evaluation: The evaluate tool's max_number check returned the highest tile reached as the reward
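Since mcp_config is a plain dictionary, you can factor it into a small helper when running several tasks against the same remote image. This is a hypothetical convenience function, not part of the SDK; it simply reproduces the URL and headers from the example above:
import os

def remote_mcp_config(image: str) -> dict:
    # Hypothetical helper: rebuilds the mcp_config dict used in the example above
    return {
        "hud": {
            "url": "https://mcp.hud.so/v3/mcp",
            "headers": {
                "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                "Mcp-Image": image,
            },
        }
    }

# Usage:
# task = Task(prompt="...", mcp_config=remote_mcp_config("hudevals/hud-text-2048:0.1.3"), ...)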

Next Steps

CLI Quick Reference

# Create sample environment
hud init

# Start hot-reload development server for an environment
hud dev . --build --interactive

# Train a model on your environment
hud rl
See the CLI Reference for detailed command documentation.