Get up and running with HUD in minutes. This guide walks you through installation, basic setup, and running your first agent evaluation.
Quick Clone
The fastest way to get started:
# Clone a complete example project with uv
uvx hud-python quickstart
This sets you up with a working agent evaluation example you can run immediately.
Installation
Basic Install
Agent Install
CLI Tools
API Keys
Set your API keys as environment variables:
export HUD_API_KEY="sk-hud-..." # Get from hud.so
export ANTHROPIC_API_KEY="sk-ant-..." # For Claude agents
export OPENAI_API_KEY="sk-..." # For OpenAI agents
Create a .env
file in your project root to manage API keys locally
Your First Agent
Run an agent on the 2048 game environment:
import asyncio, os
import hud
from hud.datasets import Task
from hud.agents import ClaudeAgent
async def main():
# The trace context captures ALL agent interactions within a "task run"
# Everything inside this context shows up as one trace on hud.so
with hud.trace("quickstart-2048"):
# Define task with remote MCP environment
task = Task(
prompt="Win a game of 2048 by reaching the 128 tile",
mcp_config={
"hud": {
"url": "https://mcp.hud.so/v3/mcp",
"headers": {
"Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
"Mcp-Image": "hudevals/hud-text-2048:0.1.3"
}
}
},
setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": { "board_size": 4}}},
evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 64}}}
)
# Run agent (auto-creates MCP client from task.mcp_config)
agent = ClaudeAgent()
result = await agent.run(task)
print(f"Max tile reached: {result.reward}")
asyncio.run(main())
The trace context ensures that the entire task run - from setup through evaluation - appears as one coherent trace on the platform
What just happened?
-
Task Definition: We created a
Task
with:
- A prompt telling the agent what to do
- An
mcp_config
pointing to a remote MCP environment
- Setup and evaluation tools to initialize and score the game
-
Auto Client: The agent automatically created an MCP client from
task.mcp_config
-
Telemetry: The
trace
context captured all interactions for debugging
-
Evaluation: The
evaluate_max_tile
tool returned the highest tile as reward
Next Steps
CLI Quick Reference
# Create sample environment
hud init
# Start hot-reload development server for an environment
hud dev . --build --interactive
# Train a model on your environment
hud rl