Version 0.4.53 - Latest stable release
I want to evaluate agents
Test Claude, Operator, or custom agents on benchmarks like SheetBench and OSWorld
I want to build environments
Wrap any software in Dockerized MCP for scalable and generalizable agent evaluation
I want to train agents
Use reinforcement learning and GRPO on evaluations to improve agent performance
What is HUD?
HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents, building new environments, or training models with RL, HUD provides the infrastructure.
Why HUD?
- 🔌 MCP-native: Any agent can connect to any environment
- 📡 Live telemetry: Debug every tool call at hud.so
- 🚀 Production-ready: From local Docker to cloud scale
- 🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
- 🔧 CLI tools: Create, develop, run, and train with hud init, hud dev, hud run, hud eval, and hud rl
3-minute quickstart
Run your first agent evaluation with zero setup