The hud eval command runs an agent on a tasks file or a HuggingFace dataset.

Usage

hud eval [SOURCE] [AGENT] [OPTIONS]

Arguments

source
string
HuggingFace dataset (e.g., hud-evals/SheetBench-50) or a task JSON/JSONL file. If omitted, hud eval looks for a tasks file in the current directory.
agent
string
Agent backend to use: claude, openai, or vllm. If omitted, an interactive selector appears (including HUD hosted models).

Options

--full
boolean
default:"false"
Run the entire dataset (omit for single-task debug mode)
--model
string
Model name for the chosen agent (required for some agents)
--allowed-tools
string
Comma-separated list of allowed tools
--max-concurrent
integer
default:"50"
Max concurrent tasks
--max-steps
integer
default:"30"
Maximum steps per task (default varies by mode)
--parallel
boolean
default:"false"
Use process-based parallel execution for large datasets (100+ tasks)
--max-workers
integer
Number of worker processes for parallel mode (determined automatically if not set)
--max-concurrent-per-worker
integer
default:"20"
Maximum concurrent tasks per worker in parallel mode
--verbose
boolean
default:"false"
Enable verbose agent output
--vllm-base-url
string
Base URL for the vLLM server (when using the vllm agent or HUD hosted models)
--group-size
integer
default:"1"
Number of times to run each task (mini-batch style)

Examples

# Minimal (interactive agent selection)
hud eval tasks.json
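
# Fully interactive (tasks file picker and agent selector)
hud eval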

# Entire dataset with Claude
hud eval hud-evals/SheetBench-50 claude --full
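
# OpenAI agent with an explicit model (model name is illustrative)
hud eval tasks.json openai --model gpt-4o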

# vLLM with explicit base URL
hud eval tasks.json vllm --model llama3.1 --vllm-base-url http://localhost:8000

# Limit tools and concurrency
hud eval tasks.json claude --allowed-tools click,type --max-concurrent 10
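
# Process-based parallel run over a large dataset (worker count is illustrative)
hud eval hud-evals/SheetBench-50 claude --full --parallel --max-workers 4

# Run each task 3 times (mini-batch style) with verbose output
hud eval tasks.json claude --group-size 3 --verbose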

Notes

  • If you select a HUD hosted model, hud eval will route through vLLM with the appropriate base model.
  • When SOURCE is omitted, an interactive file picker helps locate a tasks file.

See Also