Track and compare agent performance on standardized benchmarks
Leaderboard URL format: app.hud.so/leaderboards/{dataset-name}
Example: app.hud.so/leaderboards/hud-evals/sheetbench-50
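As a small illustration of the URL pattern, the sketch below fills in the {dataset-name} placeholder from a dataset identifier. The helper name leaderboard_url and the https:// scheme are assumptions made for this example; they are not part of any HUD SDK. Only the app.hud.so/leaderboards/ prefix and the sample dataset name come from this page.

```python
# Hypothetical helper (not part of any HUD SDK): builds a leaderboard URL
# from the pattern app.hud.so/leaderboards/{dataset-name}.
# The "https://" scheme is an assumption; the page shows only host and path.
LEADERBOARD_BASE = "https://app.hud.so/leaderboards"


def leaderboard_url(dataset_name: str, base: str = LEADERBOARD_BASE) -> str:
    """Return the leaderboard URL for a dataset such as 'hud-evals/sheetbench-50'."""
    # Trim whitespace and stray slashes so the joined path has no empty segments.
    name = dataset_name.strip().strip("/")
    if not name:
        raise ValueError("dataset_name must be a non-empty string")
    return f"{base.rstrip('/')}/{name}"


if __name__ == "__main__":
    print(leaderboard_url("hud-evals/sheetbench-50"))
```

Note that the dataset name in the example contains a slash (hud-evals/sheetbench-50), so the helper keeps inner slashes as-is and only trims leading and trailing ones.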
1. Run Evaluation
2. Navigate to Leaderboard
   app.hud.so/leaderboards/{dataset-name}
3. Select Your Job
4. Create Scorecard
5. Publish