Uploading TaskSets
Learn how to upload TaskSets and view them on the HUD platform
TaskSets are collections of tasks that can be uploaded to the HUD platform for evaluation and sharing. This guide explains how to upload TaskSets and access them through the platform.
Creating and Uploading a TaskSet
You can create a TaskSet from a list of tasks and upload it to the platform.
TaskSet Parameters
When creating a TaskSet, you can specify:
- name: A descriptive name for your TaskSet
- description: Detailed description of what the TaskSet evaluates
- tasks: List of Task objects
- metadata: Optional dictionary of metadata about the TaskSet
Task Configuration
Each Task in a TaskSet can include:
- prompt: The instruction or question for the agent
- gym: The environment type (e.g., “hud-browser”, “hud-ubuntu”)
- setup: Optional list of setup actions to run before the agent starts
- evaluate: Function configuration to determine task success
- id: Optional unique identifier for the task
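The TaskSet and Task fields listed above can be pictured with a minimal sketch. The dataclasses below are local stand-ins written for illustration, not the HUD SDK's own classes. The field names follow the lists above; the types, the defaults, the tuple form used for setup and evaluate, and the example values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """Local stand-in mirroring the Task fields listed above (not the SDK class)."""
    prompt: str                                  # instruction or question for the agent
    gym: str                                     # environment type, e.g. "hud-browser"
    evaluate: tuple                              # assumed form: (function_name, argument)
    setup: list = field(default_factory=list)    # optional setup actions
    id: Optional[str] = None                     # optional unique identifier


@dataclass
class TaskSet:
    """Local stand-in mirroring the TaskSet parameters listed above."""
    name: str
    description: str
    tasks: list
    metadata: dict = field(default_factory=dict)  # optional metadata


example_taskset = TaskSet(
    name="browser-basics",
    description="Checks that an agent can open a page and read its heading.",
    tasks=[
        Task(
            id="open-example",
            prompt="Open example.com and report the page heading.",
            gym="hud-browser",
            setup=[("goto", "https://example.com")],  # hypothetical setup action
            evaluate=("page_contains", "Example Domain"),
        )
    ],
    metadata={"difficulty": "easy"},
)

print(example_taskset.name, len(example_taskset.tasks))
```

Keeping the optional fields (setup, id, metadata) behind defaults means a task needs only a prompt, a gym, and an evaluation to be well-formed in this sketch.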
Common evaluation functions include:
- page_contains: Checks whether specific text exists on the page
- response_includes: Verifies that the agent’s final response contains the expected text
- cookies_exist: Checks whether a given set of cookies is present
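To make the intent of these three checks concrete, here is a pure-Python sketch of what each one tests. These are not the platform's implementations, which run against the live environment; the signatures, the case-sensitive substring matching, and the requirement that every named cookie be present are assumptions made for illustration.

```python
def page_contains(page_text: str, expected: str) -> bool:
    """True if the expected text appears anywhere in the page text."""
    return expected in page_text


def response_includes(response: str, expected: str) -> bool:
    """True if the agent's final response contains the expected text."""
    return expected in response


def cookies_exist(cookies: dict, names: list) -> bool:
    """True only if every named cookie is present."""
    return all(name in cookies for name in names)


# Try each check against hand-written sample data.
page_ok = page_contains("<h1>Order confirmed</h1>", "Order confirmed")
response_ok = response_includes("The total is $42.", "$42")
cookies_ok = cookies_exist({"session": "abc", "csrf": "xyz"}, ["session", "auth"])
print(page_ok, response_ok, cookies_ok)  # True True False
```

The last check fails because the sample cookie jar has no "auth" entry, which is the kind of outcome worth confirming by hand before relying on an evaluation.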
Viewing TaskSets on the Platform
After uploading, you can view and manage your TaskSets at app.hud.so/evalsets. The platform provides:
- List of all your uploaded TaskSets
- Detailed view of individual tasks within each TaskSet
- Task prompts and evaluation criteria
- Evaluation results when agents are run against the TaskSet
Loading an Existing TaskSet
You can load a previously uploaded TaskSet by its name.
Best Practices
- Task Organization
  - Give tasks clear, unique IDs
  - Use descriptive prompts
  - Group related tasks into themed TaskSets
- Evaluation Design
  - Choose appropriate evaluation functions
  - Provide clear success criteria
  - Test evaluation logic before uploading
- Documentation
  - Write clear task prompts
  - Document expected agent behavior
  - Include example solutions where appropriate
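Several of these practices can be checked mechanically before uploading. The sketch below is one possible pre-upload check, not part of the HUD SDK: validate_tasks is a hypothetical helper, and the tasks are represented as plain dicts with the field names used in this guide.

```python
from collections import Counter


def validate_tasks(tasks: list) -> list:
    """Return human-readable problems; an empty list means no issues were found."""
    problems = []
    # Flag any id that appears on more than one task.
    ids = [t.get("id") for t in tasks if t.get("id") is not None]
    for task_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate id: {task_id}")
    # Flag tasks with a blank prompt or no evaluation configured.
    for index, task in enumerate(tasks):
        if not str(task.get("prompt", "")).strip():
            problems.append(f"task {index}: empty prompt")
        if not task.get("evaluate"):
            problems.append(f"task {index}: missing evaluate")
    return problems


draft_tasks = [
    {"id": "login", "prompt": "Log in as the demo user.",
     "evaluate": ("cookies_exist", ["session"])},
    {"id": "login", "prompt": "",
     "evaluate": ("page_contains", "Welcome")},
]
issues = validate_tasks(draft_tasks)
for issue in issues:
    print(issue)
```

Returning a list of problems rather than raising on the first one lets you fix every issue in a draft TaskSet in a single pass.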
Running Evaluations
Once a TaskSet is uploaded, you can run agents against it.
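As a rough picture of what running an agent against a TaskSet amounts to, the sketch below loops over tasks, asks an agent for a response, and scores each response. Everything in it is hypothetical: run_locally, echo_agent, the dict-based tasks, and the local substring check are stand-ins rather than HUD SDK calls, and a real run executes inside the task's environment on the platform.

```python
from typing import Callable


def run_locally(tasks: list, agent: Callable[[str], str]) -> dict:
    """Run a hypothetical agent over each task and record pass/fail per task id."""
    results = {}
    for task in tasks:
        response = agent(task["prompt"])
        function_name, expected = task["evaluate"]
        # This sketch only understands one evaluation function.
        if function_name != "response_includes":
            raise ValueError(f"unsupported evaluation function: {function_name}")
        results[task["id"]] = expected in response
    return results


def echo_agent(prompt: str) -> str:
    """Toy agent that only knows one answer."""
    return "Paris" if "France" in prompt else "I don't know"


tasks = [
    {"id": "capital-fr", "prompt": "What is the capital of France?",
     "evaluate": ("response_includes", "Paris")},
    {"id": "capital-jp", "prompt": "What is the capital of Japan?",
     "evaluate": ("response_includes", "Tokyo")},
]
results = run_locally(tasks, echo_agent)
success_rate = sum(results.values()) / len(results)
print(results, success_rate)
```

Recording a result per task id, rather than only an overall score, is what makes it possible to inspect individual task outcomes afterwards.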
Platform Features
The HUD platform (app.hud.so) provides several features for working with TaskSets:
- TaskSet Management
  - Browse all uploaded TaskSets
  - View individual task details
  - Filter and search tasks
  - Track evaluation history
- Analysis Tools
  - Compare agent performance
  - View detailed task results
  - Export evaluation data
  - Share results with team members
- Collaboration
  - Share TaskSets with team members
  - Collaborate on task creation
  - Track changes and versions