Uploading TaskSets

TaskSets are collections of tasks that can be uploaded to the HUD platform for evaluation and sharing. This guide explains how to upload TaskSets and access them through the platform.

Creating and Uploading a TaskSet

You can create a TaskSet from a list of tasks and upload it to the platform:

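import asyncio

from hud.task import Task
from hud.taskset import TaskSet

# Define the tasks. Each task names a gym, optional setup steps,
# and an evaluate configuration that decides success.
tasks = [
    Task(
        prompt="Navigate to example.com and verify the login page is displayed",
        gym="hud-browser",
        setup=[
            ("goto", "https://example.com/login")
        ],
        evaluate={
            "function": "page_contains",
            "args": "Login"
        }
    ),
    Task(
        prompt="What is the capital of France?",
        gym="hud-browser",
        evaluate={
            "function": "response_includes",
            "args": "Paris"
        }
    )
]

# Create a TaskSet from the tasks
taskset = TaskSet(tasks=tasks)


async def main() -> None:
    # upload() is awaited, so it has to run inside an async function
    taskset_id = await taskset.upload("my-taskset")
    print(f"TaskSet uploaded with ID: {taskset_id}")


# In a notebook or another running event loop, use `await main()` instead
asyncio.run(main())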

TaskSet Parameters

When creating a TaskSet, you can specify:

- name: A descriptive name for your TaskSet
- description: Detailed description of what the TaskSet evaluates
- tasks: List of Task objects
- metadata: Optional dictionary of metadata about the TaskSet

Task Configuration

Each Task in a TaskSet can include:

- prompt: The instruction or question for the agent
- gym: The environment type (e.g., "hud-browser", "hud-ubuntu")
- setup: Optional list of setup actions to run before the agent starts
- evaluate: Function configuration that determines task success
- id: Optional unique identifier for the task

Common evaluation functions include:

- page_contains: Checks whether specific text exists on the page
- response_includes: Verifies that the agent's final response contains the expected text
- cookies_exist: Checks whether a set of cookies is present
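To make the text-containment checks above more concrete, here is a minimal local sketch of how a response_includes-style check could behave. The helper name `local_response_includes`, its case-insensitive matching, and its acceptance of several expected strings are assumptions made for illustration; this is not the platform's implementation, and the real evaluator may behave differently.

```python
from typing import Iterable, Union


def local_response_includes(response: str, expected: Union[str, Iterable[str]]) -> bool:
    """Hypothetical local stand-in, not the HUD implementation.

    Returns True when every expected string occurs in the response,
    ignoring case.
    """
    # Accept either one string or several strings (an assumption).
    needles = [expected] if isinstance(expected, str) else list(expected)
    haystack = response.lower()
    return all(needle.lower() in haystack for needle in needles)


# The evaluate configuration used by the example task in this guide.
evaluate = {"function": "response_includes", "args": "Paris"}

print(local_response_includes("The capital of France is Paris.", evaluate["args"]))
print(local_response_includes("I am not sure.", evaluate["args"]))
```

A stand-in like this is only useful for sanity-checking the expected text against sample answers; the authoritative result still comes from running the task on the platform.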

Viewing TaskSets on the Platform

After uploading, you can view and manage your TaskSets at app.hud.so/evalsets. The platform provides:

- List of all your uploaded TaskSets
- Detailed view of individual tasks within each TaskSet
- Task prompts and evaluation criteria
- Evaluation results when agents are run against the TaskSet

Loading an Existing TaskSet

You can load a previously uploaded TaskSet using its name:

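import asyncio

from hud.taskset import TaskSet


async def main() -> None:
    # Load an existing TaskSet by the name it was uploaded under
    taskset = await TaskSet.load("taskset-name")

    # Access tasks
    for task in taskset.tasks:
        print(f"Task ID: {task.id}")
        print(f"Prompt: {task.prompt}")
        print(f"Evaluation: {task.evaluate}")


# In a notebook or another running event loop, use `await main()` instead
asyncio.run(main())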
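Once tasks are loaded, a quick summary can help confirm that the TaskSet contains what you expect. The sketch below groups prompts by gym. The helper `prompts_by_gym` is hypothetical, and it assumes only that each task object exposes `gym` and `prompt` attributes; stand-in objects are used so the example runs without the SDK or network access.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable, List


def prompts_by_gym(tasks: Iterable[object]) -> Dict[str, List[str]]:
    """Group task prompts by their gym attribute (hypothetical helper)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for task in tasks:
        # Fall back to "unknown" when a task has no gym set.
        gym = getattr(task, "gym", None) or "unknown"
        grouped[str(gym)].append(getattr(task, "prompt", ""))
    return dict(grouped)


# Stand-in objects that mimic the attributes of loaded tasks.
sample_tasks = [
    SimpleNamespace(
        gym="hud-browser",
        prompt="Navigate to example.com and verify the login page is displayed",
    ),
    SimpleNamespace(gym="hud-browser", prompt="What is the capital of France?"),
]

for gym, prompts in prompts_by_gym(sample_tasks).items():
    print(f"{gym}: {len(prompts)} task(s)")
```

With a real TaskSet, the same helper could be called as `prompts_by_gym(taskset.tasks)`, provided the loaded tasks carry those attributes.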

Best Practices

Task Organization
- Give tasks clear, unique IDs
- Use descriptive prompts
- Group related tasks into themed TaskSets

Evaluation Design
- Choose appropriate evaluation functions
- Provide clear success criteria
- Test evaluation logic before uploading

Documentation
- Write clear task prompts
- Document expected agent behavior
- Include example solutions where appropriate
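Several of these practices can be checked mechanically before uploading. The following is a minimal sketch, assuming tasks are first described as plain dictionaries with the same field names used in this guide (`id`, `prompt`, `evaluate`). The helper `find_task_problems` is hypothetical and not part of the HUD SDK.

```python
from collections import Counter
from typing import Any, Dict, List


def find_task_problems(task_specs: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical pre-upload check on plain-dict task descriptions."""
    problems: List[str] = []

    # Flag IDs that appear more than once (tasks without an ID are skipped).
    id_counts = Counter(spec["id"] for spec in task_specs if spec.get("id"))
    for task_id, count in id_counts.items():
        if count > 1:
            problems.append(f"duplicate id: {task_id}")

    for index, spec in enumerate(task_specs):
        # Every task needs a non-empty prompt.
        if not str(spec.get("prompt", "")).strip():
            problems.append(f"task {index}: empty prompt")
        # The evaluate configuration should name a function and its arguments.
        evaluate = spec.get("evaluate") or {}
        for key in ("function", "args"):
            if key not in evaluate:
                problems.append(f"task {index}: evaluate is missing '{key}'")

    return problems


specs = [
    {
        "id": "login-page",
        "prompt": "Navigate to example.com and verify the login page is displayed",
        "evaluate": {"function": "page_contains", "args": "Login"},
    },
    {
        "id": "login-page",  # same ID as above, so it is reported
        "prompt": "",
        "evaluate": {"function": "response_includes"},
    },
]

for problem in find_task_problems(specs):
    print(problem)
```

An empty result means none of these particular checks failed; it does not guarantee that the evaluation logic itself is correct.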

Running Evaluations

Once uploaded, you can run agents against your TaskSet:

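import asyncio

from hud.job import run_job
from hud.taskset import TaskSet
from my_agent import MyAgent  # placeholder: replace with your own agent class


async def main() -> None:
    # Load the TaskSet uploaded earlier in this guide
    taskset = await TaskSet.load("my-taskset")

    # Run the agent against every task in the TaskSet
    job = await run_job(
        agent_cls=MyAgent,
        task_or_taskset=taskset,
        job_name="Evaluation Run"
    )

    # Results are available on the platform under the job's ID
    print(f"View results at: https://app.hud.so/jobs/{job.id}")

asyncio.run(main())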

Platform Features

The HUD platform (app.hud.so) provides several features for working with TaskSets:

TaskSet Management
- Browse all uploaded TaskSets
- View individual task details
- Filter and search tasks
- Track evaluation history

Analysis Tools
- Compare agent performance
- View detailed task results
- Export evaluation data
- Share results with team members

Collaboration
- Share TaskSets with team members
- Collaborate on task creation
- Track changes and versions
