Training datasets are collections of HUD tasks stored on HuggingFace. Each task defines a prompt, MCP configuration, and evaluation criteria.

Creating Datasets

from hud.datasets import Task, save_tasks
from hud.types import MCPToolCall

# Create task variations
tasks = []
for i in range(100):
    tasks.append(Task(
        prompt=f"Reach the {2**(9+i%4)} tile in 2048",
        mcp_config={"hudpython/hud-text-2048:v1.2": {}},
        setup_tool=MCPToolCall(name="setup_board"),
        evaluate_tool=MCPToolCall(
            name="evaluate_max_tile",
            arguments={"target": 2**(9+i%4)}
        )
    ))

# Save to HuggingFace
save_tasks(tasks, "my-org/2048-training")
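The modular expression above cycles through four tile targets, so the 100 tasks are evenly spread across difficulty levels. A quick check (plain Python, no HUD dependency) confirms the distribution:

```python
from collections import Counter

# The expression 2**(9 + i % 4) cycles through four tile targets.
targets = Counter(2 ** (9 + i % 4) for i in range(100))
print(sorted(targets))   # [512, 1024, 2048, 4096]
print(targets[512])      # 25 tasks per target
```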

Dataset Format

Tasks are stored as HuggingFace datasets with these fields:
{
    "id": "task-001",
    "prompt": "Navigate to login page",
    "mcp_config": '{"hudpython/hud-browser:latest": {}}',  # JSON string
    "setup_tool": '{"name": "clear_cookies"}',              # JSON string
    "evaluate_tool": '{"name": "check_logged_in"}',         # JSON string
    "metadata": '{"difficulty": "easy"}'                    # JSON string
}
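Because the tool fields are stored as JSON strings, reading a raw row directly (e.g. via `datasets.load_dataset`, bypassing the HUD loader) means decoding those fields yourself. A minimal sketch, using the example row above:

```python
import json

# Example raw row as stored on HuggingFace (tool fields are JSON strings).
row = {
    "id": "task-001",
    "prompt": "Navigate to login page",
    "mcp_config": '{"hudpython/hud-browser:latest": {}}',
    "evaluate_tool": '{"name": "check_logged_in"}',
}

# Decode every field that holds serialized JSON back into a dict.
JSON_FIELDS = ("mcp_config", "setup_tool", "evaluate_tool", "metadata")
decoded = {
    k: json.loads(v) if k in JSON_FIELDS and v else v
    for k, v in row.items()
}
print(decoded["mcp_config"])  # {'hudpython/hud-browser:latest': {}}
```

`load_tasks` (shown in the next section) handles this decoding for you; the sketch is only for working with raw rows.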

Loading Datasets

from hud.datasets import load_tasks

# Load from HuggingFace
tasks = load_tasks("hud-evals/2048-taskset")

# Filter tasks
easy_tasks = [t for t in tasks if t.metadata.get("difficulty") == "easy"]

# Use in training (requires the `verifiers` package)
import verifiers as vf

env = vf.load_environment(
    env_id="hud-vf-gym",
    taskset="hud-evals/2048-taskset",
    config_path="./configs/2048.yaml",
    num_tasks=50  # Sample 50 tasks
)

Curriculum Learning

Create staged datasets for progressive training:
# The create_*_tasks helpers below are user-defined task builders.
# Stage 1: Simple moves
save_tasks(create_simple_tasks(), "my-org/2048-stage1")

# Stage 2: Strategic play  
save_tasks(create_medium_tasks(), "my-org/2048-stage2")

# Stage 3: Expert level
save_tasks(create_hard_tasks(), "my-org/2048-stage3")
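Training can then walk the stages in order. A sketch using a hypothetical `curriculum_stages` helper (the name and the `my-org/2048-stage{n}` convention follow the example above):

```python
# Hypothetical helper: build the ordered list of stage dataset names
# for a curriculum, matching the "my-org/2048-stage{n}" naming above.
def curriculum_stages(org="my-org", base="2048", n_stages=3):
    return [f"{org}/{base}-stage{i}" for i in range(1, n_stages + 1)]

for repo in curriculum_stages():
    print(repo)  # train on each stage in order, e.g. via load_tasks(repo)
```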

Best Practices

  • Diversity: Include varied prompts and scenarios
  • Balance: Mix difficulty levels appropriately
  • Size: Aim for 100-1000 tasks per dataset
  • Validation: Hold out 10-20% of tasks for evaluation
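The validation guideline above amounts to a shuffled train/validation split. A minimal sketch in plain Python (the `split_tasks` helper and the 15% ratio are illustrative):

```python
import random

def split_tasks(tasks, val_fraction=0.15, seed=42):
    """Shuffle and split a task list into train/validation subsets."""
    tasks = list(tasks)
    random.Random(seed).shuffle(tasks)  # seeded for a reproducible split
    n_val = max(1, int(len(tasks) * val_fraction))
    return tasks[n_val:], tasks[:n_val]

train, val = split_tasks(range(100))
print(len(train), len(val))  # 85 15
```

Fixing the seed keeps the held-out set stable across training runs, so evaluation numbers stay comparable.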

Example Datasets