Environment

An Environment in the HUD SDK represents the runtime instance (such as a web browser or an operating system desktop) in which an Agent acts to complete a Task.

Overview

The Environment is the core component your agent interacts with. It provides:

  • A way to send actions (like clicks, typing) in the standardized CLA format (see Adapters) to the runtime.
  • A way to receive Observation objects (like screenshots, text) from the runtime.
  • Methods for managing the environment’s lifecycle (close) and evaluating task success (evaluate).
  • Automatic recording of interactions into a Trajectory when linked to a Job.

Creating an Environment

Environments are created using hud.gym.make(). You typically pass a Task object, which defines the required environment type (gym attribute), setup actions, and evaluation logic.

from hud import gym
from hud.task import Task

# Task defines the environment type (gym="hud-browser")
task = Task(
    prompt="Search Google for capybaras",
    gym="hud-browser",
    setup=("goto", "google.com")
)

# gym.make reads the task.gym attribute to create the right environment
# It also runs the task.setup actions internally.
env = await gym.make(task)

print("Browser environment created and setup complete.")
# await env.close()

You can also create environments directly using known Gym IDs:

env_os = await gym.make("OSWorld-Ubuntu")
# await env_os.close()

Environments created this way won’t have a default Task associated unless you explicitly reset them with one later using env.reset(). The gym.make() function also automatically links the environment to an active Job if one was defined using the @register_job decorator.

Available Environment Types

The HUD SDK provides several standard environment types, specified via the gym attribute in a Task or directly in hud.gym.make():

  • "hud-browser": Provides a remote Chromium browser instance managed via Playwright. Ideal for web navigation, form interaction, and testing web applications.
  • "hud-ubuntu": Provides a remote Ubuntu desktop environment accessed via VNC. Suitable for tasks involving GUI applications, file system interaction, or running Linux software.
  • "qa": A non-interactive environment for question-answering tasks where the agent provides a direct textual response.
  • CustomGym: Allows defining and running your own Custom Environments using Docker, either locally or remotely. This provides maximum flexibility for specific testing needs.

The gym attribute in a Task tells hud.gym.make() which environment to instantiate.

Interaction Loop

The standard interaction flow involves the Agent and the Environment:

  1. Get Initial Observation: Call env.step() without actions after gym.make(). This returns the starting Observation.
  2. Agent Prediction: Your agent’s predict(observation) method processes the observation and decides on the next action(s).
  3. Execute Action: Call env.step(actions) with the agent’s action(s) (which should be in CLA format, usually handled by an Adapter). This executes the actions and returns the next Observation, reward, terminated flag, and info dict.
  4. Repeat: The agent continues predicting and stepping until the task is complete or the environment terminates.

# Assume 'env' and 'agent' are initialized

# 1. Get initial observation
# (this example uses env.reset(); step 1 above describes calling
# env.step() with no actions as another way to get it)
obs, _ = await env.reset()

# 4. Repeat loop (agent predicts, env steps), capped at 10 iterations
for _ in range(10):
    # 2. Agent predicts action(s)
    actions, done = await agent.predict(obs)

    # 3. Execute action(s) in environment
    obs, reward, terminated, info = await env.step(actions)
    if done or terminated:
        break

  • Note on QA Tasks: For Question-Answering Tasks, the agent might only need one predict call. The agent should output a ResponseAction, which the environment stores. The subsequent env.evaluate() call then checks this stored response. The environment itself remains largely passive for QA.
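
The QA flow in the note above can be sketched with a stand-in environment. Everything in this sketch is hypothetical: StubQAEnv, the dataclass version of ResponseAction, and the case-insensitive exact-match check are assumptions made for illustration and are not the SDK's own classes or evaluation logic.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseAction:
    """Hypothetical stand-in for the SDK's ResponseAction."""
    text: str


class StubQAEnv:
    """Hypothetical passive QA environment: it only stores the last response."""

    def __init__(self, expected: str) -> None:
        self.expected = expected
        self.response: Optional[str] = None

    async def step(self, actions=None):
        # Store the most recent ResponseAction, if any; nothing else happens.
        for action in actions or []:
            if isinstance(action, ResponseAction):
                self.response = action.text
        # Mirror the (observation, reward, terminated, info) return shape.
        return None, 0.0, self.response is not None, {}

    async def evaluate(self) -> float:
        # Assumed check: case-insensitive exact match with the expected answer.
        if self.response is None:
            return 0.0
        matches = self.response.strip().lower() == self.expected.lower()
        return 1.0 if matches else 0.0


async def run_qa() -> float:
    env = StubQAEnv(expected="Paris")
    # In a real run, a single agent.predict() call would produce this action.
    _, _, terminated, _ = await env.step([ResponseAction(text="paris")])
    assert terminated
    return await env.evaluate()


result = asyncio.run(run_qa())
```

The point of the sketch is the ordering: one step carrying the response, then evaluate() reading what was stored, with no further interaction in between.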

Key Methods

  • env.step(actions: list[CLA] | None = None): Executes actions (or gets initial state). Returns (Observation, reward, terminated, info).
  • env.evaluate(config: FunctionConfigs | None = None): Runs evaluation logic defined in the Task (or the provided config). Returns evaluation result.
  • env.close(): Shuts down the environment. Saves the Trajectory if linked to a Job.
  • env.get_urls(): Returns URLs (url, live_url) for accessing/viewing the environment.
  • env.reset(configs: FunctionConfigs | None = None): Resets state, often running setup steps. Mostly used internally or for environments created without an initial Task.
  • env._setup(...) / env._invoke_all(...): Internal methods for running setup/evaluate/custom configurations defined in a Task.
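
One way to combine these methods is to wrap a run in try/finally so that close() is always called, even when a step raises. The sketch below uses a hypothetical RecordingEnv stand-in (not an SDK class) that only records which methods were called; the run_episode helper and the "boom" action are likewise made up for illustration.

```python
import asyncio


class RecordingEnv:
    """Hypothetical stand-in that records which lifecycle methods were called."""

    def __init__(self) -> None:
        self.calls = []

    async def step(self, actions=None):
        self.calls.append("step")
        if actions == ["boom"]:
            raise RuntimeError("simulated failure during step")
        # Mirror the (observation, reward, terminated, info) return shape.
        return {"text": "ok"}, 0.0, False, {}

    async def evaluate(self):
        self.calls.append("evaluate")
        return 1.0

    async def close(self):
        self.calls.append("close")


async def run_episode(env, actions):
    """Get the initial observation, take one step, evaluate, and always close."""
    try:
        await env.step()         # no actions: initial observation
        await env.step(actions)  # one action step
        return await env.evaluate()
    finally:
        # Runs on success and on failure, so the environment is never leaked.
        await env.close()


ok_env = RecordingEnv()
ok_result = asyncio.run(run_episode(ok_env, ["click"]))

bad_env = RecordingEnv()
try:
    asyncio.run(run_episode(bad_env, ["boom"]))
except RuntimeError:
    pass  # the failure still propagates; close() has already run
```

Because close() is described as the point where a job-linked Trajectory is saved, guaranteeing that it runs is likely to matter for recorded runs as well as for releasing the remote instance.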

Observations

The Observation object returned by env.step() contains:

  • screenshot (str | None): Base64-encoded PNG string.
  • text (str | None): Optional text output (e.g., terminal, accessibility tree).

Related Concepts

  • Task: Defines the environment type (gym), setup, and evaluate logic.
  • Agent: Interacts with the Environment via the step and predict methods.
  • Adapter: Ensures actions passed to step are in the correct CLA format.
  • Job: Groups environment runs; linking happens via @register_job or gym.make(job=...).
  • Trajectory: The recording generated when a job-linked environment is closed.
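
Returning to the screenshot field of Observation: since it is described as a Base64-encoded PNG string that may be None, a consumer typically decodes it before saving or displaying it. The following is a minimal sketch using only the Python standard library; decode_screenshot is a hypothetical helper, not part of the SDK, and the sample payload is a placeholder rather than a real image.

```python
import base64
from typing import Optional

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(screenshot: Optional[str]) -> Optional[bytes]:
    """Decode a Base64 screenshot string into raw PNG bytes, or None if absent."""
    if screenshot is None:
        return None
    # validate=True rejects characters outside the Base64 alphabet.
    data = base64.b64decode(screenshot, validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return data


# Placeholder payload: the PNG signature plus filler bytes, not a full image.
fake_screenshot = base64.b64encode(PNG_SIGNATURE + b"placeholder").decode("ascii")
png_bytes = decode_screenshot(fake_screenshot)
```

With a real observation, the returned bytes could be written to a .png file or passed to an image library; the signature check is only a cheap sanity test, not full validation of the image.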