Using the hud-browser for web-based agent evaluation
hud-browser
)Primary Environment
The hud-browser
is the default and most commonly used environment in HUD. Most examples and TaskSets use this environment for web-based agent evaluation.
The hud-browser
environment provides a remote Chromium browser instance for agents to interact with websites. It’s ideal for web navigation, form filling, information retrieval, testing web applications, and question answering with web research.
View the browser session in real-time during agent interaction:
For advanced control or integration with tools like browser_use
:
These functions configure the browser’s state via Task.setup
before the agent starts:
Function | Description |
---|---|
goto(url) | Navigates to a URL. |
load_html_content(html) | Loads static HTML content. |
sheets_from_xlsx(url, name?) | Converts an XLSX from URL to a new Google Sheet and opens it. |
Agents interact with the browser using CLA Actions passed to env.step()
:
ClickAction
, ScrollAction
, DragAction
, MoveAction
TypeAction
, PressAction
ResponseAction
(for submitting final answers)These functions, used in Task.evaluate
, verify task completion within the browser:
Category | Function | Description |
---|---|---|
Page Content | page_contains(list[str]) | Text exists on page. |
element_exists(selector) | Element is present. | |
text_matches(sel, pattern) | Element text matches regex. | |
URL & Navigation | url_contains(substring) | URL contains substring. |
url_match(expected_url) | Exact URL match. | |
Browser State | cookie_exists(list[str]) | Required cookies present. |
Agent Response | response_includes(text) | Agent’s final answer contains text. |
Action History | selector_history(sel, idx?) | Selector was interacted with (default: last). |
verify_type_action(sel, val) | Last action was typing val into sel . | |
history_length(len_spec) | Agent actions count matches len_spec (int or dict). | |
raw_last_action_is(dict) | Last raw agent action matches dict. | |
Spreadsheets | sheets_cell_values(map) | Google Sheet cells match {"A1": "Val"} . |
wait_for_element
in your agent’s logic or ensure setup is complete.env.evaluate()
after steps.await env.stream()
for real-time debugging.