Custom
Leverage Docker to create specialized evaluation environments for any scenario.
Overview: Custom Environments
HUD lets you go beyond pre-defined environments like hud-browser by creating and using Custom Environments. These are Docker-based setups that give you full control over the software stack, applications, and specific state needed for your agent evaluation scenarios.
Whether you’re testing against your own web applications, complex desktop software, or unique backends, custom environments provide the flexibility and reproducibility required for robust agent evaluation.
Why Use Custom Environments?
- Test Proprietary Systems: Evaluate agents interacting with your internal web apps, desktop software (via VNC setups), or custom command-line tools within a controlled, containerized environment.
- Specialized Setups: Create environments with specific operating system versions, pre-installed dependencies, databases, particular file structures, or unique network configurations that standard environments don’t offer.
- Full Reproducibility: Package your entire environment with Docker to ensure consistent testing conditions across different machines and over time.
- Complex Workflows: Model tasks that involve multiple applications, system-level interactions, or require specific background services to be running.
- Community & Collaboration: Share your specialized environments with the HUD community, or leverage environments contributed by others to expand your testing capabilities.
Two Main Approaches to Custom Environments:
- Containerized Web Applications (Accessed via hud-browser):
  - What it is: You package your web application (e.g., a Flask/Django app, a static site, a Node.js service) into a Docker container.
  - How it works: Define a CustomGym that points to this Docker image/build context. In your Task, you still specify gym="hud-browser". HUD then runs your web app container and makes its exposed ports accessible to the hud-browser instance (typically via http://localhost:PORT_FROM_YOUR_APP). The agent interacts through the browser as usual.
  - Use Cases: Testing your own web apps, staging environments, specific website versions.
  - See the Web App Testing Example →
- Controller-Based Custom Environments (Full Control):
  - What it is: You create a Docker image that includes a special Python package named hud_controller. This package contains your custom Python functions for Task.setup and Task.evaluate that run inside the Docker container.
  - How it works: Define a CustomGym pointing to this Docker image/build context. In your Task, you set gym=your_custom_gym_object. The setup and evaluate functions in your task configuration will directly call the corresponding Python functions in your hud_controller.
  - Use Cases: OS-level tasks, file system manipulation, testing desktop GUI applications (often with VNC and tools like pyautogui inside the controller), interacting with games, or any scenario requiring direct execution of Python logic within the controlled environment.
  - See the Custom OS Environment Example →
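To make the controller-based approach more concrete, here is a minimal sketch of what a hud_controller package might contain. The function names write_file and file_contains are hypothetical, and returning a float score from the evaluate function is an assumption. How a Task references these functions is not shown here and should be taken from the SDK documentation.

```python
"""Hypothetical hud_controller/__init__.py; names and return types are illustrative assumptions."""
import tempfile
from pathlib import Path


def write_file(path: str, content: str) -> None:
    """Setup step: put the container's file system into a known starting state."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")


def file_contains(path: str, expected: str) -> float:
    """Evaluate step: 1.0 if the file exists and contains `expected`, else 0.0."""
    target = Path(path)
    if not target.is_file():
        return 0.0
    return 1.0 if expected in target.read_text(encoding="utf-8") else 0.0


# Local dry run of the setup -> agent action -> evaluate cycle, outside any container.
with tempfile.TemporaryDirectory() as workdir:
    notes = str(Path(workdir) / "notes" / "todo.txt")
    write_file(notes, "buy milk\n")
    score_before = file_contains(notes, "call Alice")

    # Stand-in for the agent acting inside the environment.
    with open(notes, "a", encoding="utf-8") as fh:
        fh.write("call Alice\n")
    score_after = file_contains(notes, "call Alice")

print(score_before, score_after)  # prints: 0.0 1.0
```

Keeping setup and evaluate as plain functions with simple arguments makes them easy to exercise locally like this before the image is built.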
Defining a CustomGym
Regardless of the approach, custom environments are specified using the CustomGym object.
Key CustomGym Parameters:
- location ("local" | "remote"): Where the Docker container runs.
  - "local": Builds (if image_or_build_context is a Path) and runs on your machine. Requires Docker.
  - "remote": Runs on the HUD platform. If image_or_build_context is a Path, HUD will build it remotely. If it’s an image name, HUD pulls it from a registry.
- image_or_build_context (str | Path):
  - str: A Docker image name (e.g., "nginx:latest").
  - Path: A local directory containing a Dockerfile (and for controller-based environments, a hud_controller package).
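The parameter rules above can be summarised in a short sketch. CustomGymSketch is a stand-in dataclass written for this illustration, not the SDK's CustomGym class. It mirrors only the two documented fields, and the ./my_web_app directory is a hypothetical build context.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Literal, Union


@dataclass(frozen=True)
class CustomGymSketch:
    """Stand-in for HUD's CustomGym (assumption): only the two documented fields."""

    location: Literal["local", "remote"]
    image_or_build_context: Union[str, Path]

    def __post_init__(self) -> None:
        if self.location not in ("local", "remote"):
            raise ValueError(
                f"location must be 'local' or 'remote', got {self.location!r}"
            )


def planned_action(gym: CustomGymSketch) -> str:
    """Describe what the documented rules imply for a given spec."""
    # A Path means "build from this directory"; a str means "use this image name".
    needs_build = isinstance(gym.image_or_build_context, Path)
    if gym.location == "local":
        if needs_build:
            return "build and run on your machine"
        return "run image on your machine"
    if needs_build:
        return "build on the HUD platform"
    return "pull from a registry and run on the HUD platform"


# Hypothetical build context holding a Dockerfile for a web app.
web_app = CustomGymSketch(location="local", image_or_build_context=Path("./my_web_app"))
# An existing image name, run remotely.
prebuilt = CustomGymSketch(location="remote", image_or_build_context="nginx:latest")

print(planned_action(web_app))
print(planned_action(prebuilt))
```

The point to note is that the type of image_or_build_context, not a separate flag, decides whether a build step happens.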
Next Steps: Building and Contributing
Ready to create your own custom environment or contribute to the HUD ecosystem?
Environment Creation Guide
A step-by-step walkthrough of building, testing, and structuring custom Docker environments, including those with Python controllers.
Open Source Environments
Explore examples like novnc_ubuntu, custom_website templates, and other community contributions in our GitHub repository.
Related Documentation
- Task Creation: Learn how to define tasks that utilize your custom environments.
- Browser Environment: For standard web interaction needs, or as the interface to your containerized web app.
- Environment Creation & Contribution Guide: The complete guide to building and sharing environments.