Website testing
Evaluate AI agents interacting with your own web applications using HUD and Docker.
Example: Testing Your Web Application
This guide demonstrates how to use HUD to test AI agents interacting with your own web application. We’ll package a simple web app into a Docker container, create a CustomGym to run it, and then use hud-browser to perform tasks on this local application.
Goal: Automate testing of a custom web application’s user flows (e.g., login, form submission, feature interaction) using an AI agent.
Concepts Covered:
- Packaging a web application with Docker for testing.
- Defining a CustomGym to launch your local web app container.
- Using hud-browser to interact with an application running on localhost within a Docker container.
- Creating a TaskSet for testing common user workflows.
- Running an agent (e.g., ClaudeAgent) against these tasks.
- Transitioning to remote execution by pushing your app’s Docker image.
This example combines Custom Environments with the Browser Environment.
Prerequisites
- HUD SDK installed.
- Docker installed and running on your local machine.
- API keys for HUD and your chosen agent (e.g., ANTHROPIC_API_KEY).
1. Your Web Application (Example)
Let’s assume you have a simple web application. For this example, imagine a basic Flask app in the directory ./my_web_app/, which contains two files: ./my_web_app/app.py (the application itself) and ./my_web_app/Dockerfile (the instructions for building its image).
This app has a simple login page at /login.
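If you do not have an app at hand, a dependency-free stand-in can play the same role. The sketch below is not the Flask app described above; it uses only the Python standard library to serve a login page at /login on port 5000. The credentials, the page markup, and the function names are assumptions made for illustration.

```python
"""Standard-library stand-in for the example app: a login page at /login."""
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical demo credentials; not part of the original example.
DEMO_USER, DEMO_PASSWORD = "testuser", "testpass"

LOGIN_FORM = (
    "<form method='post' action='/login'>"
    "<input name='username'><input name='password' type='password'>"
    "<button type='submit'>Log in</button></form>"
)


def handle(method: str, path: str, body: str = "") -> tuple[int, str]:
    """Return (status, html) for a request; kept pure so it is easy to test."""
    if path != "/login":
        return 404, "Not found"
    if method == "GET":
        return 200, LOGIN_FORM
    # Anything else is treated as a form submission.
    form = parse_qs(body)
    user = form.get("username", [""])[0]
    password = form.get("password", [""])[0]
    if (user, password) == (DEMO_USER, DEMO_PASSWORD):
        return 200, f"Welcome, {user}!"
    return 401, "Invalid credentials" + LOGIN_FORM


class LoginHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle()."""

    def _respond(self, method: str) -> None:
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length).decode() if length else ""
        status, html = handle(method, self.path, body)
        payload = html.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self) -> None:
        self._respond("GET")

    def do_POST(self) -> None:
        self._respond("POST")


def serve(port: int = 5000) -> None:
    """Serve forever; bind to 0.0.0.0 so the port is reachable from outside a container."""
    HTTPServer(("0.0.0.0", port), LoginHandler).serve_forever()
```

Calling serve() starts the server. Keeping the request logic in handle() means the login flow can be checked without opening a socket.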
2. Define CustomGym for Your Web App
This CustomGym tells HUD to build and run your web app’s Docker container locally.
When gym.make() is called with a task using this my_webapp_gym, HUD will:
- Build the Docker image from ./my_web_app/Dockerfile (if not already built with this context).
- Run the container, exposing port 5000.
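The two steps above correspond roughly to what you would do by hand with the Docker CLI. The sketch below composes the equivalent docker build and docker run commands; it illustrates the lifecycle and is not how HUD implements it. The image tag my-web-app:test and the helper names are hypothetical.

```python
import shlex
import subprocess

IMAGE_TAG = "my-web-app:test"   # hypothetical tag, chosen for illustration
BUILD_CONTEXT = "./my_web_app"  # directory containing the Dockerfile
APP_PORT = 5000                 # port the example app listens on


def build_command(tag: str = IMAGE_TAG, context: str = BUILD_CONTEXT) -> list[str]:
    """docker build -t <tag> <context>"""
    return ["docker", "build", "-t", tag, context]


def run_command(tag: str = IMAGE_TAG, host_port: int = APP_PORT,
                container_port: int = APP_PORT) -> list[str]:
    """docker run, detached, publishing the container port on the host."""
    # -d: run in the background; --rm: remove the container when it stops;
    # -p host:container: publish the app's port on localhost.
    return ["docker", "run", "-d", "--rm",
            "-p", f"{host_port}:{container_port}", tag]


def launch(dry_run: bool = True) -> None:
    """Print the commands and, unless dry_run is set, execute them."""
    for command in (build_command(), run_command()):
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling launch() prints the commands without running them, which is a convenient way to confirm the port mapping before handing the build context to a CustomGym.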
3. Create Tasks to Test Your Web App
We’ll use gym="hud-browser" for these tasks. The hud-browser will then be instructed to navigate to http://localhost:PORT where your custom web app container is running. HUD manages the networking so the browser environment can reach services in your CustomGym container.
More precisely, for this pattern (running a web app in a custom container and accessing it with hud-browser), current SDK capabilities imply the following:
- The CustomGym (e.g., my_webapp_gym) primarily defines the service to be run.
- The Task itself still uses gym="hud-browser" to indicate that the agent interacts via a browser.
- hud.gym.make or run_job needs to be aware of both: the primary interaction gym (hud-browser) and the service gym (my_webapp_gym) that needs to be running.
The Task definitions for this use case therefore assume one of two things: either run_job provisions the service CustomGym when it is referenced, or the CustomGym is passed to gym.make, if that is the pattern the SDK supports.
Running a service CustomGym alongside an interactive gym such as hud-browser might need more explicit documentation or helper functions in the SDK if it is not already streamlined. For this example, we’ll proceed assuming the setup in hud-browser can reach localhost:5000, where the my_webapp_gym’s service is exposed by HUD.
4. Run Evaluation with an Agent
Now, use an agent to perform these tasks.
Networking: When gym.make(task_with_custom_gym) or run_job processes a task whose gym field is a CustomGym that exposes ports (like our web app on port 5000), HUD’s local environment manager maps that container’s port to an accessible port on your localhost. The hud-browser environment can then navigate to http://localhost:<mapped_port>/login to interact with your application.
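Before starting an agent run, it can help to confirm that the app actually answers on the mapped port. The following sketch builds the login URL and polls it until it responds. It uses only the Python standard library; the helper names, timeouts, and the injectable probe parameter are assumptions for illustration and are not part of the HUD SDK.

```python
import time
import urllib.request
from typing import Callable


def login_url(port: int = 5000, host: str = "localhost") -> str:
    """URL of the example app's login page on the mapped port."""
    return f"http://{host}:{port}/login"


def http_ok(url: str) -> bool:
    """True if the URL answers with a 2xx status within two seconds."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and urllib's URLError/HTTPError,
        # which are OSError subclasses.
        return False


def wait_until_ready(url: str, timeout: float = 30.0, interval: float = 0.5,
                     probe: Callable[[str], bool] = http_ok) -> bool:
    """Poll the URL until the probe succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical use is to call wait_until_ready(login_url(mapped_port)) and abort early if it returns False, so a container that failed to start is not mistaken for an agent failure.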
5. Transitioning to Remote Execution
To run these tests on the HUD platform (e.g., for CI or scaled evaluation):
- Push your web app image: Build and push your ./my_web_app Docker image to a container registry.
- Update CustomGym for remote: Change your CustomGym definition to reference the pushed image for remote execution (called my_webapp_gym_remote here).
- Run Tasks/TaskSets: Use this my_webapp_gym_remote in your Task definitions. When run_job executes these, the HUD platform will pull and run your web app image, making it accessible to the remote hud-browser instances. The localhost:5000 URL in the setup still works because the HUD remote execution environment handles the networking to ensure the hud-browser (running remotely) can access the service container (also running remotely) as if it were on its localhost at the exposed port.
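The push step can be sketched with the standard docker build and docker push commands. The registry host registry.example.com, the repository my-org/my-web-app, the tag v1, and the helper names below are placeholders to replace with your own.

```python
import shlex


def remote_image_ref(registry: str, repository: str, tag: str = "latest") -> str:
    """Fully qualified image reference, e.g. registry/repository:tag."""
    return f"{registry}/{repository}:{tag}"


def publish_commands(image_ref: str, context: str = "./my_web_app") -> list[list[str]]:
    """Commands that build the image under its registry name, then push it."""
    return [
        ["docker", "build", "-t", image_ref, context],
        ["docker", "push", image_ref],
    ]


# Placeholder registry and repository; substitute the ones you actually use.
image = remote_image_ref("registry.example.com", "my-org/my-web-app", "v1")
for command in publish_commands(image):
    print(shlex.join(command))
```

The resulting image reference is the value the remote CustomGym definition would point to, instead of the local build context.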
Key Takeaways
- Package your web application using Docker.
- Define a CustomGym pointing to your app’s Docker image or build context.
- Create Task objects that use gym="hud-browser" (or this CustomGym if it includes a browser, e.g., novnc_ubuntu) and have their setup navigate to http://localhost:PORT_YOUR_APP_EXPOSES.
- HUD manages the Docker lifecycle and networking to make your local app accessible to the hud-browser.
- Easily scale to remote execution by pushing your app’s image and updating the CustomGym definition.
This pattern allows for robust, automated testing of your web applications using powerful AI agents in controlled, reproducible environments.