Example: Testing Your Web Application
This guide demonstrates how to use HUD to test AI agents interacting with your own web application. We’ll package a simple web app into a Docker container, create a CustomGym to run it, and then use hud-browser to perform tasks on this local application.
Goal: Automate testing of a custom web application’s user flows (e.g., login, form submission, feature interaction) using an AI agent.
Concepts Covered:
- Packaging a web application with Docker for testing.
- Defining a CustomGym to launch your local web app container.
- Using hud-browser to interact with an application running on localhost within a Docker container.
- Creating a TaskSet for testing common user workflows.
- Running an agent (e.g., ClaudeAgent) against these tasks.
- Transitioning to remote execution by pushing your app’s Docker image.
This example combines Custom Environments with the Browser Environment.
Prerequisites
- HUD SDK installed.
- Docker installed and running on your local machine.
- API keys for HUD and your chosen agent (e.g., ANTHROPIC_API_KEY).
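Before building any containers, it can be useful to fail fast when a key is missing. The following is a minimal sketch: ANTHROPIC_API_KEY comes from the prerequisites above, while HUD_API_KEY is an assumed variable name for the HUD key (check your SDK configuration), and missing_api_keys is a hypothetical helper, not part of HUD.

```python
import os
from typing import Mapping

# ANTHROPIC_API_KEY is from this guide; HUD_API_KEY is an assumed name.
REQUIRED_KEYS = ("HUD_API_KEY", "ANTHROPIC_API_KEY")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required keys that are unset or blank."""
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

For example, calling missing_api_keys() at the top of your test script and raising an error if the returned list is non-empty stops the run before any Docker work starts.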
1. Your Web Application (Example)
Let’s assume you have a simple web application. For this example, imagine a basic Flask app in a directory ./my_web_app/:
./my_web_app/app.py:
from flask import Flask, request, render_template_string

app = Flask(__name__)

HTML_FORM = """
<h1>Login</h1>
<form method="post">
Username: <input type="text" name="username"><br>
Password: <input type="password" name="password"><br>
<input type="submit" value="Login">
</form>
"""

HTML_SUCCESS = "<h1>Welcome, {{username}}!</h1>"


@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        username = request.form.get("username")
        # In a real app, you'd validate credentials
        if username:
            return render_template_string(HTML_SUCCESS, username=username)
    # GET requests and POSTs without a username show the login form again
    return render_template_string(HTML_FORM)


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
./my_web_app/Dockerfile:
FROM python:3.10-slim
WORKDIR /app
COPY . /app
RUN pip install Flask
EXPOSE 5000
CMD ["python", "app.py"]
This app has a simple login page at /login.
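Because the Flask server needs a moment to start inside the container, it can help to confirm that /login actually answers before pointing an agent at it. A minimal standard-library sketch of such a readiness check follows; wait_for_http and its timeouts are illustrative choices, not part of HUD.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=max(interval, 1.0)) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and HTTP errors (URLError is an
            # OSError subclass): the app is not ready yet, so retry.
            pass
        time.sleep(interval)
    return False
```

For instance, after starting the container yourself with a port mapping, wait_for_http("http://localhost:5000/login") returns True once the login form is being served.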
2. Define CustomGym for Your Web App
This CustomGym tells HUD to build and run your web app’s Docker container locally.
from pathlib import Path
from hud.types import CustomGym

my_webapp_gym = CustomGym(
    location="local",
    image_or_build_context=Path("./my_web_app"),  # Points to the directory with the Dockerfile
)
When gym.make() is called with a task using this my_webapp_gym, HUD will:
- Build the Docker image from ./my_web_app/Dockerfile (if not already built with this context).
- Run the container, exposing port 5000.
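When debugging a container that fails to start, it can help to reproduce these two steps by hand. The sketch below generates roughly equivalent plain Docker commands; it illustrates the manual workflow only and is not HUD's internal implementation. The image tag my_web_app:dev and the helper name docker_commands are arbitrary examples.

```python
import shlex
from pathlib import Path


def docker_commands(build_context: Path, tag: str, container_port: int = 5000,
                    host_port: int = 5000) -> list[list[str]]:
    """Return argv lists for building and running the app image manually."""
    build = ["docker", "build", "-t", tag, str(build_context)]
    # -d runs detached, --rm removes the container on exit, and
    # -p publishes host_port on the host to the container's exposed port.
    run = ["docker", "run", "-d", "--rm", "-p", f"{host_port}:{container_port}", tag]
    return [build, run]


for argv in docker_commands(Path("./my_web_app"), "my_web_app:dev"):
    print(shlex.join(argv))
```

The argv lists can also be passed directly to subprocess.run if you want to script the manual build-and-run loop.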
3. Create Tasks to Test Your Web App
The agent interacts with your app through hud-browser: each task’s setup instructs the browser to navigate to http://localhost:PORT, where your custom web app container is running. HUD manages the networking so the browser environment can reach services in your CustomGym container. This first version attaches my_webapp_gym directly to each Task:
import asyncio
from hud import Task, TaskSet, gym, run_job, register_job
from hud.agent import ClaudeAgent  # Or your preferred agent

# Task 1: Successful Login
login_success_task = Task(
    prompt="Log in to the application with username 'testuser' and password 'password123'.",
    gym=my_webapp_gym,  # This specifies the custom environment to run
    # The actual interaction will be via hud-browser, which can access the CustomGym container.
    # We instruct hud-browser via its setup functions.
    setup=[
        # The hud-browser will navigate to the service exposed by my_webapp_gym.
        # HUD maps CustomGym's exposed ports to accessible localhost ports for hud-browser.
        # Assuming your app in my_webapp_gym exposes port 5000:
        ("goto", "http://localhost:5000/login")
    ],
    evaluate=("page_contains", "Welcome, testuser!")
)

# Task 2: Attempt Login with Missing Username (Example of a negative test)
login_fail_task = Task(
    prompt="Attempt to log in with an empty username and the password 'password123'. Verify it stays on the login page.",
    gym=my_webapp_gym,
    setup=[("goto", "http://localhost:5000/login")],
    evaluate=[
        ("page_contains", "<h1>Login</h1>"),  # Should remain on login page
        ("url_contains", "/login")
    ]
)
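The evaluate field takes either a single (check, argument) tuple or a list of them. To make the intended semantics concrete, here is a hypothetical local evaluator that treats a list as "all checks must pass" and matches against raw HTML and the current URL. run_checks is invented for illustration; HUD's real evaluators run inside hud-browser and may, for example, match rendered page text rather than markup.

```python
from typing import Sequence, Tuple, Union

Check = Tuple[str, str]


def run_checks(page_html: str, url: str,
               evaluate: Union[Check, Sequence[Check]]) -> bool:
    """Hypothetical evaluator: True only if every check passes."""
    # A single ("name", arg) tuple is treated as a one-element list.
    if evaluate and isinstance(evaluate[0], str):
        checks = [evaluate]
    else:
        checks = list(evaluate)
    for name, expected in checks:
        if name == "page_contains":
            ok = expected in page_html
        elif name == "url_contains":
            ok = expected in url
        else:
            raise ValueError(f"unknown check: {name}")
        if not ok:
            return False
    return True
```

Against the Flask app above, the success page passes Task 1's check, while only the re-rendered login form satisfies both of Task 2's checks.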
To be more precise about how this pattern (a web app running in a custom container, accessed with hud-browser) maps onto current SDK capabilities:
- The CustomGym (e.g., my_webapp_gym) primarily defines the service to be run.
- The Task itself would still use gym="hud-browser" to indicate that the agent interacts via a browser.
- hud.gym.make or run_job needs to be aware of both the primary interaction gym (hud-browser) and the service gym (my_webapp_gym) that must be running.
The revised Task definition below follows this split. It assumes that run_job provisions the service CustomGym when it is referenced, or that the CustomGym is passed to gym.make, whichever pattern the SDK uses.
# Revised Task definition:
# The CustomGym (my_webapp_gym) will be launched by the job/runner.
# The Task specifies hud-browser for agent interaction.
login_success_task_revised = Task(
    prompt="Log in to the application with username 'testuser' and password 'password123'.",
    gym="hud-browser",  # Agent interacts via browser
    # The setup for the browser gym navigates to the localhost port of the custom web app
    setup=[("goto", "http://localhost:5000/login")],
    evaluate=("page_contains", "Welcome, testuser!"),
    # We'd associate `my_webapp_gym` at a higher level, e.g. when calling run_job,
    # or if gym.make could take a list of services to ensure are running.
    # For now, assume the environment context manages running `my_webapp_gym` alongside `hud-browser`.
)
Running a service CustomGym alongside an interactive environment like hud-browser may need more explicit documentation or helper functions in the SDK, if it is not already streamlined. For this example, we’ll proceed assuming that the setup in hud-browser can reach localhost:5000, where HUD exposes my_webapp_gym’s service.
4. Run Evaluation with an Agent
Now, use an agent to perform these tasks.
# (Continuing from above)
@register_job("my-web-app-tests")
async def test_my_application():
    # Tasks for this job
    tasks = [
        # Using the more direct approach where CustomGym is on the task:
        Task(
            prompt="Log in with testuser/password123.",
            gym=my_webapp_gym,  # CustomGym runs the web app
            # The setup and evaluate here are for the hud-browser that will be used to interact
            setup=[("goto", "http://localhost:5000/login")],
            evaluate=("page_contains", "Welcome, testuser!")
        ),
        Task(
            prompt="Attempt login with no username.",
            gym=my_webapp_gym,
            setup=[("goto", "http://localhost:5000/login")],
            evaluate=[
                ("page_contains", "<h1>Login</h1>"),
                ("url_contains", "/login")
            ]
        )
    ]
    task_set = TaskSet(name="My Web App Login Flows", tasks=tasks)
    print(f"Running tests for: {task_set.name}")

    # When run_job sees a Task with a CustomGym, and the agent expects a browser,
    # it should handle provisioning the CustomGym and making its ports accessible
    # to the hud-browser instance used by the agent.
    job_instance = await run_job(
        agent_cls=ClaudeAgent,
        task_or_taskset=task_set,
        job_name="WebApp Login Test Run"
    )
    print(f"Job for {task_set.name} completed. Job ID: {job_instance.id}")
    print(f"View results at: https://app.hud.so/jobs/{job_instance.id}")
    analytics = await job_instance.get_analytics()
    print(f"Analytics: {analytics}")

# if __name__ == "__main__":
#     asyncio.run(test_my_application())
Networking: When gym.make(task_with_custom_gym) or run_job processes a task whose gym field is a CustomGym that exposes ports (like our web app on port 5000), HUD’s local environment manager maps that container’s port to an accessible port on your localhost. The hud-browser environment can then navigate to http://localhost:<mapped_port>/login to interact with your application.
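The mapping works like Docker's -p HOST:CONTAINER flag: the app listens on 5000 inside the container, and some port on your host forwards to it. If you map ports yourself (for example, to run two copies of the app side by side), a common standard-library trick is to let the operating system choose a free host port by binding to port 0. This is a sketch under the assumption that you control the mapping; how HUD allocates its own ports is not covered here. Note the small race: the port is released before Docker binds it.

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def login_url(mapped_port: int, path: str = "/login") -> str:
    """Build the URL a browser on the host would open for a mapped port."""
    return f"http://localhost:{mapped_port}{path}"


host_port = pick_free_port()
print(f"docker run -d -p {host_port}:5000 my_web_app:dev")  # example image tag
print(login_url(host_port))
```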
5. Transitioning to Remote Execution
To run these tests on the HUD platform (e.g., for CI or scaled evaluation):
- Push your Web App Image: Build and push your ./my_web_app Docker image to a container registry:
docker build -t yourusername/my_web_app:latest ./my_web_app
docker push yourusername/my_web_app:latest
- Update CustomGym for Remote: Change your CustomGym definition:
my_webapp_gym_remote = CustomGym(
    location="remote",
    image_or_build_context="yourusername/my_web_app:latest"
)
- Run Tasks/TaskSets: Use my_webapp_gym_remote in your Task definitions. When run_job executes these, the HUD platform will pull and run your web app image, making it accessible to the remote hud-browser instances.
Your tasks would now look like:
remote_login_task = Task(
    prompt="Log in with testuser/password123.",
    gym=my_webapp_gym_remote,  # Specifies the remote service
    setup=[("goto", "http://localhost:5000/login")],  # Path within the service
    evaluate=("page_contains", "Welcome, testuser!")
)
# ... and so on for other tasks.
The localhost:5000 URL in the setup still works because the HUD remote execution environment handles the networking so that the hud-browser (running remotely) can reach the service container (also running remotely) as if it were on its localhost at the exposed port.
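Since the local and remote CustomGym definitions differ only in two arguments, one way to keep a single set of task definitions for both local and CI runs is to choose those arguments from an environment variable. A minimal sketch: HUD_WEBAPP_MODE and WEBAPP_IMAGE are hypothetical variable names, webapp_gym_kwargs is a hypothetical helper, and only the location and image_or_build_context arguments come from this guide.

```python
import os
from pathlib import Path
from typing import Mapping, Union


def webapp_gym_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, Union[str, Path]]:
    """Pick CustomGym keyword arguments from the hypothetical HUD_WEBAPP_MODE variable."""
    if env.get("HUD_WEBAPP_MODE", "local") == "remote":
        # Remote runs pull a pushed image (WEBAPP_IMAGE is also a hypothetical name).
        image = env.get("WEBAPP_IMAGE", "yourusername/my_web_app:latest")
        return {"location": "remote", "image_or_build_context": image}
    # Local runs build from the directory containing the Dockerfile.
    return {"location": "local", "image_or_build_context": Path("./my_web_app")}


# Usage with the SDK (not executed here):
#     from hud.types import CustomGym
#     my_webapp_gym = CustomGym(**webapp_gym_kwargs())
```

Setting HUD_WEBAPP_MODE=remote in your CI configuration would then switch every task to the pushed image without editing the task definitions.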
Key Takeaways
- Package your web application using Docker.
- Define a CustomGym pointing to your app’s Docker image or build context.
- Create Task objects that use gym="hud-browser" (or this CustomGym, if it includes a browser, e.g. novnc_ubuntu) and whose setup navigates to http://localhost:PORT_YOUR_APP_EXPOSES.
- HUD manages the Docker lifecycle and networking to make your local app accessible to the hud-browser.
- Easily scale to remote execution by pushing your app’s image and updating the CustomGym definition.
This pattern allows for robust, automated testing of your web applications using powerful AI agents in controlled, reproducible environments.