Example: Testing Your Web Application

This guide demonstrates how to use HUD to test AI agents interacting with your own web application. We’ll package a simple web app into a Docker container, create a CustomGym to run it, and then use hud-browser to perform tasks on this local application.

Goal: Automate testing of a custom web application’s user flows (e.g., login, form submission, feature interaction) using an AI agent.

Concepts Covered:

  • Packaging a web application with Docker for testing.
  • Defining a CustomGym to launch your local web app container.
  • Using hud-browser to interact with an application running on localhost within a Docker container.
  • Creating a TaskSet for testing common user workflows.
  • Running an agent (e.g., ClaudeAgent) against these tasks.
  • Transitioning to remote execution by pushing your app’s Docker image.

This example combines Custom Environments with the Browser Environment.

Prerequisites

  • HUD SDK installed.
  • Docker installed and running on your local machine.
  • API keys for HUD and your chosen agent (e.g., ANTHROPIC_API_KEY).

1. Your Web Application (Example)

Let’s assume you have a simple web application. For this example, imagine a basic Flask app in a directory ./my_web_app/:

./my_web_app/app.py:

from flask import Flask, request, render_template_string

app = Flask(__name__)

HTML_FORM = """
<h1>Login</h1>
<form method="post">
    Username: <input type="text" name="username"><br>
    Password: <input type="password" name="password"><br>
    <input type="submit" value="Login">
</form>
"""

HTML_SUCCESS = "<h1>Welcome, {{username}}!</h1>"

@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        username = request.form.get("username")
        # In a real app, you'd validate credentials
        if username:
            return render_template_string(HTML_SUCCESS, username=username)
    return render_template_string(HTML_FORM)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

./my_web_app/Dockerfile:

FROM python:3.10-slim
WORKDIR /app
COPY . /app
RUN pip install Flask
EXPOSE 5000
CMD ["python", "app.py"]

This app has a simple login page at /login.

2. Define CustomGym for Your Web App

This CustomGym tells HUD to build and run your web app’s Docker container locally.

from hud.types import CustomGym
from pathlib import Path

my_webapp_gym = CustomGym(
    location="local",
    image_or_build_context=Path("./my_web_app") # Points to the directory with Dockerfile
)

When gym.make() is called with a task using this my_webapp_gym, HUD will:

  1. Build the Docker image from ./my_web_app/Dockerfile (if not already built with this context).
  2. Run the container, exposing port 5000.

3. Create Tasks to Test Your Web App

We’ll use gym="hud-browser" for these tasks. The hud-browser will then be instructed to navigate to http://localhost:PORT where your custom web app container is running. HUD manages the networking so the browser environment can reach services in your CustomGym container.

import asyncio
from hud import Task, gym, run_job, register_job
from hud.agent import ClaudeAgent # Or your preferred agent

# Task 1: Successful Login
login_success_task = Task(
    prompt="Log in to the application with username 'testuser' and password 'password123'.",
    gym=my_webapp_gym, # This specifies the custom environment to run
    # The actual interaction will be via hud-browser, which can access the CustomGym container.
    # We instruct hud-browser via its setup functions.
    setup=[
        # The hud-browser will navigate to the service exposed by my_webapp_gym.
        # HUD maps CustomGym's exposed ports to accessible localhost ports for hud-browser.
        # Assuming your app in my_webapp_gym exposes port 5000:
        ("goto", "http://localhost:5000/login") 
    ],
    evaluate=("page_contains", "Welcome, testuser!")
)

# Task 2: Attempt Login with Missing Username (Example of a negative test)
login_fail_task = Task(
    prompt="Attempt to log in with no username and password 'password123'. Verify it stays on the login page.",
    gym=my_webapp_gym,
    setup=[("goto", "http://localhost:5000/login")],
    evaluate=[
        ("page_contains", "<h1>Login</h1>"), # Should remain on login page
        ("url_contains", "/login")
    ]
)

Actually, to be more precise and align with current SDK capabilities for this pattern (running a web app in a custom container and accessing it with hud-browser):

  1. The CustomGym (e.g., my_webapp_gym) primarily defines the service to be run.
  2. The Task itself would still use gym="hud-browser" to indicate the agent interacts via a browser.
  3. The hud.gym.make or run_job needs to be aware of both: the primary interaction gym (hud-browser) and the service gym (my_webapp_gym) that needs to be running.

Let’s refine the Task definition to be more standard for this use case, assuming run_job handles the service CustomGym provisioning when it’s referenced or that we pass it to gym.make if that’s the pattern.

# Corrected Task Definition for clarity:
# The CustomGym (my_webapp_gym) will be launched by the job/runner.
# The Task specifies hud-browser for agent interaction.
login_success_task_revised = Task(
    prompt="Log in to the application with username 'testuser' and password 'password123'.",
    gym="hud-browser", # Agent interacts via browser
    # The setup for the browser gym navigates to the localhost port of the custom web app
    setup=[("goto", "http://localhost:5000/login")], 
    evaluate=("page_contains", "Welcome, testuser!"),
    # We'd associate `my_webapp_gym` at a higher level, e.g. when calling run_job
    # or if gym.make could take a list of services to ensure are running.
    # For now, assume the environment context manages running `my_webapp_gym` alongside `hud-browser`.
)

This aspect of running a service CustomGym alongside an interactive CustomGym like hud-browser might need more explicit documentation or helper functions in the SDK if not already streamlined. For this example, we’ll proceed assuming the setup in hud-browser can reach localhost:5000 where the my_webapp_gym’s service is exposed by HUD.

4. Run Evaluation with an Agent

Now, use an agent to perform these tasks.

# (Continuing from above)

@register_job("my-web-app-tests")
async def test_my_application():
    # Tasks for this job
    tasks = [
        # Using the more direct approach where CustomGym is on the task:
        Task(
            prompt="Log in with testuser/password123.",
            gym=my_webapp_gym, # CustomGym runs the web app
            # The setup and evaluate here are for the hud-browser that will be used to interact
            setup=[("goto", "http://localhost:5000/login")],
            evaluate=("page_contains", "Welcome, testuser!")
        ),
        Task(
            prompt="Attempt login with no username.",
            gym=my_webapp_gym,
            setup=[("goto", "http://localhost:5000/login")],
            evaluate=[
                ("page_contains", "<h1>Login</h1>"),
                ("url_contains", "/login")
            ]
        )
    ]
    task_set = TaskSet(name="My Web App Login Flows", tasks=tasks)

    print(f"Running tests for: {task_set.name}")
    # When run_job sees a Task with a CustomGym, and the agent expects a browser,
    # it should handle provisioning the CustomGym and making its ports accessible
    # to the hud-browser instance used by the agent.
    job_instance = await run_job(
        agent_cls=ClaudeAgent,
        task_or_taskset=task_set,
        job_name="WebApp Login Test Run"
    )
    print(f"Job for {task_set.name} completed. Job ID: {job_instance.id}")
    print(f"View results at: https://app.hud.so/jobs/{job_instance.id}")

    analytics = await job_instance.get_analytics()
    print(f"Analytics: {analytics}")

# if __name__ == "__main__":
#     asyncio.run(test_my_application())

Networking: When gym.make(task_with_custom_gym) or run_job processes a task whose gym field is a CustomGym that exposes ports (like our web app on port 5000), HUD’s local environment manager maps that container’s port to an accessible port on your localhost. The hud-browser environment can then navigate to http://localhost:<mapped_port>/login to interact with your application.

5. Transitioning to Remote Execution

To run these tests on the HUD platform (e.g., for CI or scaled evaluation):

  1. Push your Web App Image: Build and push your ./my_web_app Docker image to a container registry:

    docker build -t yourusername/my_web_app:latest ./my_web_app
    docker push yourusername/my_web_app:latest
  2. Update CustomGym for Remote: Change your CustomGym definition:

    my_webapp_gym_remote = CustomGym(
        location="remote",
        image_or_build_context="yourusername/my_web_app:latest"
    )
  3. Run Tasks/TaskSets: Use this my_webapp_gym_remote in your Task definitions. When run_job executes these, the HUD platform will pull and run your web app image, making it accessible to the remote hud-browser instances.

    Your tasks would now look like:

    remote_login_task = Task(
        prompt="Log in with testuser/password123.",
        gym=my_webapp_gym_remote, # Specifies the remote service
        setup=[("goto", "http://localhost:5000/login")], # Path within the service
        evaluate=("page_contains", "Welcome, testuser!")
    )
    # ... and so on for other tasks.

    The localhost:5000 URL in the setup still works because the HUD remote execution environment handles the networking to ensure the hud-browser (running remotely) can access the service container (also running remotely) as if it were on its localhost at the exposed port.

Key Takeaways

  • Package your web application using Docker.
  • Define a CustomGym pointing to your app’s Docker image or build context.
  • Create Task objects that use gym="hud-browser" (or this CustomGym if it includes a browser e.g. novnc_ubuntu) and have their setup navigate to http://localhost:PORT_YOUR_APP_EXPOSES.
  • HUD manages the Docker lifecycle and networking to make your local app accessible to the hud-browser.
  • Easily scale to remote execution by pushing your app’s image and updating the CustomGym definition.

This pattern allows for robust, automated testing of your web applications using powerful AI agents in controlled, reproducible environments.