Page Cloning

This guide demonstrates how to create and host web archives for testing AI agents in consistent, offline-first environments. By cloning websites into WACZ (Web Archive Collection Zipped) files, you can ensure your agents always test against specific, unchanging versions of web pages.

Goal: Create reproducible web environments for testing browser-based agents without depending on live websites that might change or go offline.

Concepts Covered:

Using ArchiveWeb.page to clone websites into WACZ files
Hosting archives locally with the HUD page archives repository and CustomGym
Uploading archives to app.hud.so for immediate cloud hosting
Creating tasks that use these stable archived environments

Prerequisites

HUD SDK installed
Docker installed (for local hosting option)
ArchiveWeb.page browser extension (for cloning pages)
API keys for HUD and your chosen agent

Part 1: Cloning the Page

Installing ArchiveWeb.page

Install the Browser Extension:
- Visit ArchiveWeb.page
- Install the extension for Chrome/Chromium-based browsers
- The extension icon will appear in your browser toolbar
Create a New Archive:
- Click the ArchiveWeb.page extension icon
- Click “Create New Collection”
- Give your collection a descriptive name (e.g., “my-test-site”)

Capturing Web Pages

Start Archiving:
- Click “Start” in the extension popup to begin an archiving session
- Navigate to the website you want to clone
- Interact with the site as your agent would (login, navigate through pages, fill forms)
- All pages and resources will be captured automatically
Best Practices for Agent Testing:
- Capture all relevant pages and states your agent will interact with
- Include error pages and edge cases
- If testing login flows, capture both logged-out and logged-in states
- For form submissions, capture the form page and success/error pages
Stop and Download:
- Click “Stop” in the extension when done capturing
- Click “Download” to save your collection
- Choose WACZ format (default)
- Save with a meaningful filename (e.g., my-test-site.wacz)

Example: Cloning a Login Flow

1. Start archiving session
2. Visit https://example.com/login
3. Enter test credentials (e.g., testuser/password123)
4. Submit the form
5. Capture the dashboard/welcome page
6. Optionally capture logout flow
7. Stop and download as my-test-site.wacz
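
Before hosting an archive, it can help to confirm that the pages from your capture session actually made it into the file. The sketch below is a minimal check, not part of the HUD SDK or ArchiveWeb.page. It relies on a WACZ file being a ZIP archive that lists captured pages in pages/pages.jsonl, one JSON record per line. The function names list_captured_pages and missing_pages are hypothetical, chosen for illustration.

```python
import json
import zipfile


def list_captured_pages(wacz_path):
    """Return the page URLs recorded in a WACZ file's pages/pages.jsonl."""
    urls = []
    with zipfile.ZipFile(wacz_path) as archive:
        # Raises KeyError if the archive has no page list.
        with archive.open("pages/pages.jsonl") as pages:
            for raw_line in pages:
                line = raw_line.decode("utf-8").strip()
                if not line:
                    continue
                record = json.loads(line)
                # The first line is a header record with no "url" key.
                if "url" in record:
                    urls.append(record["url"])
    return urls


def missing_pages(wacz_path, required_urls):
    """Return the required URLs that are absent from the archive's page list."""
    captured = set(list_captured_pages(wacz_path))
    return [url for url in required_urls if url not in captured]
```

For the login flow above, calling missing_pages("my-test-site.wacz", ["https://example.com/login"]) should return an empty list if the login page was captured. The comparison is an exact string match, so a trailing slash or a redirect target will count as a different URL.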

Part 2: Hosting the Website

You have two options for hosting your archived website:

Option 1: Local Hosting with CustomGym

This approach uses the HUD page archives repository to host archives locally and access them via CustomGym.

Step 1: Clone the Page Archives Repository

git clone https://github.com/hud-evals/page-archives.git
cd page-archives

Step 2: Add Your Archive

Place your WACZ file:

cp ~/Downloads/my-test-site.wacz archives/

Update archives/archive_list.json:

{
  "archives": [
    {
      "name": "my-test-site",
      "displayName": "My Test Site Archive",
      "startPage": "https://example.com/login"
    }
  ]
}

Note: The name field must match your WACZ filename without the .wacz extension. The startPage field is optional and sets the default page to open. Keep any existing entries in the archives array; JSON does not allow comments, so do not add them to this file.
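
A mismatch between a name entry and the WACZ filename is easy to miss. The following sketch checks for it, assuming the layout described above (WACZ files in archives/ next to archive_list.json, with a top-level "archives" array). The function name check_archive_list is hypothetical and not part of the page-archives repository.

```python
import json
from pathlib import Path


def check_archive_list(repo_dir):
    """Return the names in archive_list.json that have no matching .wacz file."""
    archives_dir = Path(repo_dir) / "archives"
    with open(archives_dir / "archive_list.json", encoding="utf-8") as f:
        entries = json.load(f)["archives"]

    missing = []
    for entry in entries:
        name = entry["name"]
        # Each name must correspond to archives/<name>.wacz.
        if not (archives_dir / f"{name}.wacz").is_file():
            missing.append(name)
    return missing
```

Running check_archive_list(".") from inside the page-archives directory returns an empty list when every entry has a matching file. Because json.load rejects comments, the same call also fails loudly if the file is not valid JSON.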

Step 3: Create a CustomGym for the Archive Server

from hud.types import CustomGym
from pathlib import Path

# Run this script from the directory that contains page-archives/
archive_dir = Path("./page-archives")

# Create a Dockerfile for the archive server
archive_server_dockerfile = """
FROM node:18-slim
WORKDIR /app
COPY . /app
RUN npm install
EXPOSE 3000
CMD ["npm", "run", "start"]
"""

# Save Dockerfile in the page-archives directory
(archive_dir / "Dockerfile").write_text(archive_server_dockerfile)

# Define the CustomGym
archive_server_gym = CustomGym(
    location="local",
    image_or_build_context=archive_dir,
    host_config={
        "port_bindings": {3000: 3000}  # Expose port 3000
    }
)
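
Tasks that navigate to the archive server will fail if they start before the server is listening. The sketch below polls a TCP port until it accepts connections. It uses only the Python standard library and makes no assumptions about the HUD SDK; the function name wait_for_port is hypothetical, and port 3000 is taken from the Dockerfile above.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 3000) can be called after the archive server container starts and before any task is run. An open port only shows that the server is listening, not that a particular archive loads correctly.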

Step 4: Create Tasks Using the Archived Site

from hud import Task, run_job
from hud.agent import ClaudeAgent

# Task to test login flow on the archived site
login_task = Task(
    prompt="Log into the website using username 'testuser' and password 'password123'.",
    gym="hud-browser",  # Use browser to interact
    setup=[
        # Navigate to your archived site running locally
        ("goto", "http://localhost:3000/my-test-site")
    ],
    evaluate=("page_contains", "Welcome, testuser!")
)

Advanced: Query Parameters

The archive viewer supports useful query parameters:

# Open a specific page within the archive
specific_page_task = Task(
    prompt="Navigate to the user profile page",
    gym="hud-browser",
    setup=[
        ("goto", "http://localhost:3000/my-test-site?page=https%3A%2F%2Fexample.com%2Fprofile")
    ]
)

# Debug mode - shows full ReplayWeb.page UI
debug_task = Task(
    prompt="Explore the archive interface",
    gym="hud-browser",
    setup=[
        ("goto", "http://localhost:3000/my-test-site?debug=true")
    ]
)
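
The page parameter must be percent-encoded, as in the URL above. Rather than encoding by hand, the URL can be built with the standard library. This is a small sketch: the helper name archive_url is hypothetical, and combining page and debug in one URL is an assumption that the source does not confirm.

```python
from urllib.parse import quote, urlencode


def archive_url(base_url, archive_name, page=None, debug=False):
    """Build an archive viewer URL with optional page and debug parameters."""
    params = {}
    if page is not None:
        params["page"] = page
    if debug:
        params["debug"] = "true"

    url = f"{base_url.rstrip('/')}/{archive_name}"
    if params:
        # quote_via=quote percent-encodes ":" and "/" and uses %20 for spaces.
        url += "?" + urlencode(params, quote_via=quote)
    return url
```

Calling archive_url("http://localhost:3000", "my-test-site", page="https://example.com/profile") produces the same URL used in specific_page_task, which can then be passed to the goto setup step.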

Option 2: Cloud Hosting on app.hud.so

For immediate hosting without local setup, use the HUD platform’s built-in page cloning feature.

Step 1: Access Page Clone Feature

Go to app.hud.so
Click “Create” in the navigation
Select “Page Clone”

Step 2: Upload Your Archive

Click “Upload WACZ file”
Select your .wacz file created in Part 1
Provide a name for your cloned environment
Click “Create”

Step 3: Use the Hosted Archive

Once uploaded, you’ll receive a URL for your hosted archive (e.g., https://archives.hud.so/your-archive-id).

from hud import Task, run_job
from hud.agent import ClaudeAgent

# Task using the cloud-hosted archive
cloud_login_task = Task(
    prompt="Log into the website using username 'testuser' and password 'password123'.",
    gym="hud-browser",
    setup=[
        # Navigate to your cloud-hosted archive
        ("goto", "https://archives.hud.so/your-archive-id")
    ],
    evaluate=("page_contains", "Welcome, testuser!")
)

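# Run evaluation. Top-level await works in a notebook or async REPL;
# in a plain script, wrap this call in an async function and run it with asyncio.run().
job = await run_job(
    agent_cls=ClaudeAgent,
    task_or_taskset=cloud_login_task,
    job_name="Cloud Archive Test"
)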

Tips for Effective Page Cloning

Capture Complete Flows: Don’t just capture individual pages - capture entire user journeys
Include Resources: Ensure CSS, JavaScript, and images are properly captured
Test Your Archives: Always verify your archives work correctly before using them in evaluations
Document States: Keep notes on what states and pages are included in each archive
Update Regularly: Re-clone sites when significant changes occur

Key Takeaways

ArchiveWeb.page makes it easy to create WACZ archives of any website
Local hosting with CustomGym gives you full control and fast performance
Cloud hosting on app.hud.so provides instant deployment without infrastructure
Page cloning ensures consistent, reproducible testing environments for AI agents
Archived sites eliminate external dependencies and enable offline testing
