Adapter

An Adapter acts as a translator between the specific format used by an Agent’s underlying model (like Claude or OpenAI Computer Use) and the standardized formats used by the HUD Environment.

Overview

Different AI models might expect observations (like screenshots) in specific dimensions or return actions in unique structures. The Adapter handles these differences:

  1. Observation Preprocessing: It can resize or reformat environment observations (primarily screenshots) before they are sent to the agent via the rescale() method.
  2. Action Postprocessing: It converts the raw actions received from the agent’s model into the standardized CLA (Command Language Action) format that the environment understands via the adapt() or adapt_list() methods. This includes mapping action names (e.g., mapping key to press) and rescaling coordinates from the agent’s expected dimensions to the actual environment dimensions.

Base Adapter Class (hud.adapters.Adapter)

The base Adapter class (hud.adapters.common.adapter.Adapter) defines the interface and common logic:

  • __init__(): Initializes default agent/environment dimensions (often overridden by subclasses).
  • rescale(observation): Resizes an image observation (numpy array, PIL Image, or base64 string) to the agent_width and agent_height defined in the adapter. Returns the rescaled image as a base64 PNG string.
  • convert(action): Abstract method (though often implemented in subclasses like ClaudeAdapter or OperatorAdapter) intended to translate a single raw action from the model into the CLA format. Note: The main conversion logic usually happens within adapt or adapt_list in practice.
  • postprocess_action(action_dict): Rescales coordinates within an action dictionary (already converted to a dict from CLA) from agent dimensions to environment dimensions.
  • adapt(action): The primary method called by the base Agent’s postprocess stage. It typically calls convert to get a CLA object, then json to turn it into a dict, postprocess_action to rescale coordinates in the dict, and finally validates the rescaled dict back into a CLA object.
  • adapt_list(actions): A helper method that applies adapt to each action in a list.

Built-in Adapters

The SDK provides adapters for the built-in agents:

  • hud.adapters.claude.ClaudeAdapter: Translates actions from the format used by Anthropic’s Claude Computer Use API into CLA. Handles coordinate scaling based on Claude’s default dimensions (1024x768, unless overridden).
  • hud.adapters.operator.OperatorAdapter: Translates actions from the format used by OpenAI’s Computer Use Preview API into CLA. Handles coordinate scaling based on OpenAI’s default dimensions (1024x768, unless overridden).

Using Adapters

Adapters are typically passed during the initialization of an Agent. If no adapter is provided to a built-in agent like ClaudeAgent or OperatorAgent, a default instance of the corresponding adapter (ClaudeAdapter or OperatorAdapter) is usually created internally.

from hud.agent import OperatorAgent
from hud.adapters.operator import OperatorAdapter

# Create an adapter instance (optional, agent creates default if None)
# You might customize adapter parameters here if needed in the future
my_adapter = OperatorAdapter()

# Pass the adapter to the agent
agent = OperatorAgent(environment="browser", adapter=my_adapter)

# Now, when agent.predict() is called, it will use my_adapter
# for preprocessing (rescale) and postprocessing (adapt_list).

Custom Adapters

If you create a custom agent that uses a unique action format or requires specific observation preprocessing, you can create a custom adapter by inheriting from hud.adapters.Adapter and implementing the necessary logic, primarily overriding the convert method or potentially rescale and postprocess_action if needed.

  • Agent: Uses the Adapter to preprocess observations and postprocess actions.
  • Environment: Consumes standardized CLA actions processed by the Adapter.
  • Command Language Actions (CLA): The standardized action format used by HUD environments.