Adapter
Understanding Adapters for action and observation translation
Adapter
An Adapter
acts as a translator between the specific format used by an Agent’s underlying model (like Claude or OpenAI Computer Use) and the standardized formats used by the HUD Environment.
Overview
Different AI models might expect observations (like screenshots) in specific dimensions or return actions in unique structures. The Adapter
handles these differences:
- Observation Preprocessing: It can resize or reformat environment observations (primarily screenshots) before they are sent to the agent via the
rescale()
method. - Action Postprocessing: It converts the raw actions received from the agent’s model into the standardized
CLA
(Command Language Action) format that the environment understands via theadapt()
oradapt_list()
methods. This includes mapping action names (e.g., mappingkey
topress
) and rescaling coordinates from the agent’s expected dimensions to the actual environment dimensions.
Base Adapter Class (hud.adapters.Adapter
)
The base Adapter
class (hud.adapters.common.adapter.Adapter
) defines the interface and common logic:
__init__()
: Initializes default agent/environment dimensions (often overridden by subclasses).rescale(observation)
: Resizes an image observation (numpy array, PIL Image, or base64 string) to theagent_width
andagent_height
defined in the adapter. Returns the rescaled image as a base64 PNG string.convert(action)
: Abstract method (though often implemented in subclasses likeClaudeAdapter
orOperatorAdapter
) intended to translate a single raw action from the model into theCLA
format. Note: The main conversion logic usually happens withinadapt
oradapt_list
in practice.postprocess_action(action_dict)
: Rescales coordinates within an action dictionary (already converted to a dict from CLA) from agent dimensions to environment dimensions.adapt(action)
: The primary method called by the baseAgent
’spostprocess
stage. It typically callsconvert
to get a CLA object, thenjson
to turn it into a dict,postprocess_action
to rescale coordinates in the dict, and finally validates the rescaled dict back into a CLA object.adapt_list(actions)
: A helper method that appliesadapt
to each action in a list.
Built-in Adapters
The SDK provides adapters for the built-in agents:
hud.adapters.claude.ClaudeAdapter
: Translates actions from the format used by Anthropic’s Claude Computer Use API intoCLA
. Handles coordinate scaling based on Claude’s default dimensions (1024x768, unless overridden).hud.adapters.operator.OperatorAdapter
: Translates actions from the format used by OpenAI’s Computer Use Preview API intoCLA
. Handles coordinate scaling based on OpenAI’s default dimensions (1024x768, unless overridden).
Using Adapters
Adapters are typically passed during the initialization of an Agent
. If no adapter is provided to a built-in agent like ClaudeAgent
or OperatorAgent
, a default instance of the corresponding adapter (ClaudeAdapter
or OperatorAdapter
) is usually created internally.
Custom Adapters
If you create a custom agent that uses a unique action format or requires specific observation preprocessing, you can create a custom adapter by inheriting from hud.adapters.Adapter
and implementing the necessary logic, primarily overriding the convert
method or potentially rescale
and postprocess_action
if needed.
Related Concepts
- Agent: Uses the Adapter to preprocess observations and postprocess actions.
- Environment: Consumes standardized
CLA
actions processed by the Adapter. - Command Language Actions (CLA): The standardized action format used by HUD environments.