Train agents to perform better on specific tasks using reinforcement learning. Small models trained on your tasks can outperform larger general models.

Overview

HUD integrates with Verifiers for GRPO (Group Relative Policy Optimization) training:
import verifiers as vf

# Load environment
env = vf.load_environment(
    env_id="hud-vf-gym",
    taskset="hud-evals/2048-taskset",
    config_path="configs/2048.yaml"
)

# Train with GRPO
model, tokenizer = vf.get_model_and_tokenizer("Qwen/Qwen2.5-3B-Instruct")
trainer = vf.GRPOTrainer(
    model=model,
    processing_class=tokenizer,  # pass the tokenizer through to the trainer
    env=env,
    args=vf.grpo_defaults(),
    peft_config=vf.lora_defaults()  # LoRA adapters for memory-efficient training
)
trainer.train()
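
After training, you can save the LoRA adapter and, if you want a standalone checkpoint, merge it back into the base model. This is a minimal sketch using the standard transformers/PEFT APIs; the checkpoint paths are illustrative, and it assumes the trainer exposes the usual transformers Trainer save_model method:

from peft import PeftModel
from transformers import AutoModelForCausalLM

# Save the trained LoRA adapter (illustrative output path).
trainer.save_model("checkpoints/qwen2.5-3b-2048-grpo")

# Optionally merge the adapter into the base model for easier deployment.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
adapter = PeftModel.from_pretrained(base, "checkpoints/qwen2.5-3b-2048-grpo")
merged = adapter.merge_and_unload()
merged.save_pretrained("checkpoints/qwen2.5-3b-2048-merged")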

Why Train?

  • Task Performance: a 3B model can reach 80% success vs. a 35% baseline
  • Cost: Smaller models are cheaper to run
  • Speed: Trained models complete tasks faster
  • Consistency: Less variance in behavior

Requirements

Monitoring & Analytics

Track training progress in real time:
  • app.hud.so - View traces, metrics, and rewards for each training generation
  • Weights & Biases - Detailed ML metrics and loss curves (see the sketch below for enabling it)
  • Local logs - Training checkpoints and evaluation results
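
Since the GRPO defaults build on the standard transformers training arguments, Weights & Biases reporting can typically be switched on through the usual report_to field. This is a minimal sketch, assuming vf.grpo_defaults() returns a TrainingArguments-compatible config; the run name is illustrative:

# Hedged sketch: enable Weights & Biases reporting via standard
# transformers training-args fields.
args = vf.grpo_defaults()
args.report_to = ["wandb"]              # stream metrics and loss curves to Weights & Biases
args.run_name = "qwen2.5-3b-2048-grpo"  # illustrative run name
args.logging_steps = 1                  # log every optimizer step

Pass the modified args to vf.GRPOTrainer in place of vf.grpo_defaults() in the Overview example.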

Getting Started