The included train_2048.py script trains a 3B model on the 2048 game. Training takes ~30 minutes for 100 steps.

Figure: Qwen-2.5-3B agent training on the text-2048 environment using GRPO.
Training uses YAML configs to map agent tools to MCP tools:
# From rl/configs/2048.yaml
system_prompt: |
  You are an excellent 2048 player.
  Available moves: left(), right(), up(), down(), done()
  Return ONLY: <tool>move()</tool>
action_mappings:
  left:
    _tool: "move"
    direction: {static: "left"}
  right:
    _tool: "move"
    direction: {static: "right"}
  # ... up, down similar
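To make the mapping concrete, here is a minimal sketch of how an agent reply such as `<tool>left()</tool>` might be resolved into an MCP tool call using the `action_mappings` above. It is not the training script's actual implementation. The helper names `parse_agent_action` and `resolve_action` are hypothetical, the config is mirrored as a plain dict rather than loaded from the YAML file, and the sketch assumes that `_tool` names the MCP tool, that `{static: ...}` supplies a fixed argument value, and that `move()` in the prompt stands in for one of the listed moves.

```python
import re

# Mirrors the action_mappings section of the config as a plain dict
# (assumption: the real script loads this from rl/configs/2048.yaml).
ACTION_MAPPINGS = {
    "left": {"_tool": "move", "direction": {"static": "left"}},
    "right": {"_tool": "move", "direction": {"static": "right"}},
}

# Matches replies in the <tool>name()</tool> shape requested by the system prompt.
TOOL_PATTERN = re.compile(r"<tool>\s*(\w+)\(\)\s*</tool>")


def parse_agent_action(text):
    """Hypothetical helper: extract the agent tool name, or None if absent."""
    match = TOOL_PATTERN.search(text)
    return match.group(1) if match else None


def resolve_action(name, mappings=ACTION_MAPPINGS):
    """Hypothetical helper: turn an agent tool name into (mcp_tool, arguments)."""
    spec = mappings[name]
    arguments = {}
    for key, value in spec.items():
        if key == "_tool":
            continue  # "_tool" names the MCP tool; it is not an argument
        if isinstance(value, dict) and "static" in value:
            # Assumed meaning of {static: ...}: a fixed argument value
            arguments[key] = value["static"]
    return spec["_tool"], arguments


action = parse_agent_action("<tool>left()</tool>")
tool, arguments = resolve_action(action)
print(tool, arguments)
```

Under these assumptions the agent only ever sees the argument-free tools `left()`, `right()`, and so on, while the environment receives a single `move` tool call with the `direction` argument filled in from the config.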