Playful Agentic Robot Learning

Abstract

Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise their behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions are given. We study Playful Agentic Robot Learning, in which an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs (Robotics Agent Teams), designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve performance on held-out downstream tasks relative to no-play and random-play baselines, with gains of 20.6 and 17.0 percentage points over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents simply by retrieving them into the context, improving performance on RoboSuite and real-world transfer tasks by 8.9 and 8.8 percentage points, respectively, without fine-tuning the underlying model.
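The play-then-reuse loop described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the RATs implementation. `propose`, `write_policy`, and `execute` are hypothetical stand-ins for the task-proposing agent, the code-writing agent, and the simulator plus verifier. The word-overlap `retrieve` is an assumed placeholder for whatever retrieval RATs actually uses. `Skill`, `SkillLibrary`, `play`, and `build_prompt` are illustrative names only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str  # the play task this code solved
    code: str         # robot-code policy that succeeded during play


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)
    frozen: bool = False  # set to True before test time

    def add(self, skill: Skill) -> None:
        if self.frozen:
            raise RuntimeError("skill library is frozen at test time")
        self.skills[skill.name] = skill

    def retrieve(self, task: str, k: int = 3) -> list[Skill]:
        # Assumed retrieval rule: rank skills by word overlap with the task.
        words = set(task.lower().split())
        scored = [(len(words & set(s.description.lower().split())), s)
                  for s in self.skills.values()]
        scored = [(n, s) for n, s in scored if n > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1].name))
        return [s for _, s in scored[:k]]


def play(propose: Callable[[SkillLibrary], str],
         write_policy: Callable[[str, list[str]], str],
         execute: Callable[[str], tuple[bool, str]],
         library: SkillLibrary,
         n_tasks: int,
         max_attempts: int = 3) -> None:
    """Self-directed play: propose, attempt, retry on feedback, distill."""
    for _ in range(n_tasks):
        task = propose(library)  # proposer may condition on skills learned so far
        feedback: list[str] = []
        for _ in range(max_attempts):
            code = write_policy(task, feedback)
            ok, step_feedback = execute(code)  # verification plus failure diagnosis
            if ok:
                name = f"skill_{len(library.skills)}"
                library.add(Skill(name, task, code))  # distill only successes
                break
            feedback.append(step_feedback)  # retry with step-level feedback


def build_prompt(task: str, library: SkillLibrary, k: int = 3) -> str:
    """Place retrieved skills in the agent's context; no weights are changed."""
    blocks = [f"# Skill {s.name}: {s.description}\n{s.code}"
              for s in library.retrieve(task, k)]
    return "\n\n".join(blocks + [f"# New task: {task}"])


# Toy stubs standing in for the LLM agents and the simulator (hypothetical).
def write_policy(task: str, feedback: list[str]) -> str:
    # The first attempt forgets to close the gripper; the retry fixes it.
    return f"move_to(target)  # {task}" + ("\nclose_gripper()" if feedback else "")


def execute(code: str) -> tuple[bool, str]:
    ok = "close_gripper()" in code
    return ok, ("" if ok else "step 2 failed: gripper never closed")


tasks = iter(["pick up the red block", "open the top drawer"])
library = SkillLibrary()
play(lambda lib: next(tasks), write_policy, execute, library, n_tasks=2)
library.frozen = True
print([s.name for s in library.retrieve("pick up the blue block", k=1)])
print(build_prompt("pick up the blue block", library, k=1))
```

In this sketch a skill is stored only after its execution succeeds. The library is frozen before test time, matching the abstract's description of a frozen library that new tasks read from but do not write to.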