Playful Agentic Robot Learning
June 17, 2026
Authors: Junyi Zhang, Jiaxin Ge, Hanjun Yoo, Letian Fu, Zihan Yang, Yaowei Liu, Raj Saravanan, Shaofeng Yin, Justin Yu, Dantong Niu, Zirui Wang, Roei Herzig, Ken Goldberg, Yutong Bai, David M. Chan, Ion Stoica, Angjoo Kanazawa, Jiahui Lei, Haiwen Feng, Trevor Darrell
cs.AI
Abstract
Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise their behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions are given. We study Playful Agentic Robot Learning, in which an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs (Robotics Agent Teams), designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments on LIBERO-PRO and MolmoSpaces show that play-learned skills improve performance on held-out downstream tasks over no-play and random-play baselines, with gains of 20.6 and 17.0 percentage points over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents simply by retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 percentage points, respectively, without finetuning the underlying model.
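
The play-then-reuse cycle described in the abstract can be illustrated with a minimal sketch. This is not the RATs implementation: `SkillLibrary`, `play`, `toy_propose`, and `toy_attempt` are hypothetical names, the word-overlap retriever stands in for whatever retrieval the system actually uses, and the toy attempt function simply fails once and then succeeds when given feedback. It only shows the control flow of proposing a task, retrying with feedback, distilling successes into a library, and freezing that library before test time.

```python
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Hypothetical persistent store mapping a skill name to its source code."""
    skills: dict = field(default_factory=dict)
    frozen: bool = False

    def add(self, name: str, code: str) -> None:
        if self.frozen:
            raise RuntimeError("library is frozen at test time")
        self.skills[name] = code

    def retrieve(self, task: str, k: int = 2) -> list:
        """Rank skills by word overlap with the task (stand-in for a real retriever)."""
        words = set(task.lower().split())
        scored = [(len(words & set(name.split("_"))), name) for name in self.skills]
        ranked = sorted((s for s in scored if s[0] > 0), reverse=True)
        return [name for _, name in ranked[:k]]


def play(library, propose, attempt, max_retries=3, episodes=5):
    """Propose a task, attempt it with feedback-driven retries, distill successes."""
    for _ in range(episodes):
        task = propose(library)
        feedback = None  # no feedback before the first attempt
        for _ in range(max_retries):
            ok, code, feedback = attempt(task, feedback)
            if ok:
                library.add(task.replace(" ", "_"), code)
                break
    library.frozen = True  # the library is read-only once play ends
    return library


def toy_propose(library):
    """Assumed proposer: prefer tasks that have no skill in the library yet."""
    candidates = ["open drawer", "pick mug", "stack blocks"]
    unseen = [t for t in candidates if t.replace(" ", "_") not in library.skills]
    return unseen[0] if unseen else candidates[0]


def toy_attempt(task, feedback):
    """Assumed executor: fails without feedback, succeeds once feedback exists."""
    if feedback is None:
        return False, "", f"step 1 of '{task}' failed: gripper missed"
    return True, f"def {task.replace(' ', '_')}(robot): ...", None


library = play(SkillLibrary(), toy_propose, toy_attempt)
print(library.retrieve("open the top drawer"))  # prints ['open_drawer']
```

Keeping the library frozen after play mirrors the abstract's test-time setting, in which retrieved skills are placed in the agent's context and the underlying model is not finetuned.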