Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
July 26, 2023
Authors: Huy Ha, Pete Florence, Shuran Song
cs.AI
Abstract
We present a framework for robot skill acquisition, which 1) efficiently
scales up the generation of language-labelled robot data, and 2) effectively
distills this data down into a robust multi-task, language-conditioned
visuo-motor policy. For (1), we use a large language model (LLM) to guide
high-level planning, and sampling-based robot planners (e.g. motion or grasp
samplers) for generating diverse and rich manipulation trajectories. To
robustify this data-collection process, the LLM also infers a code-snippet for
the success condition of each task, simultaneously enabling the data-collection
process to detect failure and retry as well as the automatic labeling of
trajectories with success/failure. For (2), we extend the diffusion policy
single-task behavior-cloning approach to multi-task settings with language
conditioning. Finally, we propose a new multi-task benchmark with 18 tasks
across five domains to test long-horizon behavior, common-sense reasoning,
tool-use, and intuitive physics. We find that our distilled policy successfully
learned the robust retrying behavior in its data collection policy, while
improving absolute success rates by 34.8% on average across five domains. The
benchmark, code, and qualitative results are available on our website:
https://www.cs.columbia.edu/~huy/scalingup/
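The retry-and-label mechanism described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: `block_in_bin` stands in for the success-condition code snippet the LLM would infer for a task, while `SimState`, `attempt_task`, and the bin coordinates are hypothetical placeholders for the simulator and sampling-based planner.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimState:
    """Hypothetical simulator state: object positions keyed by name."""
    positions: dict = field(default_factory=dict)


def block_in_bin(state: SimState) -> bool:
    """Stand-in for an LLM-inferred success condition
    (e.g. 'the block is inside the bin')."""
    x, y = state.positions.get("block", (0.0, 0.0))
    return 0.4 <= x <= 0.6 and 0.4 <= y <= 0.6


def attempt_task(rng: random.Random) -> SimState:
    """Hypothetical rollout of a sampling-based planner; outcome is stochastic."""
    if rng.random() < 0.5:
        return SimState(positions={"block": (0.5, 0.5)})  # landed in the bin
    return SimState(positions={"block": (0.9, 0.1)})      # dropped outside


def collect_with_retry(success_fn, max_retries: int, seed: int = 0):
    """Use the inferred predicate to detect failure, retry, and
    auto-label every attempt with success/failure for distillation."""
    rng = random.Random(seed)
    labelled = []  # (attempt_index, success_flag) pairs
    for attempt in range(max_retries):
        state = attempt_task(rng)
        ok = success_fn(state)
        labelled.append((attempt, ok))
        if ok:
            break  # stop retrying once the success condition holds
    return labelled


episode = collect_with_retry(block_in_bin, max_retries=5)
print(episode)  # with seed=0: [(0, False), (1, False), (2, True)]
```

Note that failed attempts are kept and labelled rather than discarded; the abstract's point is that a policy distilled from such episodes can itself learn the retry behavior.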