Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
July 26, 2023
Authors: Huy Ha, Pete Florence, Shuran Song
cs.AI
Abstract
We present a framework for robot skill acquisition which 1) efficiently
scales up the generation of language-labelled robot data and 2) effectively
distills this data down into a robust multi-task language-conditioned
visuo-motor policy. For (1), we use a large language model (LLM) to guide
high-level planning, and sampling-based robot planners (e.g. motion or grasp
samplers) for generating diverse and rich manipulation trajectories. To
robustify this data-collection process, the LLM also infers a code-snippet for
the success condition of each task, simultaneously enabling the data-collection
process to detect failure and retry as well as the automatic labeling of
trajectories with success/failure. For (2), we extend the diffusion policy
single-task behavior-cloning approach to multi-task settings with language
conditioning. Finally, we propose a new multi-task benchmark with 18 tasks
across five domains to test long-horizon behavior, common-sense reasoning,
tool-use, and intuitive physics. We find that our distilled policy successfully
learned the robust retrying behavior exhibited by its data-collection policy, while
improving absolute success rates by 34.8% on average across the five domains. The
benchmark, code, and qualitative results are available on our website:
https://www.cs.columbia.edu/~huy/scalingup/
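The data-collection loop the abstract describes (an LLM-inferred success condition that both triggers retries on failure and auto-labels every attempt) can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function names (`success_condition`, `execute_sampled_plan`, `collect_episode`) and the randomized stand-in for the samplers are all hypothetical.

```python
import random

def success_condition(state):
    # Stand-in for an LLM-inferred, per-task check
    # (e.g. "is the block inside the bin?").
    return state.get("block_in_bin", False)

def execute_sampled_plan(seed):
    # Placeholder for one manipulation attempt produced by
    # LLM-guided planning plus motion/grasp samplers.
    rng = random.Random(seed)
    return {"block_in_bin": rng.random() > 0.5}

def collect_episode(max_retries=3, seed=0):
    """Collect one language-labelled trajectory with detect-failure-and-retry.

    Every attempt, failed or successful, is kept and auto-labelled, which is
    what lets the distilled policy also learn the retry behavior.
    """
    attempts = []
    for retry in range(max_retries):
        state = execute_sampled_plan(seed + retry)
        ok = success_condition(state)
        attempts.append({"state": state, "success": ok})
        if ok:
            break
    return attempts

episode = collect_episode(seed=42)
```

The key design point mirrored here is that failure detection, retrying, and success/failure labeling all derive from the same inferred success condition, so no human annotation is needed during scaled-up data generation.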