Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

Abstract

We present a framework for robot skill acquisition that (1) efficiently scales up the generation of language-labeled robot data and (2) effectively distills this data down into a robust multi-task, language-conditioned visuomotor policy. For (1), we use a large language model (LLM) to guide high-level planning and sampling-based robot planners (e.g., motion or grasp samplers) to generate diverse and rich manipulation trajectories. To make this data-collection process robust, the LLM also infers a code snippet for the success condition of each task, which both lets the data-collection process detect failure and retry and enables automatic labeling of trajectories as success or failure. For (2), we extend the single-task behavior-cloning approach of diffusion policy to multi-task settings with language conditioning. Finally, we propose a new multi-task benchmark with 18 tasks across five domains to test long-horizon behavior, common-sense reasoning, tool use, and intuitive physics. We find that our distilled policy successfully learns the robust retrying behavior of its data-collection policy while improving absolute success rates by 34.8% on average across the five domains. The benchmark, code, and qualitative results are available on our website: https://www.cs.columbia.edu/~huy/scalingup/
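The role of the inferred success condition can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the released code: `drawer_is_open` stands in for an LLM-generated success snippet, `pull_drawer` is a toy replacement for a sampling-based planner rollout, and the state is a plain dictionary rather than a simulator state. The sketch only shows how one predicate can serve both purposes described above: triggering retries during collection and labeling the resulting trajectory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]  # assumption: a flat dict stands in for simulator state


@dataclass
class Trajectory:
    task: str                                   # language label for the episode
    states: List[State] = field(default_factory=list)
    success: bool = False                       # filled in by the success check


def drawer_is_open(state: State) -> bool:
    """Hypothetical stand-in for an LLM-generated success condition."""
    return state["drawer_joint"] > 0.15         # threshold chosen for illustration


def pull_drawer(state: State) -> State:
    """Toy 'planner rollout': each attempt opens the drawer a bit further."""
    return {"drawer_joint": state["drawer_joint"] + 0.1}


def collect_with_retry(
    task: str,
    initial_state: State,
    attempt: Callable[[State], State],
    is_success: Callable[[State], bool],
    max_attempts: int = 3,
) -> Trajectory:
    """Run attempts until the success check passes or attempts run out.

    The same predicate drives retrying and the final success/failure label.
    """
    state = dict(initial_state)
    traj = Trajectory(task=task, states=[dict(state)])
    for _ in range(max_attempts):
        state = attempt(state)
        traj.states.append(dict(state))
        if is_success(state):
            traj.success = True
            break
    return traj


if __name__ == "__main__":
    traj = collect_with_retry(
        "open the drawer", {"drawer_joint": 0.0}, pull_drawer, drawer_is_open
    )
    print(traj.task, traj.success, len(traj.states))
```

In this toy setting the first attempt falls short of the threshold, so the loop retries once before the trajectory is labeled successful. Because the retry stays in the recorded states, a policy trained on such data could in principle imitate the recovery behavior as well.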
