Data Scaling Laws in Imitation Learning for Robotic Manipulation

Abstract

Data scaling has revolutionized fields such as natural language processing and computer vision, giving models remarkable generalization capabilities. In this paper, we investigate whether similar data scaling laws exist in robotics, particularly in robotic manipulation, and whether appropriate data scaling can yield single-task robot policies that can be deployed zero-shot on any object within the same category in any environment. To this end, we conduct a comprehensive empirical study of data scaling in imitation learning. By collecting data across numerous environments and objects, we study how a policy's generalization performance changes with the number of training environments, objects, and demonstrations. Over the course of the study, we collect more than 40,000 demonstrations and execute more than 15,000 real-world robot rollouts under a rigorous evaluation protocol. Our study yields several findings. First, the policy's generalization performance follows a roughly power-law relationship with the number of training environments and objects. Second, the diversity of environments and objects is far more important than the absolute number of demonstrations: once the number of demonstrations per environment or object reaches a certain threshold, additional demonstrations have minimal effect. Based on these insights, we propose an efficient data collection strategy. With four data collectors working for one afternoon, we collect enough data for the policies for two tasks to achieve approximately 90% success rates in novel environments with unseen objects.
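To make the "roughly power-law relationship" concrete, the following is a minimal sketch of how such a trend could be checked, assuming an error-like metric y (for example, a gap to perfect performance) that decays with the number of training environments x as y = beta * x^alpha. The metric choice, the function name `fit_power_law`, and all numbers below are illustrative assumptions; the data is synthetic and is not taken from the paper. A power law is a straight line in log-log space, so ordinary least squares on (log x, log y) recovers the exponent as the slope.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = beta * x**alpha by least squares on (log x, log y).

    Hypothetical helper for illustration; requires strictly positive inputs.
    Returns (alpha, beta).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("x and y must be strictly positive for a log-log fit")

    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n

    # Slope of the regression line in log-log space is the exponent alpha.
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    alpha = sxy / sxx

    # Intercept of the line is log(beta).
    beta = math.exp(mean_y - alpha * mean_x)
    return alpha, beta


# Synthetic illustration only: values generated from a known power law,
# NOT measurements from the paper.
num_envs = [1, 2, 4, 8, 16, 32]
true_alpha, true_beta = -0.5, 0.8
gap = [true_beta * n ** true_alpha for n in num_envs]

alpha, beta = fit_power_law(num_envs, gap)
print(f"alpha={alpha:.3f}, beta={beta:.3f}")
```

On real rollout data the points would scatter around the fitted line, so the correlation in log-log space is a useful sanity check before reading anything into the exponent.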
