Data Scaling Laws in Imitation Learning for Robotic Manipulation
October 24, 2024
Authors: Fanqi Lin, Yingdong Hu, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
cs.AI
Abstract
Data scaling has revolutionized fields like natural language processing and
computer vision, providing models with remarkable generalization capabilities.
In this paper, we investigate whether similar data scaling laws exist in
robotics, particularly in robotic manipulation, and whether appropriate data
scaling can yield single-task robot policies that can be deployed zero-shot for
any object within the same category in any environment. To this end, we conduct
a comprehensive empirical study on data scaling in imitation learning. By
collecting data across numerous environments and objects, we study how a
policy's generalization performance changes with the number of training
environments, objects, and demonstrations. Throughout our research, we collect
over 40,000 demonstrations and execute more than 15,000 real-world robot
rollouts under a rigorous evaluation protocol. Our findings reveal several
intriguing results: the generalization performance of the policy follows an
approximate power-law relationship with the number of environments and objects. The
diversity of environments and objects is far more important than the absolute
number of demonstrations; once the number of demonstrations per environment or
object reaches a certain threshold, additional demonstrations have minimal
effect. Based on these insights, we propose an efficient data collection
strategy. With four data collectors working for one afternoon, we collect
sufficient data to enable the policies for two tasks to achieve approximately
90% success rates in novel environments with unseen objects.
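The abstract reports that generalization performance scales roughly as a power law in the number of training environments and objects. As a hedged illustration of what such a relationship means in practice, the sketch below fits y = c * x^k by ordinary least squares in log-log space, where a power law becomes a straight line whose slope is the exponent k. The function names `fit_power_law` and `predict` are hypothetical, and the data points are synthetic; this is not the authors' fitting procedure or their data.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = c * x**k by least squares on (log x, log y); returns (c, k).

    Hypothetical helper for illustration; all inputs must be strictly positive.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    if any(v <= 0 for v in xs) or any(v <= 0 for v in ys):
        raise ValueError("power-law fit requires positive values")
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx        # slope in log-log space is the exponent
    log_c = my - k * mx  # intercept in log-log space is log of the prefactor
    return math.exp(log_c), k


def predict(c: float, k: float, x: float) -> float:
    """Evaluate the fitted power law c * x**k at x (hypothetical helper)."""
    return c * x ** k


if __name__ == "__main__":
    # Synthetic, illustrative data only (NOT results from the paper):
    # score = 0.25 * n_envs ** 0.4 exactly, so the fit recovers c and k.
    n_envs = [1, 2, 4, 8, 16]
    scores = [0.25 * n ** 0.4 for n in n_envs]
    c, k = fit_power_law(n_envs, scores)
    print(f"c={c:.3f}, k={k:.3f}")  # c=0.250, k=0.400
    print(f"extrapolated score at 24 envs: {predict(c, k, 24):.3f}")
```

Because a success rate or normalized score is bounded above by 1, a pure power law cannot hold indefinitely; a fit like this is only meaningful within roughly the range of environment or object counts actually observed.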