
Is Diversity All You Need for Scalable Robotic Manipulation?

July 8, 2025
作者: Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, Ping Luo, Di Huang, Maoqing Yao, Hongyang Li
cs.AI

Abstract

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition that "more diverse is better". Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms and show more desirable scaling properties during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can confound policy learning, with velocity multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves a substantial performance gain of 15%, equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new perspectives and practical guidance on how to scale robotic manipulation datasets effectively.
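The abstract does not detail the proposed distribution debiasing method. As a minimal sketch of why velocity multimodality confounds policy learning, and of one plausible normalization in its spirit, the snippet below shows two demonstrations of the same reach executed at different speeds: the paths are identical, but the per-step velocity targets differ, creating spurious action modes. Re-parameterizing each trajectory by arc length collapses them onto a single speed profile. All function and variable names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def resample_to_constant_speed(traj, n_points=50):
    """Re-parameterize a trajectory by arc length so every demonstration
    traverses its path at a uniform speed (a simple velocity debiasing).
    traj: (T, D) array of end-effector positions over time."""
    seg_len = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg_len)])  # cumulative arc length
    targets = np.linspace(0.0, arc[-1], n_points)      # equally spaced samples
    return np.stack(
        [np.interp(targets, arc, traj[:, d]) for d in range(traj.shape[1])],
        axis=1,
    )

# Two demonstrations of the same straight-line reach at different speeds:
# identical paths, but different per-step velocities (multimodal targets).
fast = np.linspace([0.0, 0.0], [1.0, 0.0], 20)
slow = np.linspace([0.0, 0.0], [1.0, 0.0], 80)

v_fast = np.linalg.norm(np.diff(fast, axis=0), axis=1).mean()
v_slow = np.linalg.norm(np.diff(slow, axis=0), axis=1).mean()

# After arc-length resampling, both demonstrations produce the same
# state-action sequence, removing the speed-induced ambiguity.
f2 = resample_to_constant_speed(fast)
s2 = resample_to_constant_speed(slow)
```

A learned policy trained on the raw pair would see two conflicting velocity labels for the same state; after resampling, the conflicting modes coincide.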