Is Diversity All You Need for Scalable Robotic Manipulation?
July 8, 2025
Authors: Modi Shi, Li Chen, Jin Chen, Yuxiang Lu, Chiming Liu, Guanghui Ren, Ping Luo, Di Huang, Maoqing Yao, Hongyang Li
cs.AI
Abstract
Data scaling has driven remarkable success in foundation models for Natural
Language Processing (NLP) and Computer Vision (CV), yet the principles of
effective data scaling in robotic manipulation remain insufficiently
understood. In this work, we investigate the nuanced role of data diversity in
robot learning by examining three critical dimensions: task (what to do),
embodiment (which robot to use), and expert (who demonstrates), challenging the
conventional intuition of "more diverse is better". Through extensive
experiments on various robot platforms, we reveal that (1) task diversity
proves more critical than per-task demonstration quantity, benefiting transfer
from diverse pre-training tasks to novel downstream scenarios; (2)
multi-embodiment pre-training data is not required for cross-embodiment
transfer: models trained on high-quality single-embodiment data can transfer
efficiently to different platforms and show more desirable scaling properties
during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity,
arising from individual operational preferences and stochastic variations in
human demonstrations, can be confounding to policy learning, with velocity
multimodality emerging as a key contributing factor. Based on this insight, we
propose a distribution debiasing method to mitigate velocity ambiguity; the
resulting GO-1-Pro model achieves a substantial performance gain of 15%,
equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new
perspectives and offer practical guidance on how to scale robotic manipulation
datasets effectively.
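The abstract does not spell out how GO-1-Pro's distribution debiasing works. As a minimal illustrative sketch only (not the paper's actual method, and with all function names hypothetical), one way to collapse velocity multimodality in human demonstrations is to re-time each trajectory to a constant-speed profile via arc-length resampling, so that demonstrations of the same motion differ only in path, not in speed:

```python
# Hypothetical sketch: reduce velocity multimodality across demonstrations
# by resampling each trajectory at equal arc-length spacing, yielding a
# single, unambiguous (constant) speed profile. Not the GO-1-Pro method.

def arc_lengths(points):
    """Cumulative arc length along a polyline of 2D/3D points."""
    acc = [0.0]
    for a, b in zip(points, points[1:]):
        d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        acc.append(acc[-1] + d)
    return acc

def retime_constant_speed(points, n_samples):
    """Resample a trajectory at equally spaced arc lengths (constant speed)."""
    s = arc_lengths(points)
    total = s[-1]
    out, j = [], 0
    for i in range(n_samples):
        target = total * i / (n_samples - 1)
        # Advance to the segment containing the target arc length.
        while j < len(s) - 2 and s[j + 1] < target:
            j += 1
        seg = (s[j + 1] - s[j]) or 1.0  # guard against zero-length segments
        t = (target - s[j]) / seg
        out.append(tuple(a + t * (b - a)
                         for a, b in zip(points[j], points[j + 1])))
    return out
```

After such a re-timing, two operators who traced the same path at different and varying speeds produce identical training samples, which is one plausible way to remove the speed ambiguity the abstract identifies as a confound.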