

Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

October 29, 2024
作者: Guangqi Jiang, Yifei Sun, Tao Huang, Huanyu Li, Yongyuan Liang, Huazhe Xu
cs.AI

Abstract

The pre-training of visual representations has enhanced the efficiency of robot learning. Due to the lack of large-scale in-domain robotic datasets, prior works utilize in-the-wild human videos to pre-train robotic visual representations. Despite their promising results, representations from human videos are inevitably subject to distribution shifts and lack the dynamics information crucial for task completion. We first evaluate various pre-trained representations in terms of their correlation with downstream robotic manipulation tasks (i.e., manipulation centricity). Interestingly, we find that "manipulation centricity" is a strong indicator of success rates when applied to downstream tasks. Drawing from these findings, we propose Manipulation Centric Representation (MCR), a foundation representation learning framework capturing both visual features and the dynamics information of manipulation tasks, such as actions and proprioception, to improve manipulation centricity. Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions. We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss to predict actions during pre-training, along with a time contrastive loss. Empirical results across 4 simulation domains with 20 tasks verify that MCR outperforms the strongest baseline method by 14.8%. Moreover, MCR boosts the performance of data-efficient learning with a UR5e arm on 3 real-world tasks by 76.9%. Project website: https://robots-pretrain-robots.github.io/.
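The abstract combines three pre-training objectives: a contrastive loss aligning image embeddings with proprioceptive state-action embeddings, a time contrastive loss over temporally adjacent frames, and a BC-like actor loss on actions. A minimal NumPy sketch of how such a combined objective could be computed is below; the function names, the InfoNCE-style formulation, the MSE form of the actor loss, and the loss weights are all assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss: row i of `anchors` should match
    row i of `positives`; all other rows serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # numerically stable log-softmax over each row
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the matching index as the label
    return -np.mean(np.diag(log_probs))

def mcr_pretrain_loss(z_img, z_dyn, z_img_next, pred_actions, true_actions,
                      w_dyn=1.0, w_time=1.0, w_bc=1.0):
    """Hypothetical combined objective (weights w_* are assumptions):
    - l_dyn:  align image embeddings with proprio state-action embeddings
    - l_time: time contrastive loss using the next frame as the positive
    - l_bc:   BC-like actor loss (MSE here) against dataset actions"""
    l_dyn = info_nce(z_img, z_dyn)
    l_time = info_nce(z_img, z_img_next)
    l_bc = np.mean((pred_actions - true_actions) ** 2)
    return w_dyn * l_dyn + w_time * l_time + w_bc * l_bc
```

In practice the embeddings would come from a visual encoder and a small proprioceptive state-action encoder trained jointly; the sketch only shows how the three loss terms fit together into one scalar objective.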


PDF · November 16, 2024