

Robot Learning with Sensorimotor Pre-training

June 16, 2023
作者: Ilija Radosavovic, Baifeng Shi, Letian Fu, Ken Goldberg, Trevor Darrell, Jitendra Malik
cs.AI

Abstract
We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. Given a sequence of camera images, proprioceptive robot states, and past actions, we encode the interleaved sequence into tokens, mask out a random subset, and train a model to predict the masked-out content. We hypothesize that if the robot can predict the missing content it has acquired a good model of the physical world that can enable it to act. RPT is designed to operate on latent visual representations which makes prediction tractable, enables scaling to 10x larger models, and 10 Hz inference on a real robot. To evaluate our approach, we collect a dataset of 20,000 real-world trajectories over 9 months using a combination of motion planning and model-based grasping algorithms. We find that pre-training on this data consistently outperforms training from scratch, leads to 2x improvements in the block stacking task, and has favorable scaling properties.
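The pre-training objective described above — interleave image, state, and action tokens, mask a random subset, and train the model to predict the masked-out content — can be sketched in miniature. This is a hedged illustration, not the authors' implementation: the function names (`interleave_tokens`, `random_mask`, `masked_prediction_loss`), the use of plain NumPy vectors in place of learned latent visual representations, and the MSE loss are all assumptions made for clarity.

```python
import numpy as np

def interleave_tokens(images, states, actions):
    """Interleave per-timestep (image, state, action) vectors into one token sequence.

    Each argument is an array of shape (T, d); the result has shape (3*T, d).
    In RPT the image tokens would be latent visual representations, not raw pixels.
    """
    tokens = []
    for img, st, ac in zip(images, states, actions):
        tokens.extend([img, st, ac])
    return np.stack(tokens)

def random_mask(num_tokens, mask_ratio, rng):
    """Pick a random subset of token indices to mask out."""
    n_mask = int(num_tokens * mask_ratio)
    return rng.choice(num_tokens, size=n_mask, replace=False)

def masked_prediction_loss(tokens, mask_idx, predict_fn, mask_token):
    """Replace masked tokens with a placeholder, predict, and score only masked positions.

    `predict_fn` stands in for the Transformer; MSE on the masked positions is an
    assumed reconstruction loss, in the spirit of masked-autoencoding objectives.
    """
    corrupted = tokens.copy()
    corrupted[mask_idx] = mask_token
    pred = predict_fn(corrupted)
    return float(np.mean((pred[mask_idx] - tokens[mask_idx]) ** 2))

# Toy usage: 4 timesteps, 8-dim tokens, mask half the sequence.
rng = np.random.default_rng(0)
T, d = 4, 8
images = rng.normal(size=(T, d))
states = rng.normal(size=(T, d))
actions = rng.normal(size=(T, d))

tokens = interleave_tokens(images, states, actions)   # shape (12, 8)
mask_idx = random_mask(len(tokens), mask_ratio=0.5, rng=rng)
# An identity "model" cannot recover the masked content, so the loss is positive.
loss = masked_prediction_loss(tokens, mask_idx, lambda x: x, np.zeros(d))
```

A model that drives this loss toward zero must infer the missing images, states, or actions from the surrounding context — the hypothesis being that doing so requires a usable model of the physical interaction.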