A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning

March 10, 2025
Authors: Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
cs.AI

Abstract

Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. Through systematic evaluation, we find that while DINO and iBOT outperform MAE across visuomotor control and perception tasks, they struggle when trained on non-(single-)object-centric (NOC) data, a limitation strongly correlated with their diminished ability to learn object-centric representations. This investigation indicates that the ability to form object-centric representations from non-object-centric robotics datasets is key to the success of PVMs. Motivated by this discovery, we design SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck, which reduces the number of prototypes to encourage the emergence of objectness, together with cross-view consistency regularization, which encourages multi-view invariance. Our experiments encompass pre-training on object-centric, scene-centric, web-crawled, and egocentric data. Across all settings, our approach learns transferable representations and achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations. When scaled up with million-scale datasets, our method also demonstrates superior data efficiency and scalability. Our code and models are publicly available at https://github.com/CVMI-Lab/SlotMIM.
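The two ingredients named in the abstract, a semantic bottleneck (few prototypes) and cross-view consistency regularization, can be illustrated with a rough sketch. This is a minimal toy, not the authors' implementation: the cosine-similarity soft assignment, the symmetric cross-entropy form of the consistency term, and all function names here are assumptions for illustration; SlotMIM's actual objective is defined in the paper and repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_assignments(features, prototypes, temperature=0.1):
    # Soft-assign each patch feature to a small set of prototypes via
    # cosine similarity; few prototypes act as a "semantic bottleneck".
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=-1, keepdims=True)
    return softmax(f @ p.T / temperature)

def cross_view_consistency_loss(feats_v1, feats_v2, prototypes):
    # Symmetric cross-entropy between the soft prototype assignments of
    # two augmented views of the same image: assignments should agree,
    # which encourages multi-view-invariant, object-like groupings.
    a1 = prototype_assignments(feats_v1, prototypes)
    a2 = prototype_assignments(feats_v2, prototypes)
    ce = lambda p, q: -(p * np.log(np.clip(q, 1e-8, None))).sum(-1).mean()
    return 0.5 * (ce(a2, a1) + ce(a1, a2))

# Toy data: 16 patch features per view, 64-dim, only K=8 prototypes.
rng = np.random.default_rng(0)
K, D, N = 8, 64, 16
prototypes = rng.standard_normal((K, D))
view1 = rng.standard_normal((N, D))
view2 = view1 + 0.05 * rng.standard_normal((N, D))  # mild augmentation

loss = cross_view_consistency_loss(view1, view2, prototypes)
print(float(loss))
```

In a real pipeline the features would come from a vision transformer's patch tokens and the prototypes would be learned jointly with the encoder; the point of the sketch is only how a small prototype count plus an agreement loss between views could push representations toward consistent, object-centric groupings.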

