Theia: Distilling Diverse Vision Foundation Models for Robot Learning
July 29, 2024
Authors: Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant
cs.AI
Abstract
Vision-based robot policy learning, which maps visual inputs to actions,
necessitates a holistic understanding of diverse visual tasks beyond
single-task needs like classification or segmentation. Inspired by this, we
introduce Theia, a vision foundation model for robot learning that distills
multiple off-the-shelf vision foundation models trained on varied vision tasks.
Theia's rich visual representations encode diverse visual knowledge, enhancing
downstream robot learning. Extensive experiments demonstrate that Theia
outperforms its teacher models and prior robot learning models using less
training data and smaller model sizes. Additionally, we quantify the quality of
pre-trained visual representations and hypothesize that higher entropy in
feature norm distributions leads to improved robot learning performance. Code
and models are available at https://github.com/bdaiinstitute/theia.
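The abstract describes distilling multiple off-the-shelf vision foundation models into one student. A minimal sketch of what a multi-teacher distillation objective can look like is below; the teacher names, linear translator heads, per-dimension normalization, and plain MSE loss are illustrative assumptions for this sketch, not the losses or head architectures of the actual Theia implementation (see the linked repository for those).

```python
import numpy as np

def multi_teacher_distill_loss(student_feats, teacher_feats, heads):
    """Sum of per-teacher regression losses for feature distillation.

    student_feats: (num_tokens, d_s) spatial features from the student encoder.
    teacher_feats: dict mapping teacher name -> (num_tokens, d_t) features.
    heads: dict mapping teacher name -> (d_s, d_t) linear translator head
           (an assumed design; real translator heads may be nonlinear).
    """
    total = 0.0
    for name, t in teacher_feats.items():
        pred = student_feats @ heads[name]          # project into teacher space
        # Normalize each teacher's features per dimension so teachers with
        # different feature scales contribute comparably (assumed choice).
        t_norm = (t - t.mean(axis=0)) / (t.std(axis=0) + 1e-6)
        total += float(np.mean((pred - t_norm) ** 2))
    return total
```

With several teachers (e.g. a classification, a segmentation, and a self-supervised model), minimizing this sum over all teachers pushes one compact student representation to encode the diverse visual knowledge the abstract refers to.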
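The abstract also hypothesizes that higher entropy in the distribution of feature norms correlates with better robot learning performance. A minimal sketch of one way to compute such a quantity is below; the histogram-based estimator and bin count are assumptions for illustration, not necessarily the exact measure used in the paper.

```python
import numpy as np

def feature_norm_entropy(features, num_bins=64):
    """Shannon entropy of a histogram of per-token feature norms.

    features: (num_tokens, dim) spatial features from a vision encoder.
    Returns entropy in nats; it ranges from 0 (all norms in one bin)
    up to log(num_bins) (norms spread uniformly across bins).
    """
    norms = np.linalg.norm(features, axis=-1)       # one L2 norm per token
    counts, _ = np.histogram(norms, bins=num_bins)  # empirical distribution
    p = counts / counts.sum()
    p = p[p > 0]                                    # drop empty bins (0*log 0 = 0)
    return float(-(p * np.log(p)).sum())
```

Under the abstract's hypothesis, an encoder whose feature norms are spread broadly (high entropy) would be preferred over one whose norms collapse onto a few values.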