Theia: Distilling Diverse Vision Foundation Models for Robot Learning
July 29, 2024
Authors: Jinghuan Shang, Karl Schmeckpeper, Brandon B. May, Maria Vittoria Minniti, Tarik Kelestemur, David Watkins, Laura Herlant
cs.AI
Abstract
Vision-based robot policy learning, which maps visual inputs to actions,
necessitates a holistic understanding of diverse visual tasks beyond
single-task needs like classification or segmentation. Inspired by this, we
introduce Theia, a vision foundation model for robot learning that distills
multiple off-the-shelf vision foundation models trained on varied vision tasks.
Theia's rich visual representations encode diverse visual knowledge, enhancing
downstream robot learning. Extensive experiments demonstrate that Theia
outperforms its teacher models and prior robot learning models using less
training data and smaller model sizes. Additionally, we quantify the quality of
pre-trained visual representations and hypothesize that higher entropy in
feature norm distributions leads to improved robot learning performance. Code
and models are available at https://github.com/bdaiinstitute/theia.
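
The abstract describes Theia's recipe only at a high level: a single student encoder is trained to reproduce the features of several frozen, off-the-shelf teacher models at once. Below is a minimal sketch of one plausible realization, assuming a shared backbone with one lightweight translator head per teacher; the names (`DistillStudent`, `distillation_loss`) and the smooth-L1 regression objective are illustrative assumptions, not the repository's actual API.

```python
# A minimal sketch of multi-teacher feature distillation, assuming a shared
# student backbone with one translator head per frozen teacher. Names and
# the loss choice are illustrative, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistillStudent(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int,
                 teacher_dims: dict[str, int]):
        super().__init__()
        self.backbone = backbone  # e.g. a small ViT returning patch tokens
        # One lightweight head per teacher maps the shared representation
        # into that teacher's feature space.
        self.heads = nn.ModuleDict({
            name: nn.Linear(embed_dim, dim)
            for name, dim in teacher_dims.items()
        })

    def forward(self, images: torch.Tensor) -> dict[str, torch.Tensor]:
        tokens = self.backbone(images)  # (batch, num_tokens, embed_dim)
        return {name: head(tokens) for name, head in self.heads.items()}

def distillation_loss(student_out: dict, teacher_feats: dict) -> torch.Tensor:
    # Sum of per-teacher regression losses against frozen teacher features.
    return sum(F.smooth_l1_loss(student_out[k], teacher_feats[k])
               for k in student_out)
```

The abstract also hypothesizes that higher entropy in the distribution of feature norms predicts better downstream robot learning. One simple way to compute such a quantity, assuming per-token L2 norms binned into a histogram (the paper's exact estimator may differ), is sketched below.

```python
# A minimal sketch (not necessarily the paper's estimator) of the entropy
# of a feature-norm distribution for a set of encoder tokens.
import numpy as np

def feature_norm_entropy(features: np.ndarray, num_bins: int = 64) -> float:
    """Entropy (in nats) of the histogram of per-token L2 feature norms.

    features: (num_tokens, feature_dim), e.g. ViT patch tokens. Higher
    entropy means the norms spread across many magnitudes rather than
    concentrating in a few bins.
    """
    norms = np.linalg.norm(features, axis=-1)
    hist, _ = np.histogram(norms, bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]  # drop empty bins so log() is defined
    return float(-(p * np.log(p)).sum())

# Toy usage: random features stand in for real encoder outputs.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 768))  # e.g. 14x14 patch tokens from a ViT
print(feature_norm_entropy(tokens))
```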