POA: Pre-training Once for Models of All Sizes
August 2, 2024
Authors: Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang
cs.AI
Abstract
Large-scale self-supervised pre-training has paved the way for one foundation
model to handle many different vision tasks. Most pre-training methodologies
train a single model of a certain size at a time. However, the varied
computation and storage constraints of real-world scenarios demand substantial
effort to develop a series of models of different sizes for deployment. In
this study, we therefore propose a novel tri-branch self-supervised training
framework, termed POA (Pre-training Once for All), to tackle this issue.
Our approach introduces an innovative elastic student branch into a modern
self-distillation paradigm. At each pre-training step, we randomly sample a
sub-network from the original student to form the elastic student and train all
branches in a self-distilling fashion. Once pre-trained, POA allows the
extraction of pre-trained models of diverse sizes for downstream tasks.
Remarkably, the elastic student enables the simultaneous pre-training of
multiple models of different sizes and also serves as an additional ensemble
of variously sized models that enhances representation learning. Extensive
experiments, including k-nearest neighbor classification, linear probing, and
evaluations on multiple downstream tasks, demonstrate the effectiveness and
advantages of our POA. It achieves state-of-the-art performance with ViT, Swin
Transformer, and ResNet backbones, producing around a hundred models of
different sizes from a single pre-training session. The code is available at:
https://github.com/Qichuzyy/POA.
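To make the tri-branch procedure concrete, below is a minimal sketch of one POA-style training step. It is a simplified illustration, not the authors' implementation: plain MLP blocks stand in for ViT/Swin blocks, the elastic student is sampled by depth only, the distillation loss is a generic DINO-style cross-entropy, and all dimensions and hyperparameters are placeholders.

```python
# Minimal sketch of POA-style tri-branch self-distillation.
# Assumptions (not from the paper): MLP blocks instead of transformer
# blocks, depth-only elastic sampling, DINO-style loss, toy hyperparameters.
import copy
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, NUM_BLOCKS, NUM_PROTOTYPES = 64, 8, 256

def make_backbone(num_blocks: int) -> nn.ModuleList:
    return nn.ModuleList(
        nn.Sequential(nn.Linear(DIM, DIM), nn.GELU()) for _ in range(num_blocks)
    )

student_blocks = make_backbone(NUM_BLOCKS)
head = nn.Linear(DIM, NUM_PROTOTYPES)           # shared projection head
teacher_blocks = copy.deepcopy(student_blocks)  # EMA teacher branch
teacher_head = copy.deepcopy(head)
for p in list(teacher_blocks.parameters()) + list(teacher_head.parameters()):
    p.requires_grad_(False)

opt = torch.optim.AdamW(
    list(student_blocks.parameters()) + list(head.parameters()), lr=1e-4
)

def forward(blocks, head_, x):
    for blk in blocks:
        x = blk(x)
    return head_(x)

def distill_loss(student_logits, teacher_logits, temp_s=0.1, temp_t=0.04):
    # Cross-entropy between the sharpened teacher distribution and the
    # student distribution, as in common self-distillation setups.
    t = F.softmax(teacher_logits / temp_t, dim=-1)
    s = F.log_softmax(student_logits / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

for step in range(10):                 # toy training loop
    x = torch.randn(32, DIM)           # stand-in for an augmented view batch
    with torch.no_grad():
        t_out = forward(teacher_blocks, teacher_head, x)

    # Branch 1: the intact student. Branch 2: the elastic student, a
    # randomly sampled sub-network sharing the intact student's weights.
    s_out = forward(student_blocks, head, x)
    depth = random.randint(2, NUM_BLOCKS)
    e_out = forward(student_blocks[:depth], head, x)

    # Both student branches are distilled from the same teacher.
    loss = distill_loss(s_out, t_out) + distill_loss(e_out, t_out)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update of the teacher from the intact student.
    with torch.no_grad():
        for tp, sp in zip(teacher_blocks.parameters(), student_blocks.parameters()):
            tp.mul_(0.996).add_(sp, alpha=0.004)
        for tp, sp in zip(teacher_head.parameters(), head.parameters()):
            tp.mul_(0.996).add_(sp, alpha=0.004)

# After pre-training, any sampled sub-network can be extracted directly as a
# smaller standalone backbone, e.g. a prefix of the trained blocks.
small_model = student_blocks[:4]
```

Because the elastic student shares parameters with the intact student, every sampled sub-network is trained in place; this is what lets a single pre-training session yield many deployable models of different sizes.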