
POA: Pre-training Once for Models of All Sizes

August 2, 2024
Authors: Yingying Zhang, Xin Guo, Jiangwei Lao, Lei Yu, Lixiang Ru, Jian Wang, Guo Ye, Huimei He, Jingdong Chen, Ming Yang
cs.AI

Abstract

Large-scale self-supervised pre-training has paved the way for one foundation model to handle many different vision tasks. Most pre-training methodologies train a single model of a certain size at one time. Nevertheless, various computation or storage constraints in real-world scenarios require substantial efforts to develop a series of models with different sizes to deploy. Thus, in this study, we propose a novel tri-branch self-supervised training framework, termed POA (Pre-training Once for All), to tackle the aforementioned issue. Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student and train all branches in a self-distilling fashion. Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks. Remarkably, the elastic student facilitates the simultaneous pre-training of multiple models with different sizes, and it also acts as an additional ensemble of models of various sizes to enhance representation learning. Extensive experiments, including k-nearest neighbors, linear probing evaluation, and assessments on multiple downstream tasks, demonstrate the effectiveness and advantages of our POA. It achieves state-of-the-art performance using ViT, Swin Transformer and ResNet backbones, producing around a hundred models with different sizes through a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.
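
To make the tri-branch idea concrete, the following is a minimal, hypothetical PyTorch sketch of one pre-training step: a frozen EMA teacher, an intact student, and an elastic student obtained by randomly restricting the student's hidden width, with both student branches trained against the teacher via a soft cross-entropy distillation loss. All names (`TinyBackbone`, `pretraining_step`, the candidate widths) are illustrative assumptions and not the authors' released implementation; see the repository linked above for the actual code.

```python
# Hypothetical sketch of a POA-style step under a DINO-like self-distillation setup.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Toy stand-in for a ViT/Swin/ResNet backbone with an elastic hidden width."""
    def __init__(self, dim=256, out_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(3 * 32 * 32, dim)
        self.fc2 = nn.Linear(dim, out_dim)

    def forward(self, x, width=None):
        # `width` optionally restricts the hidden dimension, emulating an
        # elastic sub-network sampled from the full (intact) student.
        h = self.fc1(x.flatten(1))
        if width is not None:
            mask = torch.zeros_like(h)
            mask[:, :width] = 1.0
            h = h * mask
        return self.fc2(F.relu(h))

def distill_loss(student_logits, teacher_logits, temp_s=0.1, temp_t=0.04):
    # Cross-entropy between softened teacher and student distributions.
    t = F.softmax(teacher_logits.detach() / temp_t, dim=-1)
    s = F.log_softmax(student_logits / temp_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

teacher = TinyBackbone()
student = TinyBackbone()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def pretraining_step(view1, view2, widths=(64, 128, 192, 256), momentum=0.996):
    with torch.no_grad():
        t_out = teacher(view1)                   # teacher branch
    s_out = student(view2)                       # intact student branch
    elastic_width = random.choice(widths)        # randomly sampled sub-network
    e_out = student(view2, width=elastic_width)  # elastic student branch
    loss = distill_loss(s_out, t_out) + distill_loss(e_out, t_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                        # EMA update of the teacher
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1 - momentum)
    return loss.item()

# Example usage with two random "augmented views" of a batch:
x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(pretraining_step(x1, x2))
```

After such a run, sub-networks of different widths can be extracted from the single pre-trained student for deployment, which is the practical payoff the abstract describes.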
