POA: Pre-training Once for Models of All Sizes

Abstract

Large-scale self-supervised pre-training has paved the way for one foundation model to handle many different vision tasks. Most pre-training methodologies train a single model of a certain size at one time. Nevertheless, the varied computation and storage constraints of real-world scenarios require substantial effort to develop and deploy a series of models of different sizes. Thus, in this study, we propose a novel tri-branch self-supervised training framework, termed POA (Pre-training Once for All), to tackle this issue. Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student and train all branches in a self-distilling fashion. Once pre-trained, POA allows the extraction of pre-trained models of diverse sizes for downstream tasks. Remarkably, the elastic student enables the simultaneous pre-training of multiple models of different sizes and also acts as an additional ensemble of models of various sizes that enhances representation learning. Extensive experiments, including k-nearest-neighbor and linear-probing evaluations and assessments on multiple downstream tasks, demonstrate the effectiveness and advantages of POA. It achieves state-of-the-art performance with ViT, Swin Transformer and ResNet backbones, producing around a hundred models of different sizes through a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.
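The abstract describes the core training step only at a high level: sample a sub-network from the intact student to form the elastic student, then train all three branches by self-distillation. The pure-Python sketch below illustrates one such step under assumptions that are ours, not the paper's. A toy MLP stands in for the ViT, Swin or ResNet backbone. Elastic width keeps the leading rows and columns of each shared weight matrix, and elastic depth keeps the first layers. The teacher is an exponential moving average (EMA) of the intact student, as in DINO-style self-distillation. The three loss pairings, the temperatures and the width ratios are illustrative. All function names (`init_model`, `sample_elastic`, `three_branch_loss`, etc.) are hypothetical. Gradients and the optimizer are omitted; only the loss is computed.

```python
import copy
import math
import random


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def distill_loss(target_logits, student_logits, t_target=0.04, t_student=0.1):
    """Cross-entropy H(p_target, p_student); the target uses a sharper temperature.
    In real training the target side would be treated as a constant (no gradient)."""
    p_target = [math.exp(v) for v in log_softmax(target_logits, t_target)]
    log_p_student = log_softmax(student_logits, t_student)
    return -sum(p * lp for p, lp in zip(p_target, log_p_student))


def init_model(depth, width, num_out, rng):
    """Toy 'intact' student (hypothetical): `depth` square ReLU layers plus a linear head."""
    scale = 1.0 / math.sqrt(width)
    layers = [[[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]
              for _ in range(depth)]
    head = [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(num_out)]
    return {"layers": layers, "head": head}


def forward(model, x, depth=None, width=None):
    """Forward pass; a smaller depth/width runs a weight-sharing sub-network."""
    depth = len(model["layers"]) if depth is None else depth
    width = len(x) if width is None else width
    h = x[:width]
    for weight in model["layers"][:depth]:  # assumed elastic depth: keep the first layers
        # Assumed elastic width: reuse only the leading `width` rows and columns.
        h = [max(0.0, sum(w * v for w, v in zip(row[:width], h))) for row in weight[:width]]
    # The head keeps every output but reads only the first `width` features.
    return [sum(w * v for w, v in zip(row[:width], h)) for row in model["head"]]


def sample_elastic(full_depth, full_width, rng, ratios=(0.25, 0.5, 0.75, 1.0)):
    """Hypothetical sampler for the elastic student's depth and width."""
    depth = rng.randint(max(1, full_depth // 2), full_depth)
    width = max(1, int(full_width * rng.choice(ratios)))
    return depth, width


def ema(teacher_param, student_param, momentum):
    """Recursively blend nested parameter lists: m * teacher + (1 - m) * student."""
    if isinstance(teacher_param, list):
        return [ema(t, s, momentum) for t, s in zip(teacher_param, student_param)]
    return momentum * teacher_param + (1.0 - momentum) * student_param


def ema_update(teacher, student, momentum=0.996):
    """Teacher = exponential moving average of the intact student (assumed, DINO-style)."""
    return {k: ema(teacher[k], student[k], momentum) for k in teacher}


def three_branch_loss(teacher, student, x, rng):
    """Loss for one step over teacher, intact student and elastic student (no gradients)."""
    depth, width = sample_elastic(len(student["layers"]), len(x), rng)
    z_teacher = forward(teacher, x)
    z_intact = forward(student, x)
    z_elastic = forward(student, x, depth, width)
    return (distill_loss(z_teacher, z_intact)      # teacher -> intact student
            + distill_loss(z_teacher, z_elastic)   # teacher -> elastic student
            + distill_loss(z_intact, z_elastic))   # intact -> elastic student (assumed pairing)


rng = random.Random(0)
student = init_model(depth=4, width=16, num_out=8, rng=rng)
teacher = copy.deepcopy(student)
x = [rng.gauss(0.0, 1.0) for _ in range(16)]
loss = three_branch_loss(teacher, student, x, rng)
teacher = ema_update(teacher, student)
print(f"three-branch loss: {loss:.4f}")
```

In this sketch the elastic student owns no parameters of its own: every sampled (depth, width) pair simply slices the intact student's weights. That is why, under these assumptions, any such sub-network could be read out after training without a separate pre-training run.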
