POA: Pre-training Once for Models of All Sizes

Abstract

Large-scale self-supervised pre-training has paved the way for a single foundation model to handle many different vision tasks. Most pre-training methods train one model of a fixed size at a time. In real-world deployments, however, varying computation and storage constraints mean that substantial effort goes into developing a series of models of different sizes. In this study, we address this issue with a novel tri-branch self-supervised training framework termed POA (Pre-training Once for All). Our approach introduces an innovative elastic student branch into a modern self-distillation paradigm. At each pre-training step, we randomly sample a sub-network from the original student to form the elastic student, and we train all branches in a self-distilling fashion. Once pre-trained, POA allows pre-trained models of diverse sizes to be extracted for downstream tasks. Remarkably, the elastic student enables the simultaneous pre-training of multiple models of different sizes, and it also acts as an additional ensemble of models of various sizes that enhances representation learning. Extensive experiments, including k-nearest neighbors evaluation, linear probing, and assessments on multiple downstream tasks, demonstrate the effectiveness and advantages of POA. It achieves state-of-the-art performance with ViT, Swin Transformer, and ResNet backbones, producing around a hundred models of different sizes from a single pre-training session. The code is available at: https://github.com/Qichuzyy/POA.
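
The per-step procedure described in the abstract (sample a sub-network of the student, then train all branches by self-distillation) can be illustrated with a minimal sketch. This is not the authors' implementation. Every function name, the width/depth search space, the temperature, the unweighted sum of the two losses, and the momentum (EMA) teacher update are assumptions made for illustration; the abstract does not specify them. The sketch uses only the Python standard library and works on plain lists of logits rather than real networks.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a fixed target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs))


def sample_elastic_config(full_width, full_depth, width_step, min_width, min_depth, rng):
    """Pick a random (width, depth) no larger than the intact student.

    The search space here is hypothetical; it only illustrates the idea of
    drawing a different sub-network at every pre-training step.
    """
    widths = list(range(min_width, full_width + 1, width_step))
    depths = list(range(min_depth, full_depth + 1))
    return rng.choice(widths), rng.choice(depths)


def tri_branch_loss(teacher_logits, student_logits, elastic_logits):
    """Assumed loss: both student branches match a sharpened teacher output."""
    target = softmax(teacher_logits, temperature=0.5)  # treated as a constant target
    loss_student = cross_entropy(target, student_logits)
    loss_elastic = cross_entropy(target, elastic_logits)
    return loss_student + loss_elastic


def ema_update(teacher_params, student_params, momentum=0.99):
    """Assumed momentum update of teacher parameters from the intact student."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    rng = random.Random(0)
    width, depth = sample_elastic_config(384, 12, 64, 192, 6, rng)
    print("elastic student config:", width, depth)
    print("loss:", tri_branch_loss([2.0, 0.0], [1.5, 0.2], [0.5, 0.4]))
```

In a real setup the elastic student would share weights with the intact student, so that sub-networks of the sampled sizes can later be extracted from the single pre-trained model; the sketch above models only the sampling and the loss, not the weight sharing.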
