DreamTeacher: Pretraining Image Backbones with Deep Generative Models

Abstract

In this work, we introduce DreamTeacher, a self-supervised feature representation learning framework that uses generative networks to pre-train downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones, as an alternative to pre-training these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto the logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. Empirically, DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher yields significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
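
To make the two distillation types concrete, the following is a minimal sketch only. The abstract does not state which loss functions are used, so the choices here are assumptions made for illustration: mean squared error for regressing backbone features onto generative features, and cross-entropy against soft labels for the logit case. The function names (`feature_distillation_loss`, `label_distillation_loss`) are hypothetical, and features are shown as flat lists of floats rather than image feature maps.

```python
import math


def feature_distillation_loss(student_feats, teacher_feats):
    """Mean squared error between student (backbone) features and
    teacher (generative model) features.

    Assumption: MSE is used purely for illustration; the abstract
    does not name the regression loss.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between teacher soft labels and the student's
    softmax distribution.

    Assumption: teacher_probs stands in for labels produced by a
    generative network with a task head; it must sum to 1.
    """
    if len(student_logits) != len(teacher_probs):
        raise ValueError("logits and soft labels must have the same length")
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(teacher_probs, log_probs))


if __name__ == "__main__":
    # Toy values, not taken from any experiment.
    print(feature_distillation_loss([0.0, 0.0], [1.0, 3.0]))
    print(label_distillation_loss([0.0, 0.0], [0.5, 0.5]))
```

In the first case the student is trained with no labels at all, only the generative model's features as regression targets; in the second, the teacher's predicted labels take the place of manual annotation.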
