DreamTeacher: Pretraining Image Backbones with Deep Generative Models
July 14, 2023
Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
cs.AI
Abstract
In this work, we introduce a self-supervised feature representation learning
framework DreamTeacher that utilizes generative networks for pre-training
downstream image backbones. We propose to distill knowledge from a trained
generative model into standard image backbones that have been well engineered
for specific perception tasks. We investigate two types of knowledge
distillation: 1) distilling learned generative features onto target image
backbones as an alternative to pre-training these backbones on large labeled
datasets such as ImageNet, and 2) distilling labels obtained from generative
networks with task heads onto logits of target backbones. We perform extensive
analyses on multiple generative models, dense prediction benchmarks, and
several pre-training regimes. We empirically find that our DreamTeacher
significantly outperforms existing self-supervised representation learning
approaches across the board. Unsupervised ImageNet pre-training with
DreamTeacher leads to significant improvements over ImageNet classification
pre-training on downstream datasets, showcasing generative models, and
diffusion generative models specifically, as a promising approach to
representation learning on large, diverse datasets without requiring manual
annotation.
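
The abstract describes the two distillation objectives only at a high level. The sketch below is a minimal, illustrative PyTorch rendering of them under our own assumptions; the names (FeatureRegressor, feature_distillation_loss, the temperature T, and so on) are hypothetical and do not come from the paper's code.

```python
# Illustrative sketch of DreamTeacher-style distillation, under assumed
# interfaces. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRegressor(nn.Module):
    """Projects a student (backbone) feature map to the teacher's channel width."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

def feature_distillation_loss(student_feats, teacher_feats, regressors):
    """Objective 1: regress backbone features onto frozen generative-model
    features at matching spatial resolutions."""
    loss = 0.0
    for f_s, f_t, reg in zip(student_feats, teacher_feats, regressors):
        f_s = reg(f_s)
        # Resize the student map if resolutions differ.
        if f_s.shape[-2:] != f_t.shape[-2:]:
            f_s = F.interpolate(f_s, size=f_t.shape[-2:],
                                mode="bilinear", align_corners=False)
        loss = loss + F.mse_loss(f_s, f_t.detach())  # teacher is frozen
    return loss

def label_distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Objective 2: distill soft labels produced by a task head on top of the
    generative network onto the student's logits (standard KD loss)."""
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
```

In this reading, the first loss replaces supervised ImageNet pre-training (features come from a trained generative model rather than labels), while the second applies only when a task head is available to produce teacher logits.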