DreamTeacher: Pretraining Image Backbones with Deep Generative Models
July 14, 2023
Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
cs.AI
Abstract
In this work, we introduce a self-supervised feature representation learning
framework DreamTeacher that utilizes generative networks for pre-training
downstream image backbones. We propose to distill knowledge from a trained
generative model into standard image backbones that have been well engineered
for specific perception tasks. We investigate two types of knowledge
distillation: 1) distilling learned generative features onto target image
backbones as an alternative to pretraining these backbones on large labeled
datasets such as ImageNet, and 2) distilling labels obtained from generative
networks with task heads onto logits of target backbones. We perform extensive
analyses on multiple generative models, dense prediction benchmarks, and
several pre-training regimes. We empirically find that our DreamTeacher
significantly outperforms existing self-supervised representation learning
approaches across the board. Unsupervised ImageNet pre-training with
DreamTeacher leads to significant improvements over ImageNet classification
pre-training on downstream datasets, showcasing generative models, and
diffusion generative models specifically, as a promising approach to
representation learning on large, diverse datasets without requiring manual
annotation.
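The two distillation objectives described above can be illustrated with a minimal PyTorch sketch. The sketch below rests on assumptions of my own rather than the paper's exact recipe: a frozen generative "teacher" that exposes an intermediate feature map (and, for label distillation, logits from an attached task head), a standard image backbone as the "student", a 1x1-convolution feature regressor, and simple MSE/KL losses. Names such as FeatureRegressor and the temperature parameter are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRegressor(nn.Module):
    """Projects student features to the teacher's channel width (illustrative)."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)

def feature_distillation_loss(student_feat, teacher_feat, regressor):
    """1) Feature distillation: regress frozen generative-model features."""
    pred = regressor(student_feat)
    # Align spatial resolution before comparing feature maps.
    pred = F.interpolate(pred, size=teacher_feat.shape[-2:], mode="bilinear",
                         align_corners=False)
    return F.mse_loss(pred, teacher_feat)

def label_distillation_loss(student_logits, teacher_logits, temperature: float = 1.0):
    """2) Label distillation: match soft labels produced by the teacher's task head."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=1)
    p_teacher = F.softmax(teacher_logits / t, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

In a pre-training loop, the teacher would be run under torch.no_grad() to produce features (and optionally soft labels), and only the student backbone and the regressor would receive gradients; the loss choices and single-scale matching here are a simplification of what the paper evaluates.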