Diffusion Models Beat GANs on Image Classification
July 17, 2023
Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Srinidhi Hegde, Tianyi Zhou, Abhinav Shrivastava
cs.AI
Abstract
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high-fidelity, diverse, novel images. The U-Net architecture, as a convolution-based architecture, generates a diverse set of feature representations in the form of intermediate feature maps. We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification. We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task. We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods such as BigBiGAN for classification tasks. We investigate diffusion models in the transfer learning regime, examining their performance on several fine-grained visual classification datasets. We compare these embeddings to those generated by competing architectures and pre-trainings for classification tasks.
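The pipeline the abstract describes (noise an image to a chosen timestep, run one U-Net forward pass, read off an intermediate feature map, pool it, and fit a linear probe on the frozen embedding) can be sketched in a few lines of PyTorch. The sketch below is a minimal illustration, not the paper's exact setup: it uses a small, randomly initialized diffusers UNet2DModel as a stand-in for a pretrained diffusion backbone, and the timestep (t=90), block choice (mid block), and pooling size are illustrative assumptions rather than the paper's tuned values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import UNet2DModel

# Randomly initialized U-Net as a stand-in for a pretrained
# diffusion backbone; a real experiment would load trained weights.
unet = UNet2DModel(sample_size=64, in_channels=3, out_channels=3)
unet.eval()

# Capture an intermediate feature map with a forward hook.
features = {}
def hook(module, inputs, output):
    features["mid"] = output

# Which block to tap is a hyperparameter; the mid block is an
# illustrative choice here, not the paper's selected layer.
unet.mid_block.register_forward_hook(hook)

def extract_embedding(x, t=90, pool_size=1):
    """Noise the image at timestep t, run the U-Net once, and pool
    the hooked feature map into a fixed-size embedding."""
    noise = torch.randn_like(x)
    # Crude stand-in for the forward process q(x_t | x_0); a real
    # run would use the noise schedule the model was trained with.
    alpha = 1.0 - t / 1000.0
    x_t = alpha**0.5 * x + (1 - alpha)**0.5 * noise
    with torch.no_grad():
        unet(x_t, t)
    fmap = features["mid"]                 # (B, C, H, W)
    pooled = F.adaptive_avg_pool2d(fmap, pool_size)
    return pooled.flatten(1)               # (B, C * pool_size**2)

# Linear probe trained on the frozen embeddings.
x = torch.randn(4, 3, 64, 64)              # batch of images
emb = extract_embedding(x)
probe = nn.Linear(emb.shape[1], 1000)      # e.g., ImageNet classes
logits = probe(emb)
print(logits.shape)                        # torch.Size([4, 1000])
```

The three knobs in this sketch, which block to hook, which timestep to noise to, and how aggressively to pool, correspond to the "careful feature selection and pooling" the abstract credits for the reported classification gains.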