Diffusion Models Beat GANs on Image Classification
July 17, 2023
Authors: Soumik Mukhopadhyay, Matthew Gwilliam, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Srinidhi Hegde, Tianyi Zhou, Abhinav Shrivastava
cs.AI
Abstract
While many unsupervised learning models focus on one family of tasks, either
generative or discriminative, we explore the possibility of a unified
representation learner: a model which uses a single pre-training stage to
address both families of tasks simultaneously. We identify diffusion models as
a prime candidate. Diffusion models have risen to prominence as a
state-of-the-art method for image generation, denoising, inpainting,
super-resolution, manipulation, etc. Such models involve training a U-Net to
iteratively predict and remove noise, and the resulting model can synthesize
high-fidelity, diverse, novel images. The U-Net, being a
convolution-based architecture, generates a diverse set of feature
representations in the form of intermediate feature maps. We present our
findings that these embeddings are useful beyond the noise prediction task, as
they contain discriminative information and can also be leveraged for
classification. We explore optimal methods for extracting and using these
embeddings for classification tasks, demonstrating promising results on the
ImageNet classification task. We find that with careful feature selection and
pooling, diffusion models outperform comparable generative-discriminative
methods such as BigBiGAN for classification tasks. We investigate diffusion
models in the transfer learning regime, examining their performance on several
fine-grained visual classification datasets. We compare these embeddings to
those generated by competing architectures and pre-trainings for classification
tasks.
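The feature-extraction recipe the abstract describes — selecting intermediate U-Net feature maps and pooling them into a single embedding for a classifier — can be sketched at the shape level as follows. This is a minimal illustration, not the paper's implementation: the block names, channel counts, and spatial resolutions below are hypothetical stand-ins for whatever intermediate activations a diffusion U-Net would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intermediate U-Net feature maps for one image, as
# (channels, height, width) arrays taken from two decoder blocks.
# Names and shapes are illustrative, not the actual U-Net dimensions.
feature_maps = {
    "block_24": rng.standard_normal((512, 16, 16)),
    "block_30": rng.standard_normal((256, 32, 32)),
}

def pool_and_concat(maps, block_names):
    """Global-average-pool each selected feature map over its spatial
    dimensions, then concatenate the per-block channel vectors into
    one embedding suitable for a linear classification head."""
    pooled = [maps[name].mean(axis=(1, 2)) for name in block_names]
    return np.concatenate(pooled)

embedding = pool_and_concat(feature_maps, ["block_24", "block_30"])
print(embedding.shape)  # (768,): 512 + 256 pooled channels
```

In practice the choice of which blocks (and which diffusion timestep) to extract from, and how to pool, is exactly the "careful feature selection and pooling" the abstract says drives the classification results; the resulting vector would then feed a linear probe or small classifier head.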