Diffusion Models Beat GANs on Image Classification

Abstract

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, and manipulation. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high-fidelity, diverse, novel images. The U-Net architecture, as a convolution-based architecture, generates a diverse set of feature representations in the form of intermediate feature maps. We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification. We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task. We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods such as BigBiGAN for classification tasks. We investigate diffusion models in the transfer learning regime, examining their performance on several fine-grained visual classification datasets. We compare these embeddings to those generated by competing architectures and pre-training methods for classification tasks.
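
The pooling-then-classify idea described above can be illustrated with a minimal sketch. It assumes an intermediate U-Net feature map has already been extracted and is represented as a plain nested list (channels x height x width). The function names `adaptive_avg_pool`, `linear_logits`, and `predict` are hypothetical, and the pooling size, the choice of U-Net block, and the classifier head are illustrative assumptions rather than the settings used in the paper.

```python
from typing import List

# Hypothetical stand-in for one intermediate U-Net activation:
# a nested list indexed as [channel][row][column].
FeatureMap = List[List[List[float]]]


def adaptive_avg_pool(fmap: FeatureMap, out_size: int) -> List[float]:
    """Average-pool each channel to out_size x out_size, then flatten."""
    pooled: List[float] = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        for i in range(out_size):
            # Bin edges: floor for the start, ceil for the end.
            r0 = (i * h) // out_size
            r1 = -((-(i + 1) * h) // out_size)
            for j in range(out_size):
                c0 = (j * w) // out_size
                c1 = -((-(j + 1) * w) // out_size)
                cells = [channel[r][c]
                         for r in range(r0, r1)
                         for c in range(c0, c1)]
                pooled.append(sum(cells) / len(cells))
    return pooled


def linear_logits(features: List[float],
                  weights: List[List[float]],
                  bias: List[float]) -> List[float]:
    """Linear head: one weight row and one bias term per class."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def predict(logits: List[float]) -> int:
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda k: logits[k])


if __name__ == "__main__":
    # Toy 1-channel 4x4 "feature map"; real maps would come from the U-Net.
    toy_map = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
    feats = adaptive_avg_pool(toy_map, out_size=2)
    # Illustrative, untrained weights for a 2-class head.
    logits = linear_logits(feats, [[1, 0, 0, 0], [0, 0, 0, 1]], [0.0, 0.0])
    print(feats, logits, predict(logits))
```

In a real pipeline, the feature map would be taken from a chosen U-Net block at a chosen noise level, and the head's weights would be trained on labeled data while the diffusion model stays frozen. Those choices are what "feature selection" refers to, and this sketch does not prescribe them.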
