Deconstructing Denoising Diffusion Models for Self-Supervised Learning

Abstract

In this study, we examine the representation learning abilities of Denoising Diffusion Models (DDMs), which were originally developed for image generation. Our approach is to deconstruct a DDM, gradually transforming it into a classical Denoising Autoencoder (DAE). This deconstruction lets us explore how the various components of modern DDMs influence self-supervised representation learning. We observe that only a few modern components are critical for learning good representations, while many others are nonessential. Our study ultimately arrives at a highly simplified approach that largely resembles a classical DAE. We hope it will rekindle interest in this family of classical methods within modern self-supervised learning.
