Ambient Diffusion Omni: Training Good Models with Bad Data

Abstract

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Diffusion models are typically trained on curated datasets drawn from heavily filtered pools of data from the Web and other sources. We show that the lower-quality images that are often discarded hold immense value. We present Ambient Diffusion Omni, a simple, principled framework for training diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images: spectral power law decay and locality. We first validate the framework by successfully training diffusion models on images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use it to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data and learning from limited unbiased data across diffusion times.
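
The core insight, that noise dampens the skew between two distributions, can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the paper's method: the "desired" and "observed" distributions are stood in for by two one-dimensional Gaussians with equal variance and different means, and the function name `tv_after_noise` and all numeric values are hypothetical. Adding independent Gaussian noise to both widens them equally, and the total variation distance between them, which has a closed form in this case, shrinks as the noise level grows.

```python
import math


def tv_after_noise(shift: float, data_std: float, noise_std: float) -> float:
    """Total variation distance between N(0, data_std^2) and
    N(shift, data_std^2) after adding N(0, noise_std^2) noise to both.

    For two Gaussians with equal standard deviation s and means that
    differ by d, the distance is erf(|d| / (2 * sqrt(2) * s)).
    """
    # Adding independent Gaussian noise adds variances.
    total_std = math.sqrt(data_std**2 + noise_std**2)
    return math.erf(abs(shift) / (2.0 * math.sqrt(2.0) * total_std))


# Hypothetical noise levels, loosely analogous to diffusion times.
noise_levels = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]
distances = [
    tv_after_noise(shift=1.0, data_std=1.0, noise_std=s) for s in noise_levels
]

for s, d in zip(noise_levels, distances):
    print(f"noise_std={s:5.1f}  TV distance={d:.3f}")
```

In this toy setting the distance decreases monotonically with the noise level, which is consistent with the abstract's claim that biased data is less harmful at high-noise diffusion times than at low-noise ones.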
