

Ambient Diffusion Omni: Training Good Models with Bad Data

June 10, 2025
作者: Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans, Antonio Torralba, Constantinos Daskalakis
cs.AI

Abstract

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images -- spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.
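The core insight — that diffusion noise dampens the skew between the desired high-quality distribution and the observed mixed one — can be illustrated with a toy 1-D calculation. The sketch below is a hypothetical stand-in, not the paper's method: it treats the clean and mixed data as two Gaussians (parameters chosen arbitrarily) and shows that their total variation distance shrinks as the forward-process noise scale `sigma` grows.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) evaluated on a grid."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Integration grid wide enough to cover the noisiest case (sigma = 16).
x = np.linspace(-80.0, 80.0, 200_001)
dx = x[1] - x[0]

def tv(mu1, v1, mu2, v2):
    """Total variation distance: 0.5 * integral of |p - q|."""
    return 0.5 * np.sum(np.abs(gauss_pdf(x, mu1, v1) - gauss_pdf(x, mu2, v2))) * dx

# Hypothetical 1-D stand-ins: "clean" ~ N(0, 1), "mixed" ~ N(0.8, 1.5^2).
# Adding forward-process noise N(0, sigma^2) adds sigma^2 to each variance.
tvs = []
for sigma in [0.0, 1.0, 4.0, 16.0]:
    d = tv(0.0, 1.0 + sigma**2, 0.8, 1.5**2 + sigma**2)
    tvs.append(d)
    print(f"sigma={sigma:5.1f}  TV={d:.4f}")

# The distributional gap shrinks monotonically as noise grows.
assert all(a > b for a, b in zip(tvs, tvs[1:]))
```

At large `sigma` the two noised distributions are nearly indistinguishable, which is why — in the paper's framing — biased data becomes usable at high diffusion times, while low diffusion times must rely on the limited unbiased data.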
PDF · June 18, 2025