Ambient Diffusion Omni: Training Good Models with Bad Data

Abstract

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images: spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.
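The claim that noise dampens the skew between the clean and the observed distribution can be illustrated with a toy calculation. The sketch below is not the paper's method. It assumes a one-dimensional stand-in in which the clean data is N(0, 1) and the corrupted data is N(0, 0.25), a crude proxy for blur removing variance. The function names and the variance values are hypothetical choices made for illustration. Adding independent Gaussian noise of standard deviation sigma to both distributions adds sigma^2 to each variance, and the closed-form KL divergence between them shrinks as sigma grows.

```python
import math


def kl_zero_mean_gaussians(var_p: float, var_q: float) -> float:
    """Closed-form KL(N(0, var_p) || N(0, var_q)) in nats."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1.0 - math.log(ratio))


def kl_after_noising(var_clean: float, var_corrupt: float, sigma: float) -> float:
    """KL between the two distributions after adding N(0, sigma^2) noise to both.

    Independent Gaussian noise simply adds sigma^2 to each variance.
    """
    noise_var = sigma ** 2
    return kl_zero_mean_gaussians(var_clean + noise_var, var_corrupt + noise_var)


# Hypothetical toy values: clean data has unit variance, corrupted data has less.
CLEAN_VAR = 1.0
CORRUPT_VAR = 0.25

sigmas = [0.0, 0.5, 1.0, 2.0, 5.0]
kls = [kl_after_noising(CLEAN_VAR, CORRUPT_VAR, s) for s in sigmas]

for s, kl in zip(sigmas, kls):
    print(f"sigma={s:4.1f}  KL={kl:.6f}")
```

In this toy setting the divergence falls monotonically with the noise level, which is consistent with the idea that corrupted samples are less biased as training targets at high diffusion times than at low ones.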
