Unified Latents (UL): How to train your latents

Abstract

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR), while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
