

Unified Latents (UL): How to train your latents

February 19, 2026
作者: Jonathan Heek, Emiel Hoogeboom, Thomas Mensink, Tim Salimans
cs.AI

Abstract
We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR), while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
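The "linking" idea in the abstract can be illustrated with a toy computation: the encoder emits a latent corrupted with Gaussian noise at the prior's minimum noise level, and under a Gaussian prior the KL divergence then gives an upper bound on the latent bitrate. This is only a minimal sketch of that principle, assuming a unit-Gaussian prior and a stand-in encoder; all names and the specific formulas here are illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(x, sigma_min):
    # Hypothetical encoder: a deterministic mean plus Gaussian noise at the
    # prior's minimum noise level sigma_min, mirroring the abstract's idea
    # of linking encoder output noise to the prior (stand-in, not UL itself).
    mu = np.tanh(x)  # placeholder for a learned encoder network
    return mu + sigma_min * rng.standard_normal(mu.shape)


def gaussian_rate_bound(z, sigma_min):
    # Rate upper bound in nats for a unit-Gaussian prior:
    # KL(N(z, sigma_min^2 I) || N(0, I)), using the sampled latent z
    # in place of the encoder mean for simplicity.
    d = z.size
    return 0.5 * (np.sum(z**2) + d * sigma_min**2 - d
                  - d * np.log(sigma_min**2))


x = rng.standard_normal((4, 16))
z = encode(x, sigma_min=0.1)
rate = gaussian_rate_bound(z, sigma_min=0.1)
print(f"latent rate bound: {rate:.1f} nats")
```

Minimizing a bound like this alongside a reconstruction (decoder) loss is the usual way such a rate term enters training; how UL combines it with the diffusion decoder is specified in the paper itself.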
PDF · February 21, 2026