Guiding a Diffusion Model with a Bad Version of Itself
June 4, 2024
Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine
cs.AI
Abstract
The primary axes of interest in image-generating diffusion models are image
quality, the amount of variation in the results, and how well the results align
with a given condition, e.g., a class label or a text prompt. The popular
classifier-free guidance approach uses an unconditional model to guide a
conditional model, leading to simultaneously better prompt alignment and
higher-quality images at the cost of reduced variation. These effects seem
inherently entangled, and thus hard to control. We make the surprising
observation that it is possible to obtain disentangled control over image
quality without compromising the amount of variation by guiding generation
using a smaller, less-trained version of the model itself rather than an
unconditional model. This leads to significant improvements in ImageNet
generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using
publicly available networks. Furthermore, the method is also applicable to
unconditional diffusion models, drastically improving their quality.
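Both classifier-free guidance and the method described here share one mechanism: at each sampling step, the main denoiser's prediction is linearly extrapolated away from a weaker guiding prediction, D_w = D0 + w·(D1 − D0). Below is a minimal PyTorch sketch of that step. The callables `d_main` / `d_guide`, the `(x, sigma, cond)` signature, and the toy stand-in models are illustrative assumptions, not the authors' released code.

```python
import torch

def guided_denoise(x, sigma, cond, d_main, d_guide, w=2.0):
    """Guided prediction D_w = D0 + w * (D1 - D0) = w*D1 - (w-1)*D0.

    d_main  (D1): the full, fully trained denoiser.
    d_guide (D0): the guiding denoiser. Classifier-free guidance uses
        the main model with the condition dropped; this paper instead
        uses a smaller, less-trained version of the main model
        evaluated with the SAME condition.
    w: guidance weight. w = 1 recovers d_main unchanged; w > 1
        extrapolates away from errors shared with the weaker model.
    """
    d1 = d_main(x, sigma, cond)
    d0 = d_guide(x, sigma, cond)
    return d0 + w * (d1 - d0)

# Toy usage with hypothetical stand-in denoisers (shapes only):
if __name__ == "__main__":
    x = torch.randn(1, 3, 64, 64)
    d_main = lambda x, sigma, cond: 0.9 * x   # hypothetical strong model
    d_guide = lambda x, sigma, cond: 0.5 * x  # hypothetical weak model
    out = guided_denoise(x, sigma=1.0, cond=None,
                         d_main=d_main, d_guide=d_guide, w=2.0)
    print(out.shape)  # torch.Size([1, 3, 64, 64])
```

With w = 1 this reduces to the main model. The paper's key observation is that choosing the guiding model to be a degraded version of the main model itself, rather than an unconditional model, lets the quality boost be obtained without the usual loss of variation.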