Guiding a Diffusion Model with a Bad Version of Itself
June 4, 2024
Authors: Tero Karras, Miika Aittala, Tuomas Kynkäänniemi, Jaakko Lehtinen, Timo Aila, Samuli Laine
cs.AI
Abstract
The primary axes of interest in image-generating diffusion models are image
quality, the amount of variation in the results, and how well the results align
with a given condition, e.g., a class label or a text prompt. The popular
classifier-free guidance approach uses an unconditional model to guide a
conditional model, leading to simultaneously better prompt alignment and
higher-quality images at the cost of reduced variation. These effects seem
inherently entangled, and thus hard to control. We make the surprising
observation that it is possible to obtain disentangled control over image
quality without compromising the amount of variation by guiding generation
using a smaller, less-trained version of the model itself rather than an
unconditional model. This leads to significant improvements in ImageNet
generation, setting record FIDs of 1.01 for 64x64 and 1.25 for 512x512, using
publicly available networks. Furthermore, the method is also applicable to
unconditional diffusion models, drastically improving their quality.
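
Both classifier-free guidance and the guidance scheme described in the abstract can be viewed as extrapolating between the outputs of two denoisers: a strong main model and a weaker guiding model. Below is a minimal sketch of that blend, assuming PyTorch-style denoiser networks; the names `d_main`, `d_guide`, and the weight `w` are illustrative and are not taken from the paper's code.

```python
import torch

def guided_denoise(d_main, d_guide, x, sigma, label, w):
    """Guided denoising as extrapolation between two model outputs.

    d_main:  the fully trained conditional denoiser, D1(x; sigma, c)
    d_guide: the guiding model, D0(x; sigma, c) -- for classifier-free
             guidance this would be the unconditional model; in the
             approach described here it is a smaller, less-trained
             version of d_main evaluated on the SAME condition
    w:       guidance weight; w = 1 recovers the unguided main model
    """
    pred_main = d_main(x, sigma, label)
    pred_guide = d_guide(x, sigma, label)
    # D_w = D0 + w * (D1 - D0): push the prediction away from the weak
    # model's output, in the direction of the strong model's output.
    return pred_guide + w * (pred_main - pred_guide)

# Illustrative usage with dummy denoisers standing in for real networks:
if __name__ == "__main__":
    x = torch.randn(4, 3, 64, 64)          # noisy images
    sigma = torch.full((4,), 10.0)          # noise level
    label = torch.randint(0, 1000, (4,))    # class labels
    d_main = lambda x, s, c: 0.9 * x        # stand-in "good" denoiser
    d_guide = lambda x, s, c: 0.5 * x       # stand-in "bad" denoiser
    out = guided_denoise(d_main, d_guide, x, sigma, label, w=2.0)
    print(out.shape)  # torch.Size([4, 3, 64, 64])
```

In this framing, the only change relative to classifier-free guidance is which network plays the role of `d_guide`; swapping the unconditional model for a smaller, less-trained conditional one is what the abstract credits with decoupling image quality from the amount of variation.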