Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models
December 19, 2023
Authors: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet
cs.AI
Abstract
This paper presents a comprehensive study on the role of Classifier-Free
Guidance (CFG) in text-conditioned diffusion models from the perspective of
inference efficiency. In particular, we relax the default choice of applying
CFG in all diffusion steps and instead search for efficient guidance policies.
We formulate the discovery of such policies in the differentiable Neural
Architecture Search framework. Our findings suggest that the denoising steps
proposed by CFG become increasingly aligned with simple conditional steps,
which renders the extra neural network evaluation of CFG redundant, especially
in the second half of the denoising process. Building upon this insight, we
propose "Adaptive Guidance" (AG), an efficient variant of CFG that adaptively
omits network evaluations when the denoising process displays convergence. Our
experiments demonstrate that AG preserves CFG's image quality while reducing
computation by 25%. Thus, AG constitutes a plug-and-play alternative to
Guidance Distillation, achieving 50% of the speed-ups of the latter while being
training-free and retaining the capacity to handle negative prompts. Finally,
we uncover further redundancies of CFG in the first half of the diffusion
process, showing that entire neural function evaluations can be replaced by
simple affine transformations of past score estimates. This method, termed
LinearAG, offers even cheaper inference at the cost of deviating from the
baseline model. Our findings provide insights into the efficiency of the
conditional denoising process that contribute to more practical and swift
deployment of text-conditioned diffusion models.
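To make the mechanics concrete, here is a minimal sketch of the ideas the abstract describes: a standard CFG denoising step combines a conditional and an unconditional network evaluation; AG skips the unconditional pass once the process shows convergence; LinearAG goes further and replaces the evaluation with an affine transformation of past score estimates. All names (`guided_step`, `linear_ag_step`), the toy `model` interface, and the coefficient values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def guided_step(model, x, t, prompt_emb, null_emb, w=7.5, skip=False):
    """One denoising step with classifier-free guidance (CFG).

    When `skip` is True the unconditional forward pass is omitted and the
    plain conditional score is returned -- the core idea behind Adaptive
    Guidance (AG): one network evaluation instead of two."""
    eps_cond = model(x, t, prompt_emb)       # conditional score estimate
    if skip:
        return eps_cond                      # AG: drop the extra evaluation
    eps_uncond = model(x, t, null_emb)       # unconditional score estimate
    # Standard CFG: extrapolate away from the unconditional prediction.
    return eps_uncond + w * (eps_cond - eps_uncond)

def linear_ag_step(eps_hist, a=2.0, b=-1.0):
    """Hypothetical LinearAG-style step: replace the network evaluation with
    an affine transformation of past score estimates. With a=2, b=-1 this is
    simple linear extrapolation from the last two estimates; the paper's
    fitted coefficients and schedule are not reproduced here."""
    return a * eps_hist[-1] + b * eps_hist[-2]
```

In this framing, AG decides per step whether `skip` is set (based on a convergence criterion over recent denoising directions), while LinearAG removes the network call entirely for steps where extrapolation from history suffices.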