

Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models

December 19, 2023
作者: Angela Castillo, Jonas Kohler, Juan C. Pérez, Juan Pablo Pérez, Albert Pumarola, Bernard Ghanem, Pablo Arbeláez, Ali Thabet
cs.AI

Abstract

This paper presents a comprehensive study on the role of Classifier-Free Guidance (CFG) in text-conditioned diffusion models from the perspective of inference efficiency. In particular, we relax the default choice of applying CFG in all diffusion steps and instead search for efficient guidance policies. We formulate the discovery of such policies in the differentiable Neural Architecture Search framework. Our findings suggest that the denoising steps proposed by CFG become increasingly aligned with simple conditional steps, which renders the extra neural network evaluation of CFG redundant, especially in the second half of the denoising process. Building upon this insight, we propose "Adaptive Guidance" (AG), an efficient variant of CFG, that adaptively omits network evaluations when the denoising process displays convergence. Our experiments demonstrate that AG preserves CFG's image quality while reducing computation by 25%. Thus, AG constitutes a plug-and-play alternative to Guidance Distillation, achieving 50% of the speed-ups of the latter while being training-free and retaining the capacity to handle negative prompts. Finally, we uncover further redundancies of CFG in the first half of the diffusion process, showing that entire neural function evaluations can be replaced by simple affine transformations of past score estimates. This method, termed LinearAG, offers even cheaper inference at the cost of deviating from the baseline model. Our findings provide insights into the efficiency of the conditional denoising process that contribute to more practical and swift deployment of text-conditioned diffusion models.
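The abstract's core idea — run both CFG network evaluations early, then drop the extra unconditional pass once the guided and conditional steps align — can be illustrated with a minimal sketch. This is not the paper's implementation: the `model` callable, the plain subtraction update rule, and the fixed `cutoff` fraction (standing in for the policy discovered via differentiable NAS) are all hypothetical placeholders.

```python
import numpy as np

def cfg_step(eps_cond, eps_uncond, w):
    """Classifier-Free Guidance: extrapolate the conditional score
    estimate away from the unconditional one by guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def adaptive_guidance_denoise(model, x, timesteps, w=7.5, cutoff=0.5):
    """Sketch of Adaptive Guidance (AG): perform full CFG (two network
    evaluations per step) only for the first `cutoff` fraction of the
    denoising trajectory; afterwards, fall back to a single conditional
    evaluation, since late CFG steps align with plain conditional steps.
    `model(x, t, cond=...)` is a hypothetical score-network interface."""
    n = len(timesteps)
    for i, t in enumerate(timesteps):
        eps_c = model(x, t, cond=True)            # conditional evaluation
        if i < cutoff * n:
            eps_u = model(x, t, cond=False)       # extra evaluation, early steps only
            eps = cfg_step(eps_c, eps_u, w)
        else:
            eps = eps_c                           # AG: skip the unconditional pass
        x = x - eps                               # placeholder update rule, not a real sampler
    return x
```

With `cutoff=0.5`, the unconditional network is evaluated in only half the steps, which is the source of the roughly 25% reduction in total evaluations the abstract reports (2 evaluations per step dropping to 1 for the second half).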