

PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity

March 10, 2025
作者: Kwanyoung Kim, Byeongsu Sim
cs.AI

摘要

Diffusion models have shown impressive results in generating high-quality conditional samples using guidance techniques such as Classifier-Free Guidance (CFG). However, existing methods often require additional training or extra neural function evaluations (NFEs), making them incompatible with guidance-distilled models; they also rely on heuristics that require identifying specific target layers. In this work, we propose a novel and efficient method, termed PLADIS, which boosts pre-trained models (U-Net/Transformer) by leveraging sparse attention. Specifically, we extrapolate query-key correlations between softmax and its sparse counterpart in the cross-attention layers during inference, without requiring extra training or additional NFEs. By leveraging the noise robustness of sparse attention, PLADIS unlocks the latent potential of text-to-image diffusion models, enabling them to excel in areas where they previously struggled. It integrates seamlessly with guidance techniques, including guidance-distilled models. Extensive experiments show notable improvements in text alignment and human preference, offering a highly efficient and universally applicable solution.
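The core idea of extrapolating between dense softmax attention and a sparse counterpart can be illustrated with a minimal sketch. The snippet below uses sparsemax (a Euclidean projection onto the probability simplex that zeroes out low-scoring keys) as the sparse attention and blends it with ordinary softmax via an extrapolation scale. The function name `pladis_attention`, the scale `lam`, and the exact blending formula are illustrative assumptions, not the paper's verbatim implementation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sparsemax(z, axis=-1):
    # Sparsemax: project scores onto the simplex, producing
    # exactly-zero weights for low-scoring entries.
    z = np.asarray(z, dtype=float)
    z_sorted = -np.sort(-z, axis=axis)           # sort descending
    k = np.arange(1, z.shape[axis] + 1)
    shape = [1] * z.ndim
    shape[axis] = -1
    k = k.reshape(shape)
    z_cumsum = np.cumsum(z_sorted, axis=axis)
    support = 1 + k * z_sorted > z_cumsum        # entries kept in the support
    k_support = support.sum(axis=axis, keepdims=True)
    tau = (np.take_along_axis(z_cumsum, k_support - 1, axis=axis) - 1) / k_support
    return np.maximum(z - tau, 0.0)

def pladis_attention(q, k, v, lam=2.0):
    # Hedged sketch: extrapolate dense attention toward its sparse
    # counterpart (lam > 1 pushes past the sparse map); `lam` is an
    # assumed hyperparameter.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    dense = softmax(scores)
    sparse = sparsemax(scores)
    attn = dense + lam * (sparse - dense)
    return attn @ v
```

Because sparsemax assigns exact zeros to weakly correlated query-key pairs, the extrapolated attention suppresses noisy correlations while amplifying the dominant ones, which is the noise-robustness property the abstract refers to; applying this only at inference keeps the method training-free.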

