PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers
February 1, 2026
Authors: Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Hui Xiong, Zeke Xie
cs.AI
Abstract
Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only to critical key-value blocks, it suffers quality degradation at high sparsity because it discards contextual information. In this work, we discover that the attention scores of non-critical blocks exhibit distributional stability, allowing them to be approximated accurately and efficiently rather than discarded, a property that is essential for sparse attention design. Motivated by this key insight, we propose PISA, a training-free Piecewise Sparse Attention that covers the full attention span with sub-quadratic complexity. Unlike the conventional keep-or-drop paradigm, which simply discards the information in non-critical blocks, PISA introduces a novel exact-or-approximate strategy: it maintains exact computation for critical blocks while efficiently approximating the remainder through a block-wise Taylor expansion. This design allows PISA to serve as a faithful proxy for full attention, effectively bridging the gap between speed and quality. Experimental results demonstrate that PISA achieves 1.91x and 2.57x speedups on Wan2.1-14B and Hunyuan-Video, respectively, while consistently maintaining the highest quality among sparse attention methods. Notably, even for image generation on FLUX, PISA achieves a 1.2x acceleration without compromising visual quality. Code is available at: https://github.com/xie-lab-ml/piecewise-sparse-attention.
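To make the exact-or-approximate idea concrete, below is a minimal NumPy sketch, not the released PISA kernel. It ranks key blocks by a proxy score, computes exact softmax contributions for the critical blocks, and approximates the remaining blocks with a first-order Taylor expansion of the exponential around each block's mean logit, using only per-block statistics. The block-selection criterion, the expansion point, and all names and parameters (piecewise_sparse_attention, block_size, num_critical) are illustrative assumptions; the actual implementation is in the linked repository.

```python
# Minimal sketch of the "exact-or-approximate" idea behind piecewise sparse
# attention. NOT the authors' PISA kernel: the block-selection proxy, the
# Taylor expansion point (per-block mean logit), and all parameters below
# are illustrative assumptions.
import numpy as np

def piecewise_sparse_attention(q, K, V, block_size=64, num_critical=2):
    """Single-query attention over K, V (shape [N, d]) split into key blocks.

    Critical blocks: exact softmax contribution.
    Non-critical blocks: first-order Taylor approximation of exp(s) around
    the block-mean logit, computed from per-block statistics
    (sum of V, sum of K, and K^T V) instead of individual keys.
    """
    N, d = K.shape
    scale = 1.0 / np.sqrt(d)
    blocks = [(i, min(i + block_size, N)) for i in range(0, N, block_size)]

    # Rank blocks by the query's logit against the block-mean key (proxy score).
    proxy = np.array([q @ K[s:e].mean(axis=0) * scale for s, e in blocks])
    critical = set(np.argsort(proxy)[-num_critical:])

    num = np.zeros(d)   # running numerator  sum_j w_j * v_j
    den = 0.0           # running denominator sum_j w_j
    for b, (s, e) in enumerate(blocks):
        Kb, Vb = K[s:e], V[s:e]
        n_b = e - s
        if b in critical:
            # Exact branch: materialize all logits for this block.
            logits = Kb @ q * scale                      # [n_b]
            w = np.exp(logits)
            num += w @ Vb
            den += w.sum()
        else:
            # Approximate branch: exp(s_j) ~= exp(m) * (1 + s_j - m),
            # where m is the block-mean logit.
            K_sum = Kb.sum(axis=0)                       # [d]
            V_sum = Vb.sum(axis=0)                       # [d]
            KtV = Kb.T @ Vb                              # [d, d]
            m = (q @ K_sum) * scale / n_b                # block-mean logit
            # sum_j (1 + s_j - m) v_j = V_sum + (q @ KtV) * scale - m * V_sum
            num += np.exp(m) * (V_sum + (q @ KtV) * scale - m * V_sum)
            # sum_j (1 + s_j - m) = n_b  (deviations from the mean cancel)
            den += np.exp(m) * n_b
    return num / den
```

In the approximate branch, the query only touches K_sum, V_sum, and K^T V for each non-critical block, which illustrates how the full attention span can be covered without materializing every logit; the paper's block-wise formulation and complexity analysis refine this idea beyond the sketch above.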