Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

Abstract

Diffusion Transformers (DiT) have become the de facto model for generating high-quality visual content such as videos and images. A major bottleneck is the attention mechanism, whose complexity scales quadratically with resolution and video length. One logical way to lessen this burden is sparse attention, where only a subset of tokens or patches is included in the calculation. However, existing techniques fail to preserve visual quality at extremely high sparsity levels and may even incur non-negligible compute overhead. To address this concern, we propose Re-ttention, which implements very high-sparsity attention for visual generation models by leveraging the temporal redundancy of diffusion models to overcome the probabilistic normalization shift within the attention mechanism. Specifically, Re-ttention reshapes attention scores based on the prior softmax distribution history in order to preserve the visual quality of full quadratic attention at very high sparsity levels. Experimental results on T2V/T2I models such as CogVideoX and the PixArt DiTs demonstrate that Re-ttention requires as few as 3.1% of the tokens during inference, outperforming contemporary methods such as FastDiTAttn, Sparse VideoGen, and MInference. Further, we measure latency to show that our method attains over 45% end-to-end and over 92% self-attention latency reduction on an H100 GPU at negligible overhead cost. Code is available online: https://github.com/cccrrrccc/Re-ttention
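
The normalization shift mentioned above can be illustrated with a toy example. The sketch below is not the released Re-ttention code, and the abstract does not spell out the exact update rule. It only assumes that (a) restricting softmax to a subset of keys shrinks the denominator and so inflates the kept probabilities, and (b) the ratio between the sparse and full denominators changes slowly across denoising steps, so a ratio cached at an earlier step can rescale the sparse probabilities at a later one. The helper `softmax_stats`, the variable `rho`, and the toy logits are all hypothetical.

```python
import math


def softmax_stats(scores, keep=None):
    """Softmax over `scores`, optionally restricted to the indices in `keep`.

    Returns (probs, denom): probs has one entry per score (0.0 for skipped
    positions) and denom is the sum of exp(score) over the positions used.
    """
    idx = list(range(len(scores))) if keep is None else list(keep)
    m = max(scores[i] for i in idx)  # shift by the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in idx}
    shifted = sum(exps.values())
    probs = [exps.get(i, 0.0) / shifted for i in range(len(scores))]
    return probs, shifted * math.exp(m)


# Toy attention logits of one query at two consecutive denoising steps
# (made-up numbers; consecutive steps are assumed to be similar).
scores_prev = [2.0, 1.5, 0.3, 0.1, 0.0, -0.2, -0.5, -1.0]
scores_now = [2.1, 1.4, 0.35, 0.05, 0.0, -0.25, -0.45, -1.0]
keep = [0, 1]  # hypothetical sparse pattern: attend to 2 of 8 keys

# Earlier step: compute both denominators once and cache their ratio.
_, full_denom_prev = softmax_stats(scores_prev)
_, sparse_denom_prev = softmax_stats(scores_prev, keep)
rho = sparse_denom_prev / full_denom_prev  # share of softmax mass on kept keys

# Current step: only sparse attention is computed, then rescaled by the
# cached ratio. The full softmax is evaluated here purely as a reference.
full_now, _ = softmax_stats(scores_now)
sparse_now, _ = softmax_stats(scores_now, keep)
reshaped_now = [p * rho for p in sparse_now]

err_sparse = max(abs(sparse_now[i] - full_now[i]) for i in keep)
err_reshaped = max(abs(reshaped_now[i] - full_now[i]) for i in keep)
print(f"rho={rho:.3f} sparse_err={err_sparse:.3f} reshaped_err={err_reshaped:.3f}")
```

In this toy setting the rescaled probabilities sit much closer to the full-attention values than the raw sparse ones do, which is the kind of effect the abstract attributes to reshaping with softmax history. How the actual method tracks and applies that history is described in the paper and repository, not here.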