∇NABLA: Neighborhood Adaptive Block-Level Attention
July 17, 2025
Authors: Dmitrii Mikhailov, Aleksey Letunovskiy, Maria Kovaleva, Vladimir Arkhipkin, Vladimir Korviakov, Vladimir Polovnikov, Viacheslav Vasilev, Evelina Sidorova, Denis Dimitrov
cs.AI
Abstract
Recent progress in transformer-based architectures has demonstrated
remarkable success in video generation tasks. However, the quadratic complexity
of full attention mechanisms remains a critical bottleneck, particularly for
high-resolution and long-duration video sequences. In this paper, we propose
NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that
dynamically adapts to sparsity patterns in video diffusion transformers (DiTs).
By leveraging block-wise attention with an adaptive, sparsity-driven threshold,
NABLA reduces computational overhead while preserving generative quality. Our
method does not require custom low-level operator design and can be seamlessly
integrated with PyTorch's Flex Attention operator. Experiments demonstrate that
NABLA achieves up to 2.7x faster training and inference than the full-attention
baseline, with almost no degradation in quantitative metrics (CLIP score,
VBench score, human evaluation score) or visual quality. The code and model
weights are available here: https://github.com/gen-ai-team/Wan2.1-NABLA
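
Since the abstract only names the ingredients (block-wise pooling of queries and keys, an adaptive cumulative-mass threshold, and PyTorch's Flex Attention), the following is a minimal sketch of how they could fit together. The block size of 64, the 0.9 mass threshold, and the helper names `build_nabla_block_mask` / `nabla_attention` are illustrative assumptions, not taken from the paper or the released repository.

```python
# A minimal sketch, not the authors' implementation: block-level attention
# with an adaptive cumulative-mass threshold, routed through Flex Attention.
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

BLOCK = 64  # assumed block size; the paper's actual value may differ


def build_nabla_block_mask(q, k, threshold=0.9):
    """Pick, per query block, the key blocks that carry most attention mass.

    q, k: [B, H, S, D] with S divisible by BLOCK.
    threshold: fraction of coarse attention mass each query block retains.
    """
    B, H, S, D = q.shape
    nb = S // BLOCK
    # Average-pool queries and keys within each block to get a coarse
    # nb x nb attention map, a cheap proxy for the full S x S map.
    qp = q.view(B, H, nb, BLOCK, D).mean(dim=3)
    kp = k.view(B, H, nb, BLOCK, D).mean(dim=3)
    probs = torch.softmax(qp @ kp.transpose(-1, -2) / D ** 0.5, dim=-1)
    # Sort key blocks by mass and keep the shortest prefix whose
    # cumulative mass reaches `threshold` (the crossing block included).
    vals, idx = probs.sort(dim=-1, descending=True)
    cum = vals.cumsum(dim=-1)
    keep_sorted = (cum - vals) < threshold
    keep = torch.zeros_like(keep_sorted).scatter(-1, idx, keep_sorted)
    return keep  # bool [B, H, nb, nb]


def nabla_attention(q, k, v, threshold=0.9):
    keep = build_nabla_block_mask(q, k, threshold)

    def mask_mod(b, h, q_idx, kv_idx):
        # A token pair is attended iff its block pair was kept above.
        return keep[b, h, q_idx // BLOCK, kv_idx // BLOCK]

    B, H, S, _ = q.shape
    block_mask = create_block_mask(mask_mod, B, H, S, S, device=q.device)
    return flex_attention(q, k, v, block_mask=block_mask)
```

Because the mask is expressed as a `BlockMask`, Flex Attention skips the dropped blocks entirely, so the attention cost scales with the number of retained blocks rather than with the full quadratic sequence length, which is consistent with the speedups the abstract reports.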