

∇NABLA: Neighborhood Adaptive Block-Level Attention

July 17, 2025
作者: Dmitrii Mikhailov, Aleksey Letunovskiy, Maria Kovaleva, Vladimir Arkhipkin, Vladimir Korviakov, Vladimir Polovnikov, Viacheslav Vasilev, Evelina Sidorova, Denis Dimitrov
cs.AI

Abstract

Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention mechanisms remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method does not require custom low-level operator design and integrates seamlessly with PyTorch's Flex Attention operator. Experiments demonstrate that NABLA achieves up to 2.7x faster training and inference than the full-attention baseline, with almost no degradation in quantitative metrics (CLIP score, VBench score, human evaluation score) or visual quality. The code and model weights are available at: https://github.com/gen-ai-team/Wan2.1-NABLA
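To make the idea concrete, the following is a minimal NumPy sketch of block-level attention with an adaptive, mass-based sparsity threshold, in the spirit the abstract describes. It is not the authors' implementation (which targets PyTorch's Flex Attention): the pooling scheme, the `keep_mass` parameter, and the function name are illustrative assumptions. Queries and keys are mean-pooled into blocks, a coarse block-level softmax map is computed, and for each query block only the smallest set of key blocks whose cumulative attention mass reaches `keep_mass` is retained; dense attention then runs only inside the retained blocks.

```python
import numpy as np

def block_sparse_attention(q, k, v, block=4, keep_mass=0.9):
    """Sketch of neighborhood-adaptive block-level attention.

    Assumptions (not from the paper): mean pooling over blocks and a
    per-row cumulative-mass rule as the adaptive sparsity threshold.
    """
    n, d = q.shape
    nb = n // block
    # Block-pooled queries and keys: shape (nb, d).
    qb = q.reshape(nb, block, d).mean(axis=1)
    kb = k.reshape(nb, block, d).mean(axis=1)
    # Coarse block-level attention map: shape (nb, nb).
    s = qb @ kb.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # Adaptive threshold: per query block, keep the top key blocks
    # whose cumulative softmax mass reaches keep_mass.
    order = np.argsort(-p, axis=-1)
    mask = np.zeros_like(p, dtype=bool)
    for i in range(nb):
        cum = 0.0
        for j in order[i]:
            mask[i, j] = True
            cum += p[i, j]
            if cum >= keep_mass:
                break
    # Dense attention restricted to the retained blocks.
    scores = q @ k.T / np.sqrt(d)
    full_mask = np.kron(mask, np.ones((block, block), dtype=bool))
    scores = np.where(full_mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, mask
```

The compute saving comes from the mask: only the retained (query-block, key-block) pairs need full attention, so cost scales with the number of kept blocks rather than quadratically in sequence length. In the real method this mask would be handed to Flex Attention's block-mask machinery instead of being materialized densely as here.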