∇NABLA: Neighborhood Adaptive Block-Level Attention

Abstract



Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive, sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method requires no custom low-level operator design and integrates seamlessly with PyTorch's FlexAttention operator. Experiments show that NABLA achieves up to 2.7x faster training and inference than the baseline, with almost no drop in quantitative metrics (CLIP score, VBench score, human evaluation score) or visual quality. The code and model weights are available at https://github.com/gen-ai-team/Wan2.1-NABLA.
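The abstract names the ingredients (block-wise attention plus an adaptive, sparsity-driven threshold) without spelling them out. The sketch below shows one plausible way such a block mask could be built; it is an assumption for illustration, not the NABLA code in the linked repository. The functions `block_sparse_mask` and `masked_attention` and the parameter `keep_mass` are hypothetical. Tokens are mean-pooled into contiguous blocks, a coarse softmax attention map is computed between block centroids, and for each query block the smallest set of key blocks covering `keep_mass` of the probability mass is kept, so the number of kept blocks adapts per row.

```python
import numpy as np


def block_sparse_mask(q, k, block_size, keep_mass):
    """Hypothetical helper: choose which (query block, key block) pairs to compute.

    Tokens are mean-pooled into contiguous blocks, a coarse softmax attention map
    is computed between block centroids, and for every query block we keep the
    smallest set of key blocks whose probability mass reaches `keep_mass`.
    Returns a boolean array of shape (n_blocks, n_blocks).
    """
    n, d = q.shape
    if n % block_size != 0:
        raise ValueError("sequence length must be a multiple of block_size")
    nb = n // block_size
    # Average-pool the tokens inside each block: shape (nb, d).
    qb = q.reshape(nb, block_size, d).mean(axis=1)
    kb = k.reshape(nb, block_size, d).mean(axis=1)
    # Coarse block-to-block attention probabilities (numerically stable row softmax).
    scores = qb @ kb.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # Sort each row by probability (descending) and accumulate the mass.
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    mass_before = np.cumsum(sorted_probs, axis=1) - sorted_probs
    # Keep a block while the mass collected *before* it is below the threshold,
    # so the most probable block is always kept (for keep_mass > 0).
    keep_sorted = mass_before < keep_mass
    # Scatter the decisions back to the original key-block order.
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=1)
    return mask


def masked_attention(q, k, v, block_mask, block_size):
    """Reference attention in which queries only see key blocks kept in block_mask.

    Computed densely for clarity; it does not skip any work.
    """
    d = q.shape[1]
    # Expand the block-level mask to a token-level mask.
    token_mask = block_mask.repeat(block_size, axis=0).repeat(block_size, axis=1)
    scores = np.where(token_mask, q @ k.T / np.sqrt(d), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


# Usage on random data (shapes are illustrative only).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 32)) for _ in range(3))
mask = block_sparse_mask(q, k, block_size=64, keep_mass=0.9)
out = masked_attention(q, k, v, mask, block_size=64)
print(f"kept {int(mask.sum())} of {mask.size} block pairs; output shape {out.shape}")
```

Because the reference attention above is still dense, it only illustrates the masking logic. A speed-up would come from passing the block-level decisions to a block-sparse kernel, for example PyTorch's FlexAttention through its `BlockMask`, so that skipped block pairs are never computed.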
