Training-Free Efficient Video Generation via Dynamic Token Carving

May 22, 2025
Authors: Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
cs.AI

Abstract

Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements. This inefficiency stems from two key challenges: the quadratic complexity of self-attention with respect to token length and the multi-step nature of diffusion models. To address these limitations, we present Jenga, a novel inference pipeline that combines dynamic attention carving with progressive resolution generation. Our approach leverages two key insights: (1) early denoising steps do not require high-resolution latents, and (2) later steps do not require dense attention. Jenga introduces a block-wise attention mechanism that dynamically selects relevant token interactions using 3D space-filling curves, alongside a progressive resolution strategy that gradually increases latent resolution during generation. Experimental results demonstrate that Jenga achieves substantial speedups across multiple state-of-the-art video diffusion models while maintaining comparable generation quality (8.83× speedup with a 0.01% performance drop on VBench). As a plug-and-play solution, Jenga enables practical, high-quality video generation on modern hardware by reducing inference time from minutes to seconds, without requiring model retraining. Code: https://github.com/dvlab-research/Jenga
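To make the idea of block-wise attention with dynamic selection more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes tokens have already been reordered (for example along a space-filling curve, which is omitted here) so that contiguous blocks are spatially local. The function names `block_topk_attention` and `dense_attention`, the mean-pooled block scoring, and the `keep_blocks` parameter are all illustrative assumptions rather than details taken from the paper or its code.

```python
import numpy as np


def dense_attention(q, k, v):
    """Standard softmax attention over all tokens (quadratic in sequence length)."""
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v


def block_topk_attention(q, k, v, block_size, keep_blocks):
    """Each query block attends only to its `keep_blocks` highest-scoring key blocks.

    Hypothetical illustration: block relevance is estimated from mean-pooled
    queries and keys, which is an assumption made for this sketch.
    """
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    nb = n // block_size

    # Cheap block-level relevance scores from mean-pooled queries and keys.
    q_blk = q.reshape(nb, block_size, d).mean(axis=1)
    k_blk = k.reshape(nb, block_size, d).mean(axis=1)
    blk_scores = q_blk @ k_blk.T  # shape (nb, nb)

    # For every query block, indices of the top-scoring key blocks.
    top = np.argsort(-blk_scores, axis=1)[:, :keep_blocks]

    out = np.empty_like(v)
    for i in range(nb):
        rows = slice(i * block_size, (i + 1) * block_size)
        cols = np.concatenate(
            [np.arange(j * block_size, (j + 1) * block_size) for j in top[i]]
        )
        # Softmax attention restricted to the selected key/value tokens.
        logits = q[rows] @ k[cols].T / np.sqrt(d)
        logits -= logits.max(axis=1, keepdims=True)
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)
        out[rows] = w @ v[cols]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.standard_normal((3, 16, 8))
    sparse = block_topk_attention(q, k, v, block_size=4, keep_blocks=2)
    print(sparse.shape)
```

In this sketch, each query block computes attention against `keep_blocks * block_size` keys instead of all `n` keys, so the cost per query grows with the number of kept blocks rather than with the full token length. Setting `keep_blocks` to the total number of blocks recovers dense attention.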
