Training-Free Efficient Video Generation via Dynamic Token Carving

May 22, 2025
Authors: Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
cs.AI

Abstract

Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements. This inefficiency stems from two key challenges: the quadratic complexity of self-attention with respect to token length and the multi-step nature of diffusion models. To address these limitations, we present Jenga, a novel inference pipeline that combines dynamic attention carving with progressive resolution generation. Our approach leverages two key insights: (1) early denoising steps do not require high-resolution latents, and (2) later steps do not require dense attention. Jenga introduces a block-wise attention mechanism that dynamically selects relevant token interactions using 3D space-filling curves, alongside a progressive resolution strategy that gradually increases latent resolution during generation. Experimental results demonstrate that Jenga achieves substantial speedups across multiple state-of-the-art video diffusion models while maintaining comparable generation quality (8.83× speedup with a 0.01% performance drop on VBench). As a plug-and-play solution, Jenga enables practical, high-quality video generation on modern hardware by reducing inference time from minutes to seconds, without requiring model retraining. Code: https://github.com/dvlab-research/Jenga
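
The block-wise selection idea mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the Jenga repository. The function names, the mean-pooling of each block, and the dot-product top-k rule are all assumptions chosen for illustration. Tokens are assumed to be already flattened into a 1D order (for example by a space-filling curve), so that neighbouring tokens fall into the same block.

```python
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a block of token vectors into one representative vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_key_blocks(
    queries: List[Vector], keys: List[Vector], block_size: int, top_k: int
) -> List[List[int]]:
    """For each query block, return the indices of the top_k key blocks.

    Hypothetical selection rule: score block pairs by the dot product of
    their mean-pooled vectors and keep the highest-scoring key blocks.
    """
    q_blocks = [
        mean_pool(queries[i:i + block_size])
        for i in range(0, len(queries), block_size)
    ]
    k_blocks = [
        mean_pool(keys[i:i + block_size])
        for i in range(0, len(keys), block_size)
    ]
    selected = []
    for q in q_blocks:
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blocks]
        ranked = sorted(range(len(k_blocks)), key=lambda j: scores[j], reverse=True)
        selected.append(sorted(ranked[:top_k]))
    return selected


def attended_fraction(selected: List[List[int]], num_key_blocks: int) -> float:
    """Fraction of block pairs kept, relative to dense attention."""
    kept = sum(len(s) for s in selected)
    return kept / (len(selected) * num_key_blocks)


if __name__ == "__main__":
    # Eight toy 2D tokens forming four blocks of two tokens each.
    tokens = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0], [0, -1], [0, -1]]
    blocks = select_key_blocks(tokens, tokens, block_size=2, top_k=1)
    print(blocks)
    print(attended_fraction(blocks, num_key_blocks=4))
```

Dense attention scores every query-key pair, so its cost grows quadratically with token count. Keeping only `top_k` key blocks per query block makes the cost proportional to `top_k` times the number of blocks instead. The ordering in which a 3D video latent is flattened determines which tokens share a block. That ordering is outside the scope of this sketch.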
