Real-Time Video Generation with Pyramid Attention Broadcast
August 22, 2024
Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You
cs.AI
Abstract
We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, and
training-free approach for DiT-based video generation. Our method is founded on
the observation that the attention difference between adjacent steps of the
diffusion process exhibits a U-shaped pattern, indicating significant
redundancy. We mitigate this redundancy by broadcasting attention outputs to
subsequent steps in a pyramid style, applying a different broadcast strategy to
each attention type according to its variance for the best efficiency. We
further introduce broadcast sequence parallelism for more efficient distributed
inference. Compared with baselines, PAB demonstrates superior results across
three models, achieving real-time generation for videos of up to 720p. We
anticipate that our simple yet effective method will serve as a robust baseline
and facilitate future research and applications in video generation.
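The abstract describes reusing ("broadcasting") an attention output across later diffusion steps, with a separate strategy for each attention type. The following is a minimal Python sketch of that idea under stated assumptions. It is not the authors' implementation. The class `PyramidBroadcastCache` is hypothetical. So are the attention-type names and the broadcast ranges of 2, 4, and 6 steps. The reuse window (steps 2 to 17 of 20) is likewise chosen only for illustration, and strings stand in for attention tensors. Ordering the ranges as cross > temporal > spatial, meaning lower variance gets a longer range, is also an assumption of this sketch.

```python
from collections import Counter
from typing import Any, Callable, Dict, Tuple


class PyramidBroadcastCache:
    """Hypothetical sketch of attention-output broadcasting across diffusion steps.

    Each attention type has its own broadcast range: the number of consecutive
    steps that share one computed output. Giving lower-variance attention types
    longer ranges yields a "pyramid" of ranges (range values are assumptions).
    Reuse is allowed only for steps inside `window`, standing in for the flat
    middle of the U-shaped attention-difference curve.
    """

    def __init__(self, ranges: Dict[str, int], window: Tuple[int, int]) -> None:
        self.ranges = ranges            # attention type -> broadcast range in steps
        self.window = window            # [start, end) step indices eligible for reuse
        self._cache: Dict[Tuple[str, int], Any] = {}
        self._last_step: Dict[Tuple[str, int], int] = {}
        self.computed: Counter = Counter()
        self.reused: Counter = Counter()

    def __call__(self, step: int, attn_type: str, layer: int,
                 compute: Callable[[], Any]) -> Any:
        key = (attn_type, layer)
        start, end = self.window
        last = self._last_step.get(key)
        if start <= step < end and last is not None and step - last < self.ranges[attn_type]:
            # Broadcast: return the output cached at step `last` instead of recomputing.
            self.reused[attn_type] += 1
            return self._cache[key]
        out = compute()                 # stands in for the full attention computation
        self._cache[key] = out
        self._last_step[key] = step
        self.computed[attn_type] += 1
        return out


# Toy run: 20 steps, one layer; strings stand in for attention outputs.
cache = PyramidBroadcastCache({"spatial": 2, "temporal": 4, "cross": 6}, window=(2, 18))
outputs = {}
for step in range(20):
    for attn in ("spatial", "temporal", "cross"):
        # `compute` is called immediately inside `cache`, so capturing the loop variables is safe.
        outputs[(attn, step)] = cache(step, attn, layer=0,
                                      compute=lambda: f"{attn}@{step}")

print(dict(cache.computed))  # {'spatial': 12, 'temporal': 8, 'cross': 6}
```

In this toy run, 26 of the 60 attention calls are actually computed and the other 34 are served from the cache. Steps outside the window always recompute, which mirrors the high-difference ends of the U-shaped curve. In an actual model, `compute` would presumably wrap the attention module's forward pass.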