Real-Time Video Generation with Pyramid Attention Broadcast
August 22, 2024
Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You
cs.AI
Abstract
We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, and
training-free approach for DiT-based video generation. Our method is founded on
the observation that attention difference in the diffusion process exhibits a
U-shaped pattern, indicating significant redundancy. We mitigate this by
broadcasting attention outputs to subsequent steps in a pyramid style. PAB
applies a different broadcast strategy to each type of attention, based on its
variance, for best efficiency. We further introduce broadcast sequence parallel
for more efficient distributed inference. PAB demonstrates superior results
across three models compared to baselines, achieving real-time generation for
up to 720p videos. We anticipate that our simple yet effective method will
serve as a robust baseline and facilitate future research and application for
video generation.
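The broadcasting idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `BroadcastCache`, the function `simulate`, the three attention kinds (spatial, temporal, cross), the broadcast ranges, and the step window are all hypothetical choices made for illustration. The sketch only shows the general mechanism of recomputing an attention output every few diffusion steps inside a window and reusing (broadcasting) it otherwise, with a longer reuse range for attention assumed to vary less.

```python
from typing import Callable, Dict, Optional, Tuple


class BroadcastCache:
    """Reuse one attention output across nearby diffusion steps (illustrative only)."""

    def __init__(self, interval: int, window: Tuple[int, int]) -> None:
        self.interval = interval  # hypothetical broadcast range, in steps
        self.window = window      # hypothetical (first, last) step where reuse is allowed
        self.cached: Optional[object] = None
        self.cached_step: Optional[int] = None
        self.recomputed = 0       # how many times attention was actually computed

    def __call__(self, step: int, compute: Callable[[], object]) -> object:
        first, last = self.window
        in_window = first <= step <= last
        fresh = (
            self.cached_step is not None
            and step - self.cached_step < self.interval
        )
        if in_window and fresh:
            return self.cached    # broadcast the earlier output to this step
        self.cached = compute()   # otherwise recompute and refresh the cache
        self.cached_step = step
        self.recomputed += 1
        return self.cached


def simulate(num_steps: int = 30) -> Dict[str, int]:
    """Count recomputations per attention kind over a toy denoising loop."""
    window = (5, 24)  # assumed stable middle segment of the process
    # Assumed ordering: lower variance gets a longer broadcast range.
    caches = {
        "spatial": BroadcastCache(interval=2, window=window),
        "temporal": BroadcastCache(interval=4, window=window),
        "cross": BroadcastCache(interval=6, window=window),
    }
    for step in range(num_steps):
        for cache in caches.values():
            # Stand-in for a real attention call; returns the step it ran at.
            cache(step, lambda s=step: s)
    return {name: cache.recomputed for name, cache in caches.items()}


if __name__ == "__main__":
    print(simulate())
```

Running `simulate()` reports how often each attention kind is recomputed. Under these assumed settings the counts decrease from spatial to temporal to cross attention, which is the "pyramid" shape: the more stable an output is assumed to be, the more steps it is broadcast over.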