
Real-Time Video Generation with Pyramid Attention Broadcast

August 22, 2024
Authors: Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You
cs.AI

Abstract

We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, and training-free approach for DiT-based video generation. Our method is founded on the observation that attention differences in the diffusion process exhibit a U-shaped pattern, indicating significant redundancy. We mitigate this redundancy by broadcasting attention outputs to subsequent steps in a pyramid style: each type of attention gets its own broadcast strategy, chosen according to its variance, for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates superior results across three models compared to baselines, achieving real-time generation for videos of up to 720p. We anticipate that our simple yet effective method will serve as a robust baseline and facilitate future research and applications in video generation.
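The broadcasting idea in the abstract can be illustrated with a minimal caching sketch. This is not the authors' implementation: the class name `BroadcastCache`, the helper `simulate`, the attention-type labels, the broadcast ranges, and the step window are all hypothetical, and the assumption that reuse happens only inside a middle window of steps is one reading of the U-shaped pattern. The sketch recomputes an attention output only once every `broadcast_range` steps and reuses ("broadcasts") the cached output in between.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class BroadcastCache:
    """Reuse one attention output across several consecutive diffusion steps.

    Hypothetical helper for illustration; one instance per attention module.
    """

    def __init__(self, broadcast_range: int, window: Tuple[int, int]):
        if broadcast_range < 1:
            raise ValueError("broadcast_range must be >= 1")
        self.broadcast_range = broadcast_range
        self.window = window  # (first_step, last_step), inclusive; assumed
        self._cached: Any = None
        self._cached_step: Optional[int] = None
        self.recomputed = 0  # how many times attention was actually run

    def __call__(self, step: int, compute: Callable[[], Any]) -> Any:
        first, last = self.window
        in_window = first <= step <= last
        # The cache is fresh if it was filled fewer than broadcast_range steps ago.
        fresh = (
            self._cached_step is not None
            and 0 < step - self._cached_step < self.broadcast_range
        )
        if in_window and fresh:
            return self._cached  # broadcast: skip the attention computation
        out = compute()
        self.recomputed += 1
        self._cached, self._cached_step = out, step
        return out


def simulate(
    num_steps: int, ranges: Dict[str, int], window: Tuple[int, int]
) -> Dict[str, int]:
    """Count real attention computations per attention type (toy stand-in)."""
    caches = {name: BroadcastCache(r, window) for name, r in ranges.items()}
    for step in range(num_steps):
        for cache in caches.values():
            # A real model would compute attention here; we return the step id.
            cache(step, lambda s=step: s)
    return {name: cache.recomputed for name, cache in caches.items()}


if __name__ == "__main__":
    # Larger range = more reuse; the labels and values are placeholders.
    print(simulate(10, {"attn_a": 1, "attn_b": 2, "attn_c": 3}, window=(2, 7)))
```

Giving each attention type a different `broadcast_range` is what makes the schedule "pyramid" shaped in this sketch; how the ranges should be set from each attention's variance is up to the paper and is not modeled here.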

