Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation
September 19, 2024
Authors: Chenyu Wang, Shuo Yan, Yixuan Chen, Yujiang Wang, Mingzhi Dong, Xiaochen Yang, Dongsheng Li, Robert P. Dick, Qin Lv, Fan Yang, Tun Lu, Ning Gu, Li Shang
cs.AI
Abstract
Video generation with diffusion-based models is constrained by high
computational costs due to the frame-wise iterative diffusion process. This
work presents the Diffusion Reuse MOtion (Dr. Mo) network to accelerate latent
video generation. Our key observation is that the coarse-grained noises
produced in earlier denoising steps exhibit high motion consistency across
consecutive video frames. Building on this observation, Dr. Mo propagates
these coarse-grained noises to the next frame through carefully designed,
lightweight inter-frame motions, eliminating massive computational redundancy
in frame-wise diffusion models. The more sensitive, fine-grained noises are
still acquired via the later denoising steps, which are essential for
retaining visual quality. Deciding at which intermediate step to switch from
motion-based propagation to denoising is therefore a crucial problem and a key
trade-off between efficiency and quality. Dr. Mo employs a meta-network named
Denoising Step Selector (DSS) to dynamically determine the desirable
intermediate step across video frames. Extensive evaluations on video
generation and editing tasks show that Dr. Mo substantially accelerates
diffusion models in video tasks while improving visual quality.
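The abstract's core loop can be sketched in a few lines: fully denoise the first frame while caching intermediate latents, then for each later frame warp a cached coarse latent forward with an inter-frame motion and run only the remaining denoising steps. This is a minimal, self-contained sketch of that control flow, not the authors' released code; `dss_select_step`, `denoise_step`, `estimate_motion`, and `warp` are hypothetical callables standing in for the paper's DSS meta-network, diffusion step, and motion module.

```python
def generate_video_latents(z_init, num_frames, T,
                           dss_select_step, denoise_step,
                           estimate_motion, warp):
    """Per-frame latent generation with denoising reuse.

    Hypothetical interfaces (placeholders for the paper's components):
      dss_select_step(cache, f) -> step t* at which to resume denoising
      denoise_step(z, t)        -> one reverse-diffusion step at step t
      estimate_motion(frame, f) -> inter-frame motion from the previous frame
      warp(z, motion)           -> apply motion to a cached latent
    """
    frames = []

    # Frame 0: full T-step denoising, caching every intermediate latent.
    cache = {}
    z = z_init
    for t in range(T, 0, -1):
        z = denoise_step(z, t)
        cache[t - 1] = z
    frames.append(z)

    for f in range(1, num_frames):
        # DSS picks the switch point between motion-based propagation
        # and per-frame denoising (the efficiency/quality trade-off).
        t_star = dss_select_step(cache, f)

        # Reuse: warp the cached coarse latent instead of re-denoising
        # steps T..t_star+1 from scratch.
        motion = estimate_motion(frames[-1], f)
        z = warp(cache[t_star], motion)

        # Only the later, fine-grained steps are actually computed.
        new_cache = dict(cache)
        for t in range(t_star, 0, -1):
            z = denoise_step(z, t)
            new_cache[t - 1] = z
        frames.append(z)
        cache = new_cache

    return frames
```

With a switch step of t* out of T total steps, each subsequent frame runs only t* denoising steps plus one motion warp, which is where the claimed speed-up over frame-wise diffusion comes from.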