ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation
June 3, 2024
Authors: Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He
cs.AI
Abstract
Video generation has made remarkable progress in recent years, especially
since the advent of video diffusion models. Many video generation models
can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD).
However, most video models can only generate low frame rate videos due to
limited GPU memory as well as the difficulty of modeling a large set of frames.
The training videos are always uniformly sampled at a specified interval for
temporal compression. Previous methods raise the frame rate by either
training a video interpolation model in pixel space as a postprocessing stage
or training an interpolation model in latent space for a specific base video
model. In this paper, we propose a training-free video interpolation method for
generative video diffusion models, which is generalizable to different models
in a plug-and-play manner. We investigate the non-linearity in the feature
space of video diffusion models and transform a video model into a
self-cascaded video diffusion model by incorporating the designed hidden
state correction modules. The self-cascaded architecture and the correction
module are proposed to retain the temporal consistency between key frames and
the interpolated frames. Extensive evaluations on multiple popular video
models demonstrate the effectiveness of the proposed method; notably, our
training-free method is comparable even to trained interpolation models
supported by huge compute resources and large-scale datasets.