

ControlVideo: Training-free Controllable Text-to-Video Generation

May 22, 2023
Authors: Yabo Zhang, Yuxiang Wei, Dongsheng Jiang, Xiaopeng Zhang, Wangmeng Zuo, Qi Tian
cs.AI

Abstract

Text-driven diffusion models have unlocked unprecedented abilities in image generation, whereas their video counterparts still lag behind due to the excessive training cost of temporal modeling. Besides the training burden, the generated videos also suffer from appearance inconsistency and structural flicker, especially in long video synthesis. To address these challenges, we design a training-free framework called ControlVideo to enable natural and efficient text-to-video generation. ControlVideo, adapted from ControlNet, leverages the coarse structural consistency of input motion sequences and introduces three modules to improve video generation. First, to ensure appearance coherence between frames, ControlVideo adds fully cross-frame interaction to the self-attention modules. Second, to mitigate the flicker effect, it introduces an interleaved-frame smoother that applies frame interpolation to alternate frames. Finally, to produce long videos efficiently, it uses a hierarchical sampler that synthesizes each short clip separately while maintaining holistic coherence. Empowered with these modules, ControlVideo outperforms state-of-the-art methods on extensive motion-prompt pairs, both quantitatively and qualitatively. Notably, thanks to its efficient design, it generates both short and long videos within several minutes on a single NVIDIA 2080Ti. Code is available at https://github.com/YBYBZhang/ControlVideo.
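To make the first two modules more concrete, here is a minimal pure-Python sketch under heavy simplifying assumptions. Tokens are plain float vectors, there are no learned query/key/value projections or multiple heads, and the smoother uses a linear midpoint as a stand-in for a learned frame-interpolation model; in the actual method these operations run inside a diffusion U-Net during denoising. The function names (`fully_cross_frame_attention`, `interleaved_smooth`) are hypothetical and not taken from the ControlVideo repository. The point is only to contrast fully cross-frame attention (every frame's queries see the keys and values of all frames) with ordinary per-frame self-attention, and to show what "interpolating alternate frames" means.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one frame = a list of token vectors


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query token."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def fully_cross_frame_attention(q: List[Frame], k: List[Frame], v: List[Frame]) -> List[Frame]:
    """Hypothetical sketch: each frame's queries attend to keys/values of ALL frames.

    Ordinary per-frame self-attention would use only k[i] and v[i] for frame i;
    here the keys and values of every frame are concatenated first, so the
    attention cost grows with the total number of tokens across frames.
    """
    all_keys = [tok for frame in k for tok in frame]
    all_values = [tok for frame in v for tok in frame]
    return [[attend(tok, all_keys, all_values) for tok in frame_q] for frame_q in q]


def interleaved_smooth(frames: List[Vector], parity: int) -> List[Vector]:
    """Hypothetical sketch: replace each interior frame whose index has the
    given parity with an interpolation of its two neighbours.

    A linear midpoint stands in for a learned frame-interpolation model.
    Neighbours of the chosen parity have the other parity, so they are
    read from the unmodified input.
    """
    out = [list(f) for f in frames]
    for i in range(1, len(frames) - 1):
        if i % 2 == parity:
            out[i] = [(a + b) / 2 for a, b in zip(frames[i - 1], frames[i + 1])]
    return out


if __name__ == "__main__":
    # Identical (zero) keys give uniform weights, so every frame receives
    # the mean of the values from both frames.
    q = [[[1.0, 0.0]], [[0.0, 1.0]]]
    k = [[[0.0, 0.0]], [[0.0, 0.0]]]
    v = [[[2.0, 4.0]], [[4.0, 8.0]]]
    print(fully_cross_frame_attention(q, k, v))  # [[[3.0, 6.0]], [[3.0, 6.0]]]

    frames = [[0.0], [10.0], [2.0], [30.0], [4.0]]
    print(interleaved_smooth(frames, parity=1))  # [[0.0], [1.0], [2.0], [3.0], [4.0]]
```

Alternating `parity` between successive smoothing steps means that, over two steps, every interior frame is interpolated once from neighbours that were left untouched in that step. This mirrors the "alternate frames" idea from the abstract, though the exact scheduling of the real smoother across denoising timesteps is not specified here.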