
Dual-Stream Diffusion Net for Text-to-Video Generation

August 16, 2023
Authors: Binhui Liu, Xin Liu, Anbo Dai, Zhiyong Zeng, Zhen Cui, Jian Yang
cs.AI

Abstract

With the emergence of diffusion models, text-to-video generation has recently attracted increasing attention. An important bottleneck, however, is that generated videos often exhibit flickering and artifacts. In this work, we propose a dual-stream diffusion net (DSDN) to improve the consistency of content variations in generated videos. In particular, the two designed diffusion streams, the video content and motion branches, not only run separately in their private spaces to produce personalized video content and variations, but are also aligned between the content and motion domains through our designed cross-transformer interaction module, which benefits the smoothness of generated videos. Besides, we also introduce a motion decomposer and combiner to facilitate operations on video motion. Qualitative and quantitative experiments demonstrate that our method produces smooth, continuous videos with fewer flickers.
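The cross-stream alignment described in the abstract can be sketched loosely as bidirectional cross-attention: content features query motion features and vice versa, so each stream is updated with information from the other. This is a minimal single-head sketch with illustrative names and shapes, not the paper's actual transformer block.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention across two streams."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores) @ values

def cross_transformer_interaction(content, motion):
    """Hypothetical interaction module: each stream attends to the other
    and adds the result as a residual, aligning the two domains."""
    content_out = content + cross_attention(content, motion, motion)
    motion_out = motion + cross_attention(motion, content, content)
    return content_out, motion_out

rng = np.random.default_rng(0)
content = rng.standard_normal((16, 64))  # 16 content tokens, dim 64
motion = rng.standard_normal((16, 64))   # 16 motion tokens, dim 64
c, m = cross_transformer_interaction(content, motion)
print(c.shape, m.shape)  # shapes are preserved: (16, 64) (16, 64)
```

In the paper's framing, keeping the residual connection lets each branch retain its private-space features while the attention term injects the alignment signal from the other stream.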