VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
October 30, 2023
Authors: Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan
cs.AI
Abstract
Video generation has increasingly gained interest in both academia and
industry. Although commercial tools can generate plausible videos, there is a
limited number of open-source models available for researchers and engineers.
In this work, we introduce two diffusion models for high-quality video
generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V
models synthesize a video based on a given text input, while I2V models
incorporate an additional image input. Our proposed T2V model can generate
realistic and cinematic-quality videos at a resolution of 1024×576,
outperforming other open-source T2V models in terms of quality. The I2V model
is designed to produce videos that strictly adhere to the content of the
provided reference image, preserving its content, structure, and style. This
model is the first open-source I2V foundation model capable of transforming a
given image into a video clip while maintaining content preservation
constraints. We believe that these open-source video generation models will
contribute significantly to the technological advancements within the
community.