Vivid-ZOO: Multi-View Video Generation with Diffusion Model
June 12, 2024
Authors: Bing Li, Cheng Zheng, Wenxuan Zhu, Jinjie Mai, Biao Zhang, Peter Wonka, Bernard Ghanem
cs.AI
Abstract
While diffusion models have shown impressive performance in 2D image/video
generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation
remains underexplored. The new challenges posed by T2MVid generation lie in the
lack of large-scale captioned multi-view video data and the complexity of
modeling such a multi-dimensional distribution. To this end, we propose a novel diffusion-based
pipeline that generates high-quality multi-view videos centered around a
dynamic 3D object from text. Specifically, we factor the T2MVid problem into
viewpoint-space and time components. Such factorization allows us to combine
and reuse layers of advanced pre-trained multi-view image and 2D video
diffusion models to ensure multi-view consistency as well as temporal coherence
for the generated multi-view videos, largely reducing the training cost. We
further introduce alignment modules to align the latent spaces of layers from
the pre-trained multi-view and the 2D video diffusion models, addressing the
reused layers' incompatibility that arises from the domain gap between 2D and
multi-view data. In support of this and future research, we also contribute
a captioned multi-view video dataset. Experimental results demonstrate that our
method generates high-quality multi-view videos, exhibiting vivid motions,
temporal coherence, and multi-view consistency, given a variety of text
prompts.
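The factorization into viewpoint-space and time components described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy latent of shape (views, frames, channels), uses a parameter-free softmax self-attention as a stand-in for the reused pre-trained layers, and models the alignment modules as two hypothetical linear maps (`align_in`, `align_out`) wrapped around the temporal layer. All function and parameter names are illustrative.

```python
import numpy as np


def self_attention(x):
    """Parameter-free softmax self-attention over the second-to-last axis.

    x has shape (..., L, C); every leading index is treated as a batch.
    """
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x


def view_layer(x):
    """Stand-in for a multi-view image layer: mixes views within each frame.

    x has shape (V, T, C). Frames are moved to the batch axis so that
    attention runs across the V views of one frame at a time.
    """
    per_frame = np.swapaxes(x, 0, 1)          # (T, V, C)
    mixed = self_attention(per_frame)         # attention over V
    return np.swapaxes(mixed, 0, 1)           # back to (V, T, C)


def temporal_layer(x, align_in, align_out):
    """Stand-in for a 2D video temporal layer: mixes frames within each view.

    align_in / align_out are hypothetical (C, C) matrices that map into and
    out of the latent space expected by the temporal layer (an assumption).
    """
    h = x @ align_in                          # into the video model's space
    h = self_attention(h)                     # attention over T, per view
    return x + h @ align_out                  # residual, back to view space


def factorized_block(x, align_in, align_out):
    """One view-space pass followed by one temporal pass."""
    return temporal_layer(view_layer(x), align_in, align_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.standard_normal((4, 6, 8))   # 4 views, 6 frames, 8 channels
    identity = np.eye(8)
    out = factorized_block(latent, identity, identity)
    print(out.shape)
```

In this sketch the view layer never mixes information across frames and the temporal layer never mixes information across views, which is the property that lets the two kinds of pre-trained layers be combined without a single layer modeling the joint view-time distribution.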