Vivid-ZOO: Multi-View Video Generation with Diffusion Model

June 12, 2024
Authors: Bing Li, Cheng Zheng, Wenxuan Zhu, Jinjie Mai, Biao Zhang, Peter Wonka, Bernard Ghanem
cs.AI

Abstract

While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of large-scale captioned multi-view video data and the complexity of modeling such a multi-dimensional distribution. To this end, we propose a novel diffusion-based pipeline that generates, from text, high-quality multi-view videos centered around a dynamic 3D object. Specifically, we factor the T2MVid problem into viewpoint-space and time components. This factorization allows us to combine and reuse layers of advanced pre-trained multi-view image and 2D video diffusion models, which ensures multi-view consistency as well as temporal coherence for the generated multi-view videos and largely reduces the training cost. We further introduce alignment modules to align the latent spaces of layers from the pre-trained multi-view and 2D video diffusion models, addressing the incompatibility of the reused layers that arises from the domain gap between 2D and multi-view data. In support of this and future research, we further contribute a captioned multi-view video dataset. Experimental results demonstrate that, given a variety of text prompts, our method generates high-quality multi-view videos that exhibit vivid motions, temporal coherence, and multi-view consistency.
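
The factorization into viewpoint-space and time components can be pictured with a toy sketch. This is not the authors' implementation: plain averaging stands in for the reused pre-trained layers, the alignment module is assumed to be a simple affine map, and every function name and parameter below is hypothetical. The latent is a nested list indexed as `[view][frame][feature]`.

```python
# Illustrative sketch only. Averaging stands in for the pre-trained attention
# layers, and all names and constants are hypothetical, not from the paper.

def mix_views(x, weight=0.5):
    """Blend each feature with its mean over the view axis, frame by frame."""
    n_views, n_frames, dim = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * dim for _ in range(n_frames)] for _ in range(n_views)]
    for f in range(n_frames):
        for d in range(dim):
            mean = sum(x[v][f][d] for v in range(n_views)) / n_views
            for v in range(n_views):
                out[v][f][d] = (1 - weight) * x[v][f][d] + weight * mean
    return out


def mix_frames(x, weight=0.5):
    """Blend each feature with its mean over the frame axis, view by view."""
    n_views, n_frames, dim = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * dim for _ in range(n_frames)] for _ in range(n_views)]
    for v in range(n_views):
        for d in range(dim):
            mean = sum(x[v][f][d] for f in range(n_frames)) / n_frames
            for f in range(n_frames):
                out[v][f][d] = (1 - weight) * x[v][f][d] + weight * mean
    return out


def align(x, scale, shift):
    """Assumed alignment module: an affine map between two latent spaces."""
    return [[[scale * val + shift for val in feat] for feat in view] for view in x]


def factorized_block(x):
    """Apply the view component, then the time component, with alignment around it."""
    x = mix_views(x)          # stand-in for a reused multi-view image layer
    x = align(x, 2.0, 0.0)    # map into the (assumed) video-model latent space
    x = mix_frames(x)         # stand-in for a reused 2D video temporal layer
    x = align(x, 0.5, 0.0)    # map back to the multi-view latent space
    return x


# 4 views, 3 frames, 2 features per position.
latent = [[[1.0, 1.0] for _ in range(3)] for _ in range(4)]
result = factorized_block(latent)
```

The point of the sketch is only the ordering: each component acts along a single axis of the view-by-frame grid, so layers trained for one axis can in principle be reused without a joint model over both.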
