SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
June 29, 2024
Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang
cs.AI
Abstract
Video generation models have demonstrated great capabilities of producing
impressive monocular videos; however, the generation of 3D stereoscopic video
remains under-explored. We propose a pose-free and training-free approach for
generating 3D stereoscopic videos using an off-the-shelf monocular video
generation model. Our method warps a generated monocular video into camera
views on a stereoscopic baseline using estimated video depth, and employs a
novel frame matrix video inpainting framework. The framework leverages the
video generation model to inpaint frames observed from different timestamps
and views. This effective approach generates consistent and semantically
coherent stereoscopic videos without scene optimization or model fine-tuning.
Moreover, we develop a disocclusion boundary re-injection scheme that further
improves the quality of video inpainting by alleviating the negative effects
propagated from disoccluded areas in the latent space. We validate the
efficacy of our proposed method by conducting experiments on videos from
various generative models, including Sora [4], Lumiere [2], WALT [8], and
Zeroscope [42]. The experiments demonstrate that our method significantly
outperforms previous methods. The code will be released at
https://daipengwa.github.io/SVG_ProjectPage.
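The abstract describes the warping step only at a high level. Below is a minimal sketch of how one frame could be forward-warped onto a stereoscopic baseline using an estimated depth map, assuming a pinhole camera with focal length in pixels, a metric depth map, and simple z-buffer occlusion handling; the function name, signature, and parameters are illustrative assumptions, not details from the paper.

```python
import numpy as np

def warp_to_stereo_view(frame: np.ndarray, depth: np.ndarray,
                        baseline: float, focal: float):
    """Forward-warp a frame to a horizontally shifted stereo view (sketch).

    frame: (H, W, 3) image; depth: (H, W) metric depth map.
    Returns the warped frame and a boolean mask of disoccluded (hole)
    pixels that a video inpainting stage would need to fill.
    """
    H, W = depth.shape
    # Horizontal disparity in pixels: closer pixels shift more.
    disparity = focal * baseline / np.maximum(depth, 1e-6)

    warped = np.zeros_like(frame)
    zbuf = np.full((H, W), np.inf)          # nearest depth per target pixel
    filled = np.zeros((H, W), dtype=bool)

    ys, xs = np.mgrid[0:H, 0:W]
    xt = np.round(xs - disparity).astype(int)  # target column in shifted view
    valid = (xt >= 0) & (xt < W)

    for y, x, tx in zip(ys[valid], xs[valid], xt[valid]):
        if depth[y, x] < zbuf[y, tx]:       # z-buffer: keep the closest surface
            zbuf[y, tx] = depth[y, x]
            warped[y, tx] = frame[y, x]
            filled[y, tx] = True

    holes = ~filled  # disocclusion mask: regions left empty by the warp
    return warped, holes
```

The pixels left unfilled (the `holes` mask) are the disoccluded regions that, per the abstract, the frame matrix video inpainting framework is responsible for completing consistently across views and timestamps.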