SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
June 29, 2024
Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang
cs.AI
Abstract
Video generation models have demonstrated great capabilities in producing impressive monocular videos; however, the generation of 3D stereoscopic videos remains under-explored. We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. Our method warps a generated monocular video into camera views on a stereoscopic baseline using estimated video depth, and employs a novel frame matrix video inpainting framework. The framework leverages the video generation model to inpaint frames observed from different timestamps and views. This effective approach generates consistent and semantically coherent stereoscopic videos without scene optimization or model fine-tuning. Moreover, we develop a disocclusion boundary re-injection scheme that further improves the quality of video inpainting by alleviating the negative effects propagated from disoccluded areas in the latent space. We validate the efficacy of the proposed method by conducting experiments on videos from various generative models, including Sora [4], Lumiere [2], WALT [8], and Zeroscope [42]. The experiments demonstrate that our method significantly improves over previous methods. The code will be released at https://daipengwa.github.io/SVG_ProjectPage.
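To make the warping step described in the abstract concrete, below is a minimal sketch of depth-based reprojection onto a stereo baseline. This is not the authors' released code: the function name `warp_to_stereo_view` and the `baseline`/`focal` parameters are illustrative assumptions. It forward-warps each pixel horizontally by its disparity and returns the disoccluded holes that a video inpainting framework, such as the paper's frame matrix approach, would then fill.

```python
import numpy as np

def warp_to_stereo_view(frame, depth, baseline=0.06, focal=500.0):
    """Forward-warp a monocular frame to the other stereo eye.

    frame: (H, W, 3) image; depth: (H, W) metric depth map.
    Returns the warped frame and a boolean mask of disoccluded
    holes that must be filled by inpainting. The baseline and
    focal values are placeholders, not values from the paper.
    """
    H, W, _ = frame.shape
    # Horizontal disparity in pixels: closer pixels shift more.
    disparity = focal * baseline / np.clip(depth, 1e-6, None)

    warped = np.zeros_like(frame)
    filled = np.zeros((H, W), dtype=bool)
    depth_buf = np.full((H, W), np.inf)

    ys, xs = np.mgrid[0:H, 0:W]
    # Shift pixels left to synthesize the right-eye view.
    xs_new = np.round(xs - disparity).astype(int)
    valid = (xs_new >= 0) & (xs_new < W)

    # Z-buffered splat: where pixels collide, the nearer surface wins.
    for y, x, xn in zip(ys[valid], xs[valid], xs_new[valid]):
        if depth[y, x] < depth_buf[y, xn]:
            depth_buf[y, xn] = depth[y, x]
            warped[y, xn] = frame[y, x]
            filled[y, xn] = True

    holes = ~filled  # disoccluded regions, to be inpainted
    return warped, holes
```

In this reading, the returned `holes` mask marks exactly the disoccluded areas the abstract refers to; the paper's contribution lies in how those holes are inpainted consistently across timestamps and views, which the sketch above does not attempt to reproduce.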