
Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

June 17, 2024
Authors: Rishab Parthasarathy, Zack Ankner, Aaron Gokaslan
cs.AI

Abstract

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by jointly optimizing for consistency across both time and views of the scene. In this paper, we instead investigate whether it is necessary to explicitly enforce multiview consistency over time, as current approaches do, or if it is sufficient for a model to generate 3D representations of each timestep independently. We hence propose a model, Vid3D, that leverages 2D video diffusion to generate 3D videos by first generating a 2D "seed" of the video's temporal dynamics and then independently generating a 3D representation for each timestep in the seed video. We evaluate Vid3D against two state-of-the-art 3D video generation methods and find that Vid3D achieves comparable results despite not explicitly modeling 3D temporal dynamics. We further ablate how the quality of Vid3D depends on the number of views generated per frame. While we observe some degradation with fewer views, performance degradation remains minor. Our results thus suggest that 3D temporal knowledge may not be necessary to generate high-quality dynamic 3D scenes, potentially enabling simpler generative algorithms for this task.
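
The abstract describes a two-stage pipeline: a 2D video diffusion model first produces a "seed" video of the scene's temporal dynamics, and each frame of that seed is then lifted to 3D independently, with no objective coupling the 3D representations across time. The sketch below illustrates that structure only; the function names, frame/view counts, and return types are hypothetical placeholders, not the authors' actual implementation or API.

```python
# Minimal sketch of a Vid3D-style pipeline as described in the abstract.
# All stage functions are hypothetical stubs, not the paper's real models.
from typing import List
import numpy as np


def generate_seed_video(image: np.ndarray, num_frames: int = 16) -> List[np.ndarray]:
    """Stage 1 (stub): a 2D video diffusion model produces a 'seed' video
    that captures the scene's temporal dynamics."""
    return [image.copy() for _ in range(num_frames)]


def generate_views(frame: np.ndarray, num_views: int = 18) -> List[np.ndarray]:
    """Stage 2a (stub): synthesize multiple views of a single frame,
    conditioned only on that frame, never on other timesteps."""
    return [frame.copy() for _ in range(num_views)]


def reconstruct_3d(views: List[np.ndarray]) -> dict:
    """Stage 2b (stub): fit a per-frame 3D representation to the
    generated views; returns a placeholder record here."""
    return {"num_views_used": len(views)}


def vid3d_pipeline(image: np.ndarray, num_frames: int = 16, num_views: int = 18) -> List[dict]:
    """Each timestep is lifted to 3D independently: no loss or constraint
    ties the per-frame 3D representations together over time."""
    seed_video = generate_seed_video(image, num_frames)
    return [reconstruct_3d(generate_views(frame, num_views)) for frame in seed_video]


if __name__ == "__main__":
    scene = vid3d_pipeline(np.zeros((256, 256, 3), dtype=np.uint8))
    print(f"Generated {len(scene)} independent per-frame 3D representations")
```

The `num_views` parameter mirrors the paper's ablation: reducing the number of views generated per frame degrades quality only mildly.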
