

V3D: Video Diffusion Models are Effective 3D Generators

March 11, 2024
Authors: Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu
cs.AI

Abstract

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated generation speed, but they usually produce less-detailed objects due to limited model capacity or insufficient 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world-simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometric consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, a state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object, given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency. Our code is available at https://github.com/heheyas/V3D.
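For intuition, the single-image-to-orbit-frames step described in the abstract can be sketched with the Hugging Face diffusers image-to-video API. This is a minimal illustration under stated assumptions, not the authors' released pipeline: it calls the base Stable Video Diffusion checkpoint, whereas V3D fine-tunes such a model for multi-view consistency and follows it with a dedicated reconstruction stage. The frame count, resolution, and file names below are illustrative assumptions; the actual fine-tuned weights and reconstruction code live in the repository linked above.

```python
# Hypothetical sketch of the single-image -> orbit-frames step, using the
# base Stable Video Diffusion model from Hugging Face diffusers.
# V3D fine-tunes a video diffusion model of this kind; this only
# illustrates the interface, not the authors' released pipeline.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",  # base model, not V3D's fine-tuned weights
    torch_dtype=torch.float16,
).to("cuda")

# A single RGB image of the object, ideally centered on a plain background.
image = load_image("object.png").resize((1024, 576))

# Sample a short video. After V3D-style fine-tuning, these frames would form
# a 360-degree orbit around the object rather than a generic camera motion.
# num_frames=18 is an assumed orbit length for illustration.
frames = pipe(image, num_frames=18, decode_chunk_size=6).frames[0]

for i, frame in enumerate(frames):
    frame.save(f"orbit_{i:02d}.png")  # inputs to the downstream reconstruction stage
```

The saved orbit frames would then feed the reconstruction stage (3D Gaussian fitting or mesh extraction) that produces the final 3D asset, which in V3D is a tailored pipeline implemented in the linked repository.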

