
V3D: Video Diffusion Models are Effective 3D Generators

March 11, 2024
Authors: Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu
cs.AI

Abstract

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world-simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model can be fine-tuned to generate 360-degree orbit frames surrounding an object, given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency. Our code is available at https://github.com/heheyas/V3D.
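
The core workflow the abstract describes, sampling a 360-degree orbit around an object from a single input image with a fine-tuned video diffusion model, can be sketched with off-the-shelf tooling. The following is a minimal, hypothetical sketch assuming the V3D weights load as a Stable Video Diffusion checkpoint through Hugging Face diffusers; the hub id "heheyas/V3D", the frame count, and the resolution are illustrative assumptions, not details confirmed here (the repository linked above is the authoritative entry point).

    # Hypothetical sketch: single image -> orbit frames with a fine-tuned
    # Stable Video Diffusion checkpoint, NOT the authors' actual interface.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image

    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "heheyas/V3D",  # assumed hub id; see the V3D repo for the real weights
        torch_dtype=torch.float16,
    ).to("cuda")

    # Single front-view image of the object.
    image = load_image("object.png").resize((512, 512))

    # Sample an orbit video around the object; frame count and square
    # resolution are assumed settings, not values stated in the abstract.
    frames = pipe(image, height=512, width=512,
                  num_frames=18, decode_chunk_size=6).frames[0]
    for i, frame in enumerate(frames):
        frame.save(f"orbit_{i:02d}.png")

The resulting multi-view frames would then feed the paper's reconstruction pipelines to produce a mesh or 3D Gaussians; that stage is specific to the released code and is not sketched here.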
