
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

February 20, 2024
作者: Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
cs.AI

Abstract

This paper presents MVDiffusion++, a neural architecture for 3D object reconstruction that synthesizes dense, high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a "pose-free architecture," where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense, high-resolution view synthesis at test time. We use Objaverse for training and Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model.
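
To make the two ideas concrete, here is a minimal PyTorch sketch of (1) joint self-attention over the latent tokens of all views, with no camera-pose embeddings, and (2) view dropout that subsamples output views during training. This is an illustration under stated assumptions, not the authors' implementation: the class and function names (MultiViewSelfAttention, view_dropout), the tensor shapes, and the token/feature dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class MultiViewSelfAttention(nn.Module):
    """Self-attention applied jointly to the tokens of all views, so that
    cross-view 3D consistency can be learned without camera-pose inputs
    (a sketch of the "pose-free architecture" idea)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, views, tokens, dim) -- 2D latent features per view
        b, v, t, d = latents.shape
        x = latents.reshape(b, v * t, d)  # flatten all views into one sequence
        h = self.norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        return x.reshape(b, v, t, d)

def view_dropout(latents: torch.Tensor, keep: int) -> torch.Tensor:
    """Randomly keep `keep` of the output views per sample during training.
    This shortens the attention sequence (cutting memory), while the model
    can still attend over a dense set of views at test time."""
    b, v, t, d = latents.shape
    idx = torch.stack([torch.randperm(v)[:keep] for _ in range(b)])  # (b, keep)
    batch_idx = torch.arange(b).unsqueeze(1)  # (b, 1), broadcasts against idx
    return latents[batch_idx, idx]  # (b, keep, t, d)

# Example: 32 candidate generation views, but attend over only 8 per step.
latents = torch.randn(2, 32, 64, 320)
block = MultiViewSelfAttention(dim=320)
out = block(view_dropout(latents, keep=8))
print(out.shape)  # torch.Size([2, 8, 64, 320])
```

The key memory effect is that self-attention cost grows quadratically with sequence length (views × tokens), so dropping views during training reduces cost quadratically while leaving the attention weights view-count-agnostic.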

