
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

February 20, 2024
作者: Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan
cs.AI

Abstract

This paper presents MVDiffusion++, a neural architecture for 3D object reconstruction that synthesizes dense, high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a "pose-free architecture," where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a "view dropout strategy" that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense, high-resolution view synthesis at test time. We use Objaverse for training and Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model.
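
To make the two ideas concrete, here is a minimal PyTorch sketch of (1) joint self-attention over the latent tokens of all views, with no camera-pose embeddings, and (2) view dropout that subsamples output views during training. This is an illustration under stated assumptions, not the authors' implementation: the class and function names (MultiViewSelfAttention, view_dropout), the tensor shapes, and the token/feature dimensions are all hypothetical.

```python
import torch
import torch.nn as nn

class MultiViewSelfAttention(nn.Module):
    """Self-attention applied jointly to the tokens of all views, so that
    cross-view 3D consistency can be learned without camera-pose inputs
    (a sketch of the "pose-free architecture" idea)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (batch, views, tokens, dim) -- 2D latent features per view
        b, v, t, d = latents.shape
        x = latents.reshape(b, v * t, d)  # flatten all views into one sequence
        h = self.norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # residual attention
        return x.reshape(b, v, t, d)

def view_dropout(latents: torch.Tensor, keep: int) -> torch.Tensor:
    """Randomly keep `keep` of the output views per sample during training.
    This shortens the attention sequence (cutting memory), while the model
    can still attend over a dense set of views at test time."""
    b, v, t, d = latents.shape
    idx = torch.stack([torch.randperm(v)[:keep] for _ in range(b)])  # (b, keep)
    batch_idx = torch.arange(b).unsqueeze(1)  # (b, 1), broadcasts against idx
    return latents[batch_idx, idx]  # (b, keep, t, d)

# Example: 32 candidate generation views, but attend over only 8 per step.
latents = torch.randn(2, 32, 64, 320)
block = MultiViewSelfAttention(dim=320)
out = block(view_dropout(latents, keep=8))
print(out.shape)  # torch.Size([2, 8, 64, 320])
```

The key memory effect is that self-attention cost grows quadratically with sequence length (views × tokens), so dropping views during training reduces cost quadratically while leaving the attention weights view-count-agnostic.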

