SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
August 19, 2024
Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu
cs.AI
Abstract
Open-world 3D generation has recently attracted considerable attention. While
many single-image-to-3D methods have yielded visually appealing outcomes, they
often lack sufficient controllability and tend to produce hallucinated regions
that may not align with users' expectations. In this paper, we explore an
important scenario in which the input consists of one or a few unposed 2D
images of a single object, with little or no overlap. We propose a novel
method, SpaRP, to reconstruct a 3D textured mesh and estimate the relative
camera poses for these sparse-view images. SpaRP distills knowledge from 2D
diffusion models and finetunes them to implicitly deduce the 3D spatial
relationships between the sparse views. The diffusion model is trained to
jointly predict surrogate representations for camera poses and multi-view
images of the object under known poses, integrating all information from the
input sparse views. These predictions are then leveraged to accomplish 3D
reconstruction and pose estimation, and the reconstructed 3D model can be used
to further refine the camera poses of input views. Through extensive
experiments on three datasets, we demonstrate that our method not only
significantly outperforms baseline methods in terms of 3D reconstruction
quality and pose prediction accuracy but also exhibits strong efficiency. It
requires only about 20 seconds to produce a textured mesh and camera poses for
the input views. Project page: https://chaoxu.xyz/sparp.