PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

November 20, 2023
Authors: Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang
cs.AI

Abstract

We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images, even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method that uses self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a huge amount of multi-view posed data of ~1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D tasks with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm.
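As a rough illustration of the pipeline the abstract describes, the PyTorch sketch below shows self-attention over a concatenated sequence of learnable 3D object tokens and 2D image-patch tokens, plus a per-patch head that predicts a coarse point cloud for each view. All module names, token counts, and layer sizes here are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PFLRMSketch(nn.Module):
    """Minimal sketch of the pipeline from the PF-LRM abstract.

    Assumed design, not the paper's implementation: a set of learnable
    3D object tokens is concatenated with 2D image-patch tokens, mixed
    by transformer self-attention, and a linear head predicts a coarse
    3D point per image patch (i.e., a coarse point cloud per view).
    """

    def __init__(self, dim=512, num_layers=4, num_obj_tokens=256):
        super().__init__()
        # Learnable 3D object tokens shared across all inputs (assumption).
        self.obj_tokens = nn.Parameter(torch.randn(num_obj_tokens, dim))
        # Self-attention blocks exchange information between the 3D object
        # tokens and the 2D image tokens in one joint sequence.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Hypothetical head: one coarse 3D point per image patch,
        # expressed in a shared object-centric frame.
        self.point_head = nn.Linear(dim, 3)

    def forward(self, image_tokens):
        # image_tokens: (B, V, P, D) patch embeddings for V unposed views.
        B, V, P, D = image_tokens.shape
        n_obj = self.obj_tokens.shape[0]
        seq = torch.cat(
            [self.obj_tokens.expand(B, -1, -1),
             image_tokens.reshape(B, V * P, D)], dim=1)
        seq = self.transformer(seq)
        obj, img = seq[:, :n_obj], seq[:, n_obj:]
        # Coarse per-view point clouds: (B, V, P, 3).
        points = self.point_head(img).reshape(B, V, P, 3)
        return obj, points  # obj tokens -> shape; points -> per-view PnP


model = PFLRMSketch()
tokens = torch.randn(2, 4, 196, 512)  # 2 objects, 4 views, 196 patches each
obj, points = model(tokens)
```

Each view's camera pose could then be recovered by feeding its predicted points and the corresponding patch-center pixel coordinates to a PnP solver; the paper uses a differentiable PnP solver so this step can be trained end to end, whereas something like OpenCV's `cv2.solvePnP` would only serve as a non-differentiable stand-in at inference time.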