

PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction

November 20, 2023
作者: Peng Wang, Hao Tan, Sai Bi, Yinghao Xu, Fujun Luan, Kalyan Sunkavalli, Wenping Wang, Zexiang Xu, Kai Zhang
cs.AI

Abstract

We propose a Pose-Free Large Reconstruction Model (PF-LRM) for reconstructing a 3D object from a few unposed images, even with little visual overlap, while simultaneously estimating the relative camera poses in ~1.3 seconds on a single A100 GPU. PF-LRM is a highly scalable method that uses self-attention blocks to exchange information between 3D object tokens and 2D image tokens; we predict a coarse point cloud for each view, and then use a differentiable Perspective-n-Point (PnP) solver to obtain camera poses. When trained on a large amount of multi-view posed data of ~1M objects, PF-LRM shows strong cross-dataset generalization ability, and outperforms baseline methods by a large margin in terms of pose prediction accuracy and 3D reconstruction quality on various unseen evaluation datasets. We also demonstrate our model's applicability in downstream text/image-to-3D tasks with fast feed-forward inference. Our project website is at: https://totoro97.github.io/pf-lrm.
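
To make the pipeline the abstract describes more concrete (joint self-attention over 3D object tokens and 2D image tokens, a coarse per-view point cloud, and a differentiable PnP step for poses), here is a minimal PyTorch sketch. All module and parameter names (`PFLRMSketch`, `num_obj_tokens`, `point_head`, and so on) are hypothetical illustrations, not the authors' implementation; the PnP solve is only indicated in a comment.

```python
import torch
import torch.nn as nn

class PFLRMSketch(nn.Module):
    """Minimal sketch of the PF-LRM forward pass described in the abstract.

    Hypothetical names and shapes; a sketch of the idea, not the paper's code.
    """
    def __init__(self, dim=512, num_layers=4, num_heads=8, num_obj_tokens=1024):
        super().__init__()
        # Learnable 3D object tokens that will be decoded into the shape.
        self.obj_tokens = nn.Parameter(torch.randn(num_obj_tokens, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        # Self-attention blocks run over the *concatenation* of 3D object
        # tokens and 2D image tokens, so the two streams exchange information.
        self.blocks = nn.TransformerEncoder(layer, num_layers)
        # Per-patch head predicting a coarse 3D point plus a confidence.
        self.point_head = nn.Linear(dim, 3 + 1)

    def forward(self, image_tokens):
        # image_tokens: (B, V, N, dim) patch embeddings of V unposed views.
        B, V, N, D = image_tokens.shape
        obj = self.obj_tokens.unsqueeze(0).expand(B, -1, -1)    # (B, T, D)
        img = image_tokens.reshape(B, V * N, D)                 # (B, V*N, D)
        tokens = self.blocks(torch.cat([obj, img], dim=1))      # joint attention
        obj_out = tokens[:, :obj.shape[1]]                      # -> shape decoding
        img_out = tokens[:, obj.shape[1]:].reshape(B, V, N, D)
        out = self.point_head(img_out)                          # (B, V, N, 4)
        points, conf = out[..., :3], out[..., 3].sigmoid()
        # `points` is a coarse per-view point cloud; pairing each 3D point
        # with its 2D patch center and running a differentiable PnP solver
        # (as the abstract describes) would yield per-view camera poses.
        return obj_out, points, conf
```

The design choice this sketch reflects is that shape and pose come out of one shared attention pass: the object tokens attend to all views at once, while each view's patch tokens gather enough global context to regress 3D points that a PnP solver can turn into that view's camera pose.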