
3D-LFM: Lifting Foundation Model

December 19, 2023
Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
cs.AI

Abstract

The lifting of 3D structure and camera from 2D landmarks is a cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g., C3DPO and PAUL) with resilience to noise, occlusions, and perspective distortions. All these techniques, however, have been limited by the fundamental need to establish correspondences across the 3D training data -- significantly limiting their utility to applications where one has an abundance of "in-correspondence" 3D data. Our approach harnesses the inherent permutation equivariance of transformers to manage a varying number of points per 3D data instance, withstand occlusions, and generalize to unseen categories. We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks. Since our approach can be trained across such a broad class of structures, we refer to it simply as a 3D Lifting Foundation Model (3D-LFM) -- the first of its kind.
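The permutation equivariance the abstract relies on can be checked directly: a self-attention layer with no positional encoding maps a permuted set of input points to the correspondingly permuted outputs, which is why such a model can ingest unordered landmark sets of varying size. The following is a minimal numpy sketch of that property, not the paper's implementation; the function and weight names are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single self-attention layer over X: (n_points, d) landmark features.

    Illustrative sketch only -- not the 3D-LFM architecture. With no
    positional encoding, the layer treats the points as an unordered set.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    return A @ V

rng = np.random.default_rng(0)
d = 4
X = rng.standard_normal((5, d))                  # 5 landmarks, d features each
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
P = np.eye(5)[rng.permutation(5)]                # random permutation matrix

# Equivariance: permuting the input points permutes the output identically.
assert np.allclose(self_attention(P @ X, Wq, Wk, Wv),
                   P @ self_attention(X, Wq, Wk, Wv))
```

The same check holds for any number of points, since none of the weight shapes depend on the set size -- the property that lets one model train across instances with different landmark counts.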