3D-LFM: Lifting Foundation Model
December 19, 2023
Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
cs.AI
Abstract
The lifting of 3D structure and camera from 2D landmarks is a cornerstone of
the entire discipline of computer vision. Traditional methods have been
confined to specific rigid objects, such as those in Perspective-n-Point (PnP)
problems, but deep learning has expanded our capability to reconstruct a wide
range of object classes (e.g. C3DPO and PAUL) with resilience to noise,
occlusions, and perspective distortions. All these techniques, however, have
been limited by the fundamental need to establish correspondences across the
3D training data, significantly limiting their utility to applications where
one has an abundance of "in-correspondence" 3D data. Our approach harnesses
the inherent permutation equivariance of transformers to manage a varying
number of points per 3D data instance, withstands occlusions, and generalizes
to unseen categories. We demonstrate state-of-the-art performance across 2D-3D
lifting task benchmarks. Since our approach can be trained across such a broad
class of structures, we refer to it simply as a 3D Lifting Foundation Model
(3D-LFM), the first of its kind.
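To make the central architectural claim concrete, here is a minimal sketch of
how a permutation-equivariant transformer can lift a variable-size set of 2D
landmarks to 3D. The class name, layer sizes, and masking scheme below are
illustrative assumptions for exposition, not the authors' actual 3D-LFM
architecture.

```python
# A minimal sketch, assuming a plain PyTorch transformer encoder; names and
# hyperparameters here are hypothetical, not the published 3D-LFM model.
import torch
import torch.nn as nn

class LiftingTransformer(nn.Module):
    """Permutation-equivariant lifting of a 2D point set to 3D."""

    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 4):
        super().__init__()
        # Each landmark is embedded independently from its (x, y) coordinates.
        # No positional encoding is added, so the encoder stays permutation
        # equivariant: permuting input points permutes outputs identically.
        self.embed = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 3)  # per-point 3D coordinates

    def forward(self, pts_2d: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pts_2d:   (B, N, 2) 2D landmarks, padded to a common length N.
        # pad_mask: (B, N) boolean, True where a point is padding or occluded,
        #           so attention simply skips it.
        tokens = self.embed(pts_2d)
        feats = self.encoder(tokens, src_key_padding_mask=pad_mask)
        return self.head(feats)  # (B, N, 3)

# Two instances with different numbers of valid points in one batch.
model = LiftingTransformer()
pts = torch.randn(2, 10, 2)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[1, 7:] = True          # second instance has only 7 valid landmarks
pts_3d = model(pts, mask)   # (2, 10, 3); outputs at masked rows are ignored
```

Because nothing in this stack depends on point ordering or on a fixed point
count, a single set of weights can in principle be trained across datasets
whose keypoints are neither aligned nor equal in number, which is the property
the abstract attributes to 3D-LFM.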