3D-LFM: Lifting Foundation Model
December 19, 2023
Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
cs.AI
Abstract
The lifting of 3D structure and camera from 2D landmarks is a cornerstone of
the entire discipline of computer vision. Traditional methods have been
confined to specific rigid objects, such as those in Perspective-n-Point (PnP)
problems, but deep learning has expanded our capability to reconstruct a wide
range of object classes (e.g. C3DPO and PAUL) with resilience to noise,
occlusions, and perspective distortions. All these techniques, however, have
been limited by the fundamental need to establish correspondences across the
3D training data, significantly limiting their utility to applications where
one has an abundance of "in-correspondence" 3D data. Our approach harnesses
the inherent permutation equivariance of transformers to manage a varying
number of points per 3D data instance, withstands occlusions, and generalizes
to unseen categories. We demonstrate state-of-the-art performance across 2D-3D
lifting task benchmarks. Since our approach can be trained across such a broad
class of structures, we refer to it simply as a 3D Lifting Foundation Model
(3D-LFM), the first of its kind.
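To make the central architectural claim concrete, here is a minimal sketch of
how a permutation-equivariant transformer can lift a variable-size set of 2D
landmarks to 3D. The class name, layer sizes, and masking scheme below are
illustrative assumptions for exposition, not the authors' actual 3D-LFM
architecture.

```python
# A minimal sketch, assuming a plain PyTorch transformer encoder; names and
# hyperparameters here are hypothetical, not the published 3D-LFM model.
import torch
import torch.nn as nn

class LiftingTransformer(nn.Module):
    """Permutation-equivariant lifting of a 2D point set to 3D."""

    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 4):
        super().__init__()
        # Each landmark is embedded independently from its (x, y) coordinates.
        # No positional encoding is added, so the encoder stays permutation
        # equivariant: permuting input points permutes outputs identically.
        self.embed = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 3)  # per-point 3D coordinates

    def forward(self, pts_2d: torch.Tensor, pad_mask: torch.Tensor) -> torch.Tensor:
        # pts_2d:   (B, N, 2) 2D landmarks, padded to a common length N.
        # pad_mask: (B, N) boolean, True where a point is padding or occluded,
        #           so attention simply skips it.
        tokens = self.embed(pts_2d)
        feats = self.encoder(tokens, src_key_padding_mask=pad_mask)
        return self.head(feats)  # (B, N, 3)

# Two instances with different numbers of valid points in one batch.
model = LiftingTransformer()
pts = torch.randn(2, 10, 2)
mask = torch.zeros(2, 10, dtype=torch.bool)
mask[1, 7:] = True          # second instance has only 7 valid landmarks
pts_3d = model(pts, mask)   # (2, 10, 3); outputs at masked rows are ignored
```

Because nothing in this stack depends on point ordering or on a fixed point
count, a single set of weights can in principle be trained across datasets
whose keypoints are neither aligned nor equal in number, which is the property
the abstract attributes to 3D-LFM.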