DINeMo: Learning Neural Mesh Models with no 3D Annotations

March 26, 2025
Authors: Weijie Guo, Guofeng Zhang, Wufei Ma, Alan Yuille
cs.AI

Abstract

Category-level 3D/6D pose estimation is a crucial step towards comprehensive 3D scene understanding, which would enable a broad range of applications in robotics and embodied AI. Recent works explored neural mesh models that approach a range of 2D and 3D tasks from an analysis-by-synthesis perspective. Despite the largely enhanced robustness to partial occlusion and domain shifts, these methods depend heavily on 3D annotations for part-contrastive learning, which confines them to a narrow set of categories and hinders efficient scaling. In this work, we present DINeMo, a novel neural mesh model that is trained with no 3D annotations by leveraging pseudo-correspondence obtained from large visual foundation models. We adopt a bidirectional pseudo-correspondence generation method that produces pseudo-correspondences by utilizing both local appearance features and global context information. Experimental results on car datasets demonstrate that our DINeMo outperforms previous zero- and few-shot 3D pose estimation methods by a wide margin, narrowing the gap with fully-supervised methods by 67.3%. Our DINeMo also scales effectively and efficiently when incorporating more unlabeled images during training, which demonstrates its advantages over supervised learning methods that rely on 3D annotations. Our project page is available at https://analysis-by-synthesis.github.io/DINeMo/.
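The abstract describes bidirectional pseudo-correspondence generation driven by features from a large visual foundation model. The sketch below is a minimal illustration of one plausible form of that idea, under assumptions not taken from the paper: image patch features (e.g., from a frozen backbone such as DINOv2) are matched against per-vertex neural mesh features by cosine similarity, and a match is kept as a pseudo-correspondence only when the nearest neighbour agrees in both directions. The feature dimensions, the mutual-nearest-neighbour criterion, and all names below are illustrative assumptions, not the authors' implementation.

# Minimal sketch of bidirectional pseudo-correspondence generation.
# Assumptions (not from the paper): patch features come from a frozen
# vision foundation model, mesh vertices carry learnable feature vectors,
# and a correspondence is kept only when the nearest-neighbour match
# agrees in both directions (mutual nearest neighbours).
import torch
import torch.nn.functional as F

def bidirectional_pseudo_correspondence(img_feats, vertex_feats):
    """
    img_feats:    (N, D) patch features from a foundation model
    vertex_feats: (M, D) per-vertex neural mesh features
    Returns a list of (patch_idx, vertex_idx) pairs that are mutual
    nearest neighbours in cosine-similarity space.
    """
    img_feats = F.normalize(img_feats, dim=-1)
    vertex_feats = F.normalize(vertex_feats, dim=-1)

    sim = img_feats @ vertex_feats.T        # (N, M) cosine similarities
    best_vertex = sim.argmax(dim=1)         # image -> mesh direction
    best_patch = sim.argmax(dim=0)          # mesh -> image direction

    # Keep only matches that agree in both directions.
    pairs = []
    for patch_idx, v_idx in enumerate(best_vertex):
        if best_patch[v_idx].item() == patch_idx:
            pairs.append((patch_idx, v_idx.item()))
    return pairs

if __name__ == "__main__":
    # Random tensors stand in for real model outputs.
    patches = torch.randn(196, 384)     # e.g., a 14x14 patch grid, 384-dim features
    vertices = torch.randn(1000, 384)   # 1000 mesh vertices
    matches = bidirectional_pseudo_correspondence(patches, vertices)
    print(f"{len(matches)} mutual-NN pseudo-correspondences")

In the actual method, such pseudo-correspondences would stand in for the keypoint labels that 3D annotations provide in part-contrastive training; the snippet only shows the matching step, not the training objective or the use of global context.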
