

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

December 13, 2023
Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
cs.AI

Abstract

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test time to a novel object without fine-tuning, as long as its CAD model is given or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves results comparable to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/
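To make the "unified setup" idea concrete, the following is a minimal, hypothetical sketch of the interface the abstract describes: whether the object comes as a CAD model (model-based) or as a few reference images (model-free, via a learned neural implicit representation for novel view synthesis), the downstream pose estimator consumes the same representation. All names and signatures here are illustrative, not the actual FoundationPose API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ObjectRepresentation:
    """Common object representation fed to the pose estimator.

    Built either from a CAD model (model-based setup) or from a small
    set of reference images (model-free setup), where the paper uses a
    neural implicit representation that can render novel views on demand.
    """
    source: str  # "cad" or "neural_implicit"


def build_representation(cad_model: Optional[str] = None,
                         reference_images: Optional[Sequence] = None
                         ) -> ObjectRepresentation:
    """Bridge the two setups into one representation (illustrative only)."""
    if cad_model is not None:
        return ObjectRepresentation(source="cad")
    if reference_images:
        # In the paper, a neural implicit representation is learned from
        # the references so that novel views can be synthesized, making
        # the object renderable just like a CAD model.
        return ObjectRepresentation(source="neural_implicit")
    raise ValueError("Provide a CAD model or reference images")


def estimate_pose(rep: ObjectRepresentation, rgbd_frame) -> dict:
    """Downstream module: invariant to how `rep` was built.

    A real implementation would render pose hypotheses from `rep`,
    score them against the observed frame, and refine; here we only
    return a placeholder identity pose to show the shared interface.
    """
    return {"rotation": [0.0, 0.0, 0.0],
            "translation": [0.0, 0.0, 0.0],
            "source": rep.source}
```

The point of the sketch is the invariance: `estimate_pose` never branches on the setup, which is how a single trained model can serve both estimation and tracking for novel objects at test time.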