FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
December 13, 2023
Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
cs.AI
Abstract
We present FoundationPose, a unified foundation model for 6D object pose
estimation and tracking, supporting both model-based and model-free setups. Our
approach can be instantly applied at test time to a novel object without
fine-tuning, as long as its CAD model is given, or a small number of reference
images are captured. We bridge the gap between these two setups with a neural
implicit representation that allows for effective novel view synthesis, keeping
the downstream pose estimation modules invariant under the same unified
framework. Strong generalizability is achieved via large-scale synthetic
training, aided by a large language model (LLM), a novel transformer-based
architecture, and a contrastive learning formulation. Extensive evaluation on
multiple public datasets involving challenging scenarios and objects indicates that
our unified approach outperforms existing methods specialized for each task by
a large margin. In addition, it even achieves comparable results to
instance-level methods despite the reduced assumptions. Project page:
https://nvlabs.github.io/FoundationPose/
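
The abstract's central architectural claim is that either input modality (a CAD model in the model-based setup, or a few reference images in the model-free setup) is turned into a representation that supports novel view synthesis, so the downstream pose estimation and tracking modules stay unchanged. The Python sketch below only illustrates that idea and is not the authors' code or API: the names ObjectRepresentation, MeshRepresentation, NeuralImplicitRepresentation, and estimate_pose are hypothetical, and the rendering and field-fitting steps are placeholders.

    from typing import Sequence
    import numpy as np


    class ObjectRepresentation:
        """Common interface the (hypothetical) downstream pose modules consume."""

        def render(self, pose: np.ndarray) -> np.ndarray:
            """Synthesize an RGB view under a 4x4 object-to-camera pose."""
            raise NotImplementedError


    class MeshRepresentation(ObjectRepresentation):
        """Model-based setup: novel views come from the given CAD model."""

        def __init__(self, cad_mesh):
            self.mesh = cad_mesh

        def render(self, pose):
            # Placeholder: a real system would rasterize or ray-trace the mesh here.
            return np.zeros((160, 160, 3), dtype=np.float32)


    class NeuralImplicitRepresentation(ObjectRepresentation):
        """Model-free setup: a neural implicit field fit to a few reference
        images provides novel view synthesis, as described in the abstract."""

        def __init__(self, reference_images: Sequence[np.ndarray]):
            self.field = self._fit_field(reference_images)

        def _fit_field(self, images):
            # Placeholder for fitting a neural implicit field to the references.
            return None

        def render(self, pose):
            # Placeholder: a real system would volume-render the implicit field here.
            return np.zeros((160, 160, 3), dtype=np.float32)


    def estimate_pose(obj: ObjectRepresentation, rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
        """Downstream estimation is agnostic to how `obj` was built: it only
        needs rendered views to compare against the observation."""
        hypotheses = [np.eye(4)]                       # stand-in for sampled pose hypotheses
        renders = [obj.render(h) for h in hypotheses]  # views used for comparison/scoring
        # A refinement and hypothesis-ranking network would score `renders`
        # against (rgb, depth); here we simply return the first hypothesis.
        return hypotheses[0]

In this arrangement, swapping MeshRepresentation for NeuralImplicitRepresentation leaves estimate_pose untouched, which mirrors the abstract's claim that the downstream pose estimation modules remain invariant under the unified framework.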