FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Abstract

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking that supports both model-based and model-free setups. Our approach can be applied instantly at test time to a novel object without fine-tuning, as long as its CAD model is given or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it achieves results comparable to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/
