Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

Abstract

Understanding the 3D semantics of a scene is a fundamental problem for applications such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics achieve only incomplete 3D understanding: their segmentation results are 2D masks, and their supervision is anchored at 2D pixels. This paper revisits the problem to pursue a better 3D understanding of scenes modeled by NeRFs and 3DGS, as follows. 1) We directly supervise 3D points to train the language embedding field, which achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. Project page: https://hyunji12.github.io/Open3DRF
