Towards Learning to Complete Anything in Lidar

Abstract

We propose CAL (Complete Anything in Lidar) for Lidar-based shape completion in the wild. This task is closely related to Lidar-based semantic and panoptic scene completion. However, contemporary methods can only complete and recognize objects from the closed vocabulary labeled in existing Lidar datasets. In contrast, our zero-shot approach leverages the temporal context of multi-modal sensor sequences to mine the shapes and semantic features of observed objects. These are then distilled into a Lidar-only instance-level completion and recognition model. Although we only mine partial shape completions, we find that our distilled model learns to infer full object shapes from multiple such partial observations across the dataset. We show that our model can be prompted on standard benchmarks for semantic and panoptic scene completion, localize objects as (amodal) 3D bounding boxes, and recognize objects beyond fixed class vocabularies. Our project page is https://research.nvidia.com/labs/dvl/projects/complete-anything-lidar
