The Future of Autonomous Driving: The KITScenes Multimodal Dataset

Abstract

Existing autonomous driving datasets have enabled major progress, but they fall short in sensor fidelity, map completeness, or geographic diversity. We present KITScenes Multimodal, a European dataset built around high-fidelity sensors and maps. Our fully synchronized sensor suite combines high-resolution global-shutter cameras, long-range lidar reaching beyond 400 m, 4D imaging radar, and redundant GNSS/INS localization. Our HD maps are, to our knowledge, the most complete of any sensor dataset and have been validated through autonomous driving trials running on open-source software. For the first time in a public dataset, all driving-relevant traffic elements, such as traffic lights, are mapped in 3D to a reprojection-accurate level with full topological connectivity. Recorded in cities with irregular street layouts and mixed traffic modes, our dataset complements existing ones by broadening the available geographic diversity. We also introduce four benchmarks, each advancing spatial learning for embodied AI: online HD map construction, long-range depth estimation, novel view synthesis, and end-to-end driving. Project page: https://kitscenes.com/