DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Abstract

Deep learning-based 3D vision has made significant progress, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision are insufficient: they are limited either to synthetic environments or to a narrow selection of real-world scenes. This insufficiency not only hinders comprehensive benchmarking of existing methods but also limits what can be explored in deep learning-based 3D analysis. To address this gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured at 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conduct a comprehensive benchmark of recent NVS methods on DL3DV-10K, which reveals valuable insights for future NVS research. In addition, a pilot study on learning generalizable NeRF from DL3DV-10K yields encouraging results, demonstrating the need for a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representations. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.
