DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Abstract

Deep learning-based 3D vision has made significant progress, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision are limited to either synthetic environments or a narrow selection of real-world scenes, and are therefore insufficient. This insufficiency not only hinders comprehensive benchmarking of existing methods but also limits what can be explored in deep learning-based 3D analysis. To address this gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured at 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conduct a comprehensive benchmark of recent NVS methods on DL3DV-10K, which reveals valuable insights for future NVS research. In addition, a pilot study on learning generalizable NeRF from DL3DV-10K yields encouraging results, which demonstrates the need for a large-scale scene-level dataset in forging a path toward a foundation model for learning 3D representations. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.
