DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
December 26, 2023
Authors: Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianti Zhang, Bedrich Benes, Aniket Bera
cs.AI
Abstract
We have witnessed significant progress in deep learning-based 3D vision,
ranging from neural radiance field (NeRF) based 3D representation learning to
applications in novel view synthesis (NVS). However, existing scene-level
datasets for deep learning-based 3D vision, limited to either synthetic
environments or a narrow selection of real-world scenes, are quite
insufficient. This insufficiency not only hinders a comprehensive benchmark of
existing methods but also caps what could be explored in deep learning-based 3D
analysis. To address this critical gap, we present DL3DV-10K, a large-scale
scene dataset, featuring 51.2 million frames from 10,510 videos captured from
65 types of point-of-interest (POI) locations, covering both bounded and
unbounded scenes, with different levels of reflection, transparency, and
lighting. We conducted a comprehensive benchmark of recent NVS methods on
DL3DV-10K, which revealed valuable insights for future research in NVS. In
addition, a pilot study on learning generalizable NeRF from DL3DV-10K yielded
encouraging results, which underscores the need for a large-scale scene-level
dataset to forge a path toward a foundation model for learning 3D
representations. Our DL3DV-10K dataset, benchmark results, and
models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.