DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

Abstract


We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision are limited to either synthetic environments or a narrow selection of real-world scenes, and are therefore insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what can be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset featuring 51.2 million frames from 10,510 videos captured at 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes with different levels of reflection, transparency, and lighting. We conduct a comprehensive benchmark of recent NVS methods on DL3DV-10K, which reveals valuable insights for future NVS research. In addition, a pilot study on learning a generalizable NeRF from DL3DV-10K yields encouraging results, demonstrating the need for a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representations. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.
