DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Abstract

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to drive progress beyond scene-specific reconstruction. For distractor-free radiance fields, however, no large-scale dataset yet provides both clean and cluttered images for each scene, which limits progress. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset of 1,048 scenes, each providing a clean image set and a cluttered image set for benchmarking. In total, the dataset contains 89,924 images captured with consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated 41-scene subset, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB in PSNR and 0.057 in LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.