DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Abstract

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to drive progress beyond scene-specific reconstruction. For distractor-free radiance fields, however, no large-scale dataset yet provides both clean and cluttered images for each scene, which limits progress in the area. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing a clean image set and a cluttered image set for benchmarking. In total, the dataset contains 89,924 images captured with consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB in PSNR and 0.057 in LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope that DF3DV-1K will facilitate the development of distractor-free vision and promote progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.