Robust Multi-View 3D Reconstruction via Geometry-Aware Representation Denoising

Abstract

Multi-view 3D reconstruction has made remarkable progress with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations often contain degradations that differ significantly from such settings. Improving the robustness of multi-view 3D reconstruction under degraded conditions therefore remains an important challenge. We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to recover accurate scene geometry. Furthermore, with an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, enabling the simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.