Geometry-Aware Representation Denoising for Robust Multi-View 3D Reconstruction

Abstract

Multi-view 3D reconstruction has advanced rapidly with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations are often degraded in ways that depart significantly from those conditions. Improving the robustness of multi-view 3D reconstruction under degraded conditions therefore remains an important challenge. We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to recover accurate scene geometry. Furthermore, with an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, enabling simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.