Geometry-Aware Representation Denoising for Robust Multi-View 3D Reconstruction

Abstract

Multi-view 3D reconstruction has achieved remarkable progress with the advent of feed-forward 3D reconstruction models. However, these models are typically trained and evaluated under ideal, degradation-free imaging conditions, whereas real-world observations often contain degradations that depart significantly from this setting. Improving the robustness of multi-view 3D reconstruction under degraded conditions therefore remains an important challenge. We present Geometry-Aware Representation Denoising (GARD), a novel framework that performs diffusion-based multi-view restoration directly in the feature space of a feed-forward 3D reconstruction model. This design exploits the geometry-aware feature representations of the 3D reconstructor to recover accurate scene geometry. Furthermore, with an additional RGB image decoder, the refined representations can also be used to restore high-quality RGB images, enabling the simultaneous recovery of 3D scene geometry and high-quality imagery. Comprehensive experiments on the Depth Anything 3 (DA3) benchmark demonstrate the effectiveness of the proposed GARD framework.