
G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration

August 15, 2025
Authors: Ramil Khafizov, Artem Komarichev, Ruslan Rakhimov, Peter Wonka, Evgeny Burnaev
cs.AI

Abstract

We introduce G-CUT3R, a novel feed-forward approach for guided 3D scene reconstruction that enhances the CUT3R model by integrating prior information. Unlike existing feed-forward methods that rely solely on input images, our method leverages auxiliary data, such as depth, camera calibrations, or camera positions, commonly available in real-world scenarios. We propose a lightweight modification to CUT3R, incorporating a dedicated encoder for each modality to extract features, which are fused with RGB image tokens via zero convolution. This flexible design enables seamless integration of any combination of prior information during inference. Evaluated across multiple benchmarks, including 3D reconstruction and other multi-view tasks, our approach demonstrates significant performance improvements, showing its ability to effectively utilize available priors while maintaining compatibility with varying input modalities.
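The abstract describes dedicated encoders per prior modality whose features are fused with RGB image tokens via zero convolution. The sketch below illustrates one way such a fusion could look in PyTorch; it is not the authors' implementation. Class names (`PriorEncoder`, `GuidedTokenFusion`, `zero_conv`), the small convolutional backbones, and all channel counts and resolutions are illustrative assumptions.

```python
# Hypothetical sketch of prior-guided token fusion via zero-initialized convolutions.
# Names, shapes, and channel counts are illustrative, not the paper's actual code.
from typing import Dict

import torch
import torch.nn as nn


def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution with weights and bias initialized to zero.

    At the start of fine-tuning the prior branch contributes nothing, leaving
    the pretrained reconstruction model's behavior unchanged.
    """
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class PriorEncoder(nn.Module):
    """Lightweight encoder for one auxiliary modality (e.g. depth or camera rays)."""

    def __init__(self, in_channels: int, token_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, token_dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(token_dim, token_dim, kernel_size=3, stride=2, padding=1),
        )
        self.fuse = zero_conv(token_dim)  # gated entry point for this modality

    def forward(self, prior: torch.Tensor) -> torch.Tensor:
        return self.fuse(self.backbone(prior))


class GuidedTokenFusion(nn.Module):
    """Adds features from whatever priors are available to the RGB token grid."""

    def __init__(self, token_dim: int, prior_channels: Dict[str, int]):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: PriorEncoder(ch, token_dim) for name, ch in prior_channels.items()}
        )

    def forward(
        self, rgb_tokens: torch.Tensor, priors: Dict[str, torch.Tensor]
    ) -> torch.Tensor:
        # rgb_tokens: (B, C, H, W) grid of image-token features.
        # priors: any subset of the configured modalities; missing ones are skipped.
        fused = rgb_tokens
        for name, tensor in priors.items():
            fused = fused + self.encoders[name](tensor)
        return fused


if __name__ == "__main__":
    fusion = GuidedTokenFusion(token_dim=768, prior_channels={"depth": 1, "rays": 6})
    tokens = torch.randn(2, 768, 14, 14)           # image tokens on a 14x14 grid
    priors = {"depth": torch.randn(2, 1, 56, 56)}  # only depth available at inference
    out = fusion(tokens, priors)                   # zero-init => out == tokens initially
    print(out.shape, torch.allclose(out, tokens))
```

Because the 1x1 convolutions start at zero, each guided branch acts as an identity at the beginning of fine-tuning, and any subset of priors can simply be omitted at inference, which is consistent with the flexible-combination behavior the abstract describes.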