GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Abstract

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially localized, overlapping chunks that together tile the scene, which scales generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models (we use Trellis.2 as an example) and generalize them to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model. This representation is independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view-consistent generated geometry. The mechanism lifts the strong object-level prior of Trellis.2 to multi-view, scene-scale generation and produces faithful, editable PBR mesh reconstructions of indoor environments. As a result, our reconstructions outperform state-of-the-art reconstruction methods by 16%.
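
The two ideas named in the abstract, overlapping chunks and projection-based, order-independent feature lifting, can be illustrated with a small sketch. It is not the authors' implementation. The names `tile_chunks`, `View`, `project`, and `lift_features` are hypothetical, and the pinhole camera model, nearest-neighbour sampling, and mean pooling across views are assumptions chosen only because they make the result independent of view order.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def tile_chunks(lo: float, hi: float, size: float, overlap: float) -> List[Tuple[float, float]]:
    """Cover [lo, hi] along one axis with intervals of length `size` that
    overlap by `overlap`. Applying this per axis gives 3D chunks (assumption)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    stride = size - overlap
    chunks = []
    start = lo
    while True:
        chunks.append((start, start + size))
        if start + size >= hi:  # the last chunk reaches the end of the extent
            break
        start += stride
    return chunks


@dataclass
class View:
    """A posed image with a per-pixel feature map (hypothetical container)."""
    R: Sequence[Sequence[float]]        # world-to-camera rotation, 3x3
    t: Vec3                             # world-to-camera translation
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int
    features: List[List[List[float]]]   # indexed as features[row][col] -> vector


def project(view: View, p: Vec3) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a world point to the nearest pixel (col, row).
    Returns None if the point is behind the camera or outside the image."""
    x, y, z = (sum(view.R[i][j] * p[j] for j in range(3)) + view.t[i] for i in range(3))
    if z <= 1e-6:
        return None
    col = int(round(view.fx * x / z + view.cx))
    row = int(round(view.fy * y / z + view.cy))
    if 0 <= col < view.width and 0 <= row < view.height:
        return col, row
    return None


def lift_features(views: Sequence[View], points: Sequence[Vec3], dim: int) -> List[List[float]]:
    """For each 3D point, average the features of all views that see it.
    Mean pooling is symmetric in its inputs, so view order does not matter."""
    lifted = []
    for p in points:
        acc = [0.0] * dim
        seen = 0
        for view in views:
            pixel = project(view, p)
            if pixel is None:
                continue
            feat = view.features[pixel[1]][pixel[0]]
            for c in range(dim):
                acc[c] += feat[c]
            seen += 1
        # Points not observed by any view get a zero vector (assumption).
        lifted.append([a / seen for a in acc] if seen else [0.0] * dim)
    return lifted
```

In a setup like this, each chunk's voxel centres would be passed as `points`, and the lifted features would condition generation for that chunk. How overlapping chunks are blended is not described in the abstract and is left out here.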