GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Abstract

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially localized, overlapping chunks that together tile the scene, which scales generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models (we use Trellis.2 as an example) and generalize them to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model. This representation is independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view-consistent geometry. The mechanism lifts the strong object-level prior of Trellis.2 to multi-view, scene-scale generation and produces faithful, editable PBR mesh reconstructions of indoor environments. As a result, our method outperforms state-of-the-art reconstruction methods by 16%.
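
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a pinhole camera model, nearest-neighbor feature sampling, and a plain mean over the views in which a point is visible; the abstract does not specify any of these choices. The function names `chunk_origins` and `lift_features` and all of their parameters are hypothetical. The first function tiles a scene bounding box with overlapping cubic chunks. The second projects 3D points into each posed view and averages the sampled features, so the result does not depend on view ordering.

```python
import numpy as np

def chunk_origins(scene_min, scene_max, chunk_size, overlap):
    """Origins of axis-aligned cubic chunks that cover the scene box with overlap.

    Hypothetical helper: the chunk size and overlap are illustrative parameters.
    """
    stride = chunk_size - overlap
    axes = []
    for lo, hi in zip(scene_min, scene_max):
        # Smallest chunk count along this axis whose last chunk reaches `hi`.
        n = max(1, int(np.ceil((hi - lo - overlap) / stride)))
        axes.append(lo + stride * np.arange(n))
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, 3)

def lift_features(points, feats, K, world_to_cam):
    """Lift posed 2D features onto 3D points by projection and averaging.

    points:       (N, 3) world-space positions (e.g. voxel centers of a chunk)
    feats:        (V, H, W, C) per-view feature maps
    K:            (V, 3, 3) pinhole intrinsics (assumed camera model)
    world_to_cam: (V, 4, 4) extrinsics
    Returns (N, C) mean features and (N,) number of views seeing each point.
    """
    V, H, W, C = feats.shape
    n_pts = len(points)
    homog = np.concatenate([points, np.ones((n_pts, 1))], axis=1)  # (N, 4)
    acc = np.zeros((n_pts, C))
    count = np.zeros(n_pts)
    for v in range(V):
        cam = (world_to_cam[v] @ homog.T).T[:, :3]   # points in camera frame
        z = cam[:, 2]
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)          # avoid division by zero
        pix = (K[v] @ cam.T).T                       # (fx*x + cx*z, fy*y + cy*z, z)
        col = np.round(pix[:, 0] / safe_z).astype(int)
        row = np.round(pix[:, 1] / safe_z).astype(int)
        valid = in_front & (col >= 0) & (col < W) & (row >= 0) & (row < H)
        # Nearest-neighbor sampling; a real system would likely interpolate.
        acc[valid] += feats[v, row[valid], col[valid]]
        count[valid] += 1
    # Mean over visible views is symmetric in the views, hence order-independent.
    return acc / np.maximum(count, 1)[:, None], count
```

Because the lifted features are indexed by world-space position rather than by view, the same routine can be applied to each chunk's points in turn. Points seen by no view keep a zero feature and a count of zero, which a downstream model could treat as unobserved.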