GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction

Abstract

We introduce a new approach to high-fidelity 3D scene reconstruction from multi-view RGB images that tightly couples reconstruction with a strong generative 3D prior. We cast scene reconstruction as conditional 3D generation over a set of spatially localized, overlapping chunks that together tile the scene, which scales generation to large scene extents. Crucially, we inherit the fidelity and completeness of state-of-the-art generative shape models (we use Trellis.2 as an example) and generalize them to the scene level. To this end, we propose a projection-based conditioning mechanism that lifts posed multi-view image features into a coherent 3D representation aligned with the generative model. This representation is independent of view ordering and spatially anchored to the scene, yielding high-fidelity, multi-view-consistent generated geometry. The mechanism lifts the strong object-level prior of Trellis.2 to multi-view, scene-scale generation, producing faithful, editable PBR mesh reconstructions of indoor environments. As a result, we obtain high-fidelity results that outperform state-of-the-art reconstruction methods by 16%.
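
The two ideas named in the abstract, overlapping chunks that tile the scene and projection-based lifting of posed image features, can be illustrated with a minimal NumPy sketch. This is not the GenRecon or Trellis.2 implementation. The function names `chunk_origins` and `lift_features`, the pinhole camera model, nearest-pixel sampling, the absence of occlusion handling, and mean pooling across views are all assumptions made for illustration. Mean pooling is chosen here only because it makes the lifted feature independent of view ordering.

```python
import numpy as np


def chunk_origins(scene_min, scene_max, chunk_size, overlap):
    """Origins of overlapping cubic chunks that cover an axis-aligned scene box.

    Hypothetical helper: adjacent chunks share `overlap` units along each axis.
    """
    stride = chunk_size - overlap
    axes = []
    for lo, hi in zip(scene_min, scene_max):
        # Smallest chunk count n with (n - 1) * stride + chunk_size >= hi - lo.
        n = max(1, int(np.ceil((hi - lo - overlap) / stride)))
        axes.append(lo + stride * np.arange(n))
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack(grid, axis=-1).reshape(-1, 3)


def lift_features(points, feats, K, world_to_cam):
    """Project 3D points into every posed view and average the sampled features.

    points:       (P, 3) world-space points (e.g. voxel centers of one chunk)
    feats:        (V, H, W, C) per-view image feature maps
    K:            (V, 3, 3) pinhole intrinsics (assumed camera model)
    world_to_cam: (V, 4, 4) extrinsics
    Returns (P, C); points seen by no view get a zero feature.
    """
    V, H, W, C = feats.shape
    acc = np.zeros((len(points), C))
    cnt = np.zeros(len(points))
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    for v in range(V):
        cam = (world_to_cam[v] @ homo.T).T[:, :3]   # points in camera frame
        z = cam[:, 2]
        in_front = z > 1e-6
        safe_z = np.where(in_front, z, 1.0)          # avoid division by zero
        pix = (K[v] @ cam.T).T
        col = np.round(pix[:, 0] / safe_z).astype(int)
        row = np.round(pix[:, 1] / safe_z).astype(int)
        ok = in_front & (col >= 0) & (col < W) & (row >= 0) & (row < H)
        acc[ok] += feats[v, row[ok], col[ok]]        # nearest-pixel sampling
        cnt[ok] += 1
    # Mean over the views that see each point: a symmetric, order-free pooling.
    return acc / np.maximum(cnt, 1)[:, None]
```

Because the pooling is a mean over views, permuting the input views changes the result only by floating-point rounding. Anchoring the sampled points in world coordinates is what ties each chunk's conditioning to a fixed location in the scene. A real system would likely replace nearest-pixel sampling with bilinear interpolation and add visibility reasoning.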