
Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction

January 7, 2026
Authors: Jiaxin Huang, Yuanbo Yang, Bangbang Yang, Lin Ma, Yuewen Ma, Yiyi Liao
cs.AI

Abstract

We present Gen3R, a method that bridges the strong priors of foundational reconstruction models and video diffusion models for scene-level 3D generation. We repurpose the VGGT reconstruction model to produce geometric latents by training an adapter on its tokens, which are regularized to align with the appearance latents of pre-trained video diffusion models. By jointly generating these disentangled yet aligned latents, Gen3R produces both RGB videos and corresponding 3D geometry, including camera poses, depth maps, and global point clouds. Experiments demonstrate that our approach achieves state-of-the-art results in single- and multi-image conditioned 3D scene generation. Additionally, our method can enhance the robustness of reconstruction by leveraging generative priors, demonstrating the mutual benefit of tightly coupling reconstruction and generative models.
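To make the mechanism described above concrete, here is a minimal PyTorch sketch of the adapter-plus-alignment idea: a small network maps frozen VGGT tokens to geometric latents shaped like a video diffusion model's appearance latents, and a regularizer pulls the two into alignment. All module names, dimensions, and the MSE regularizer are illustrative assumptions, not the paper's actual architecture or losses.

```python
import torch
import torch.nn as nn

class GeometryAdapter(nn.Module):
    """Hypothetical adapter: projects frozen VGGT tokens to geometric
    latents whose shape matches the diffusion model's appearance latents."""

    def __init__(self, vggt_dim: int = 1024, latent_dim: int = 16):
        super().__init__()
        self.proj = nn.Sequential(
            nn.LayerNorm(vggt_dim),
            nn.Linear(vggt_dim, vggt_dim),
            nn.GELU(),
            nn.Linear(vggt_dim, latent_dim),
        )

    def forward(self, vggt_tokens: torch.Tensor) -> torch.Tensor:
        # vggt_tokens: (B, N, vggt_dim) -> geometric latents: (B, N, latent_dim)
        return self.proj(vggt_tokens)


def alignment_loss(geo_latents: torch.Tensor,
                   app_latents: torch.Tensor) -> torch.Tensor:
    """Regularize geometric latents toward the (frozen) appearance latents.
    A plain MSE stands in for whatever regularizer the paper uses."""
    return nn.functional.mse_loss(geo_latents, app_latents)


# Toy usage: both tensors are random stand-ins for frozen VGGT tokens
# and appearance latents from a pre-trained video diffusion encoder.
adapter = GeometryAdapter()
vggt_tokens = torch.randn(2, 256, 1024)
app_latents = torch.randn(2, 256, 16)
loss = alignment_loss(adapter(vggt_tokens), app_latents)
loss.backward()  # only the adapter receives gradients here
```

Under this framing, the reconstruction backbone and the diffusion model both stay frozen; only the lightweight adapter is trained, which is what lets the two pre-trained priors be coupled cheaply.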