

OneWorld: Taming Scene Generation with 3D Unified Representation Autoencoder

March 17, 2026
Authors: Sensen Gao, Zhaoqing Wang, Qihang Cao, Dongdong Yu, Changhu Wang, Tongliang Liu, Mingming Gong, Jiawang Bian
cs.AI

Abstract

Existing diffusion-based 3D scene generation methods primarily operate in 2D image/video latent spaces, which makes maintaining cross-view appearance and geometric consistency inherently challenging. To bridge this gap, we present OneWorld, a framework that performs diffusion directly within a coherent 3D representation space. Central to our approach is the 3D Unified Representation Autoencoder (3D-URAE): it leverages pretrained 3D foundation models and augments their geometry-centric nature by injecting appearance information and distilling semantic features into a unified 3D latent space. Furthermore, we introduce a token-level Cross-View Correspondence (CVC) consistency loss to explicitly enforce structural alignment across views, and propose Manifold-Drift Forcing (MDF), which mitigates train-inference exposure bias and shapes a robust 3D manifold by mixing drifted and original representations. Comprehensive experiments demonstrate that OneWorld generates high-quality 3D scenes with superior cross-view consistency compared to state-of-the-art 2D-based methods. Our code will be available at https://github.com/SensenGao/OneWorld.
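The abstract gives no equations for Manifold-Drift Forcing, but the core idea — blending the model's own drifted representations back into the clean training-time ones so that training conditions resemble inference — can be sketched roughly. The following is a minimal illustration, not the paper's implementation; the per-token Bernoulli gating and the blend probability `alpha` are assumptions for the sketch.

```python
import numpy as np

def mdf_mix(original, drifted, alpha=0.7, rng=None):
    """Hypothetical sketch of MDF-style representation mixing.

    With probability (1 - alpha), each token's clean latent is replaced
    by its drifted counterpart, exposing the model during training to
    the kind of off-manifold inputs it will produce at inference time.
    `alpha` and the token-level gating are illustrative assumptions.
    """
    rng = rng or np.random.default_rng(0)
    # One Bernoulli draw per token (first axis); broadcast over feature dim.
    mask = rng.random(original.shape[0]) < (1.0 - alpha)
    return np.where(mask[:, None], drifted, original)

# Toy latents: 8 tokens of dimension 4.
orig = np.zeros((8, 4))    # stand-in for clean training representations
drift = np.ones((8, 4))    # stand-in for drifted (inference-time) ones
mixed = mdf_mix(orig, drift, alpha=0.7)
```

Each row of `mixed` comes wholly from one source or the other, so the diffusion model sees a mixture of clean and drifted tokens within a single training example.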