I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners

Abstract

Generalization remains the central challenge for interactive 3D scene generation. Existing learning-based approaches ground spatial understanding in limited scene datasets, which restricts generalization to new layouts. We instead reprogram a pre-trained 3D instance generator to act as a scene-level learner, replacing dataset-bounded supervision with model-centric spatial supervision. This reprogramming unlocks the generator's transferable spatial knowledge, enabling generalization to unseen layouts and novel object compositions. Remarkably, spatial reasoning still emerges even when the training scenes are composed of randomly combined objects. This demonstrates that the generator's transferable scene prior provides a rich learning signal for inferring proximity, support, and symmetry from purely geometric cues. Replacing the widely used canonical space, we instantiate this insight with a view-centric formulation of the scene space, yielding a fully feed-forward, generalizable scene generator that learns spatial relations directly from the instance model. Quantitative and qualitative results show that a 3D instance generator is an implicit spatial learner and reasoner, pointing toward foundation models for interactive 3D scene understanding and generation. Project page: https://luling06.github.io/I-Scene-project/