SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
September 24, 2025
Authors: Yandan Yang, Baoxiong Jia, Shujie Zhang, Siyuan Huang
cs.AI
Abstract
Indoor scene synthesis has become increasingly important with the rise of
Embodied AI, which requires 3D environments that are not only visually
realistic but also physically plausible and functionally diverse. While recent
approaches have advanced visual fidelity, they often remain constrained to
fixed scene categories, lack sufficient object-level detail and physical
consistency, and struggle to align with complex user instructions. In this
work, we present SceneWeaver, a reflective agentic framework that unifies
diverse scene synthesis paradigms through tool-based iterative refinement. At
its core, SceneWeaver employs a language model-based planner to select from a
suite of extensible scene generation tools, ranging from data-driven generative
models to visual- and LLM-based methods, guided by self-evaluation of physical
plausibility, visual realism, and semantic alignment with user input. This
closed-loop reason-act-reflect design enables the agent to identify semantic
inconsistencies, invoke targeted tools, and update the environment over
successive iterations. Extensive experiments on both common and open-vocabulary
room types demonstrate that SceneWeaver not only outperforms prior methods on
physical, visual, and semantic metrics, but also generalizes effectively to
complex scenes with diverse instructions, marking a step toward general-purpose
3D environment generation. Project website: https://scene-weaver.github.io/.
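
The closed-loop reason-act-reflect design described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Scene` class, the two toy tools, the rule-based `plan` function (standing in for the language model-based planner), and the scoring in `reflect` are all hypothetical names and simplifications introduced here for illustration. Visual realism is omitted from the toy evaluation for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Scene:
    """Toy scene state (hypothetical, not the paper's representation)."""
    objects: List[str] = field(default_factory=list)
    collisions: int = 0  # number of physically implausible overlaps


# Hypothetical tools. A real system would wrap generative models,
# vision-based methods, or LLM-based methods behind the same interface.
def add_furniture(scene: Scene) -> None:
    scene.objects += ["bed", "nightstand"]
    scene.collisions += 1  # pretend the new objects overlap


def resolve_collisions(scene: Scene) -> None:
    scene.collisions = 0


# The tool suite is a plain registry, so new tools can be added freely.
TOOLS: Dict[str, Callable[[Scene], None]] = {
    "add_furniture": add_furniture,
    "resolve_collisions": resolve_collisions,
}


def reflect(scene: Scene, required: List[str]) -> Dict[str, float]:
    """Self-evaluation: score physical plausibility and semantic alignment."""
    if required:
        semantic = sum(o in scene.objects for o in required) / len(required)
    else:
        semantic = 1.0
    physical = 1.0 if scene.collisions == 0 else 0.0
    return {"physical": physical, "semantic": semantic}


def plan(scores: Dict[str, float]) -> Optional[str]:
    """Rule-based stand-in for the planner: pick a tool for the weakest aspect."""
    if scores["semantic"] < 1.0:
        return "add_furniture"
    if scores["physical"] < 1.0:
        return "resolve_collisions"
    return None  # every score is satisfied, so stop


def synthesize(required: List[str], max_iters: int = 5) -> Tuple[Scene, List[str]]:
    """Run the reflect -> reason -> act loop until satisfied or out of budget."""
    scene = Scene()
    history: List[str] = []
    for _ in range(max_iters):
        scores = reflect(scene, required)  # reflect
        tool = plan(scores)                # reason
        if tool is None:
            break
        TOOLS[tool](scene)                 # act
        history.append(tool)
    return scene, history


if __name__ == "__main__":
    final_scene, used_tools = synthesize(["bed", "nightstand"])
    print(used_tools, final_scene)
```

In this sketch the tool registry is the extension point: registering a new entry in `TOOLS` and teaching `plan` when to choose it is enough to add a capability, which mirrors, in a very reduced form, the extensibility the abstract attributes to the tool suite.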