Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
May 18, 2026
Authors: Yixuan Yang, Zhen Luo, Wanshui Gan, Jinkun Hao, Junru Lu, Jinghao Yan, Zhaoyang Lyu, Xudong Xu
cs.AI
Abstract
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. Recent approaches based on multimodal large language models (MLLMs) have shown great potential for synthesizing 3D rooms from textual descriptions or reference images. However, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, then synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis, encompassing various evaluation protocols. Using this benchmark, we conduct comprehensive comparisons against existing agent-based methods to validate the effectiveness of the proposed execution harness.
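The abstract does not give implementation details, so the following is only a minimal sketch of what a staged harness with cross-stage memory and bounded retries might look like. It is not the authors' code. Every name here (Memory, parse_stage, geometry_stage, materials_stage, lighting_stage, run_pipeline, syntax_ok) is hypothetical, the scene contents are hard-coded stand-ins for what an MLLM would extract from the image, and the retry cap is an assumed way to avoid unbounded looping. The emitted strings use standard Blender Python operators, but they are only syntax-checked here, not run inside Blender.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Hypothetical cross-stage memory: facts and code shared by all stages."""
    facts: Dict[str, object] = field(default_factory=dict)
    code: List[str] = field(default_factory=list)


Stage = Callable[[Memory], str]


def parse_stage(mem: Memory) -> str:
    # Stand-in for MLLM parsing of the top-down image (hard-coded assumption).
    mem.facts["objects"] = [{"name": "bed", "location": (1.0, 2.0, 0.25)}]
    return "import bpy"


def geometry_stage(mem: Memory) -> str:
    # Reads the parsed objects from memory instead of re-deriving them.
    lines = []
    for obj in mem.facts["objects"]:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={obj['location']})"
        )
        lines.append(f"bpy.context.object.name = {obj['name']!r}")
    return "\n".join(lines)


def materials_stage(mem: Memory) -> str:
    # Creates one placeholder material per object recorded in memory.
    lines = []
    for obj in mem.facts["objects"]:
        name = obj["name"]
        lines.append(f"mat = bpy.data.materials.new(name={name + '_mat'!r})")
        lines.append(f"bpy.data.objects[{name!r}].data.materials.append(mat)")
    return "\n".join(lines)


def lighting_stage(mem: Memory) -> str:
    return "bpy.ops.object.light_add(type='POINT', location=(0.0, 0.0, 2.5))"


def syntax_ok(snippet: str) -> bool:
    # Cheap validity check; a real harness would execute the code in Blender.
    try:
        compile(snippet, "<stage>", "exec")
        return True
    except SyntaxError:
        return False


def run_pipeline(stages: List[Stage], validate: Callable[[str], bool],
                 max_retries: int = 3) -> str:
    """Run stages in order; each stage gets at most max_retries attempts."""
    mem = Memory()
    for stage in stages:
        for _ in range(max_retries):
            snippet = stage(mem)
            if validate(snippet):
                mem.code.append(snippet)
                break
        else:
            # Bounded retries: fail loudly rather than loop forever.
            raise RuntimeError(f"{stage.__name__} failed after {max_retries} tries")
    return "\n".join(mem.code)


if __name__ == "__main__":
    stages = [parse_stage, geometry_stage, materials_stage, lighting_stage]
    print(run_pipeline(stages, syntax_ok))
```

In this sketch, later stages read what earlier stages stored in the shared Memory object, which is one simple way to keep scene facts from being lost between steps. The resulting script could then be pasted into Blender's scripting editor, under the assumption that the hard-coded object names match the scene.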