Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis
May 18, 2026
Authors: Yixuan Yang, Zhen Luo, Wanshui Gan, Jinkun Hao, Junru Lu, Jinghao Yan, Zhaoyang Lyu, Xudong Xu
cs.AI
Abstract
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent approaches based on multimodal large language models (MLLMs) have shown great potential for synthesizing 3D rooms from textual descriptions or reference images, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework that represents 3D rooms as Blender code and is equipped with a structured execution harness. Given a top-down room image, the framework parses it to extract scene elements and their spatial relationships, then synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent in existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis with multiple evaluation protocols. On this benchmark, we conduct comprehensive comparisons against existing agent-based methods to validate the effectiveness of the proposed execution harness.
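The abstract describes the harness only at a high level: geometry, materials, and lighting are synthesized as separate stages, and a cross-stage memory carries earlier results forward. The following is a minimal, hypothetical Python sketch of that control flow under stated assumptions, not the authors' implementation. The `SceneElement` schema, the `CrossStageMemory` class, the per-stage retry budget `max_attempts` (one assumed way to rule out the infinite looping the abstract attributes to prior agents), and `toy_synthesize` (a stand-in for the MLLM) are all illustrative inventions. `check_script` only parses each script with Python's built-in `compile()` instead of running it inside Blender.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Stage order from the abstract: geometry, then materials, then lighting.
STAGES = ("geometry", "materials", "lighting")


@dataclass
class SceneElement:
    """One element parsed from the top-down image (hypothetical schema)."""
    name: str
    x: float  # floor-plan position; units are an assumption
    y: float
    relation: str = ""  # free-text spatial relation, e.g. "against north wall"


@dataclass
class CrossStageMemory:
    """Hypothetical memory: stores each accepted stage script so that later
    stages see all earlier results, not only the most recent exchange."""
    records: dict = field(default_factory=dict)

    def remember(self, stage: str, code: str) -> None:
        self.records[stage] = code

    def context(self) -> str:
        return "\n".join(f"# [{s}]\n{c}" for s, c in self.records.items())


# synthesize(stage, elements, memory_context, feedback) -> script text.
# In the paper this role is played by an MLLM; here it is any callable.
Synthesizer = Callable[[str, list, str, Optional[str]], str]


def check_script(code: str) -> Optional[str]:
    """Stand-in executor: only checks that the script parses as Python.
    A real harness would run it in Blender. Returns an error string or None."""
    try:
        compile(code, "<stage>", "exec")
        return None
    except SyntaxError as err:
        return f"SyntaxError: {err.msg} (line {err.lineno})"


def run_harness(elements: list, synthesize: Synthesizer,
                max_attempts: int = 3) -> CrossStageMemory:
    """Run the stages in a fixed order with a bounded retry budget per stage
    (an assumed policy), so one failing stage cannot loop indefinitely."""
    memory = CrossStageMemory()
    for stage in STAGES:
        feedback = None
        for _ in range(max_attempts):
            code = synthesize(stage, elements, memory.context(), feedback)
            feedback = check_script(code)
            if feedback is None:  # accepted: commit to memory, go to next stage
                memory.remember(stage, code)
                break
        else:  # retry budget exhausted without an accepted script
            raise RuntimeError(
                f"stage {stage!r} failed after {max_attempts} attempts: {feedback}")
    return memory


def toy_synthesize(stage, elements, context, feedback):
    """Toy stand-in for the MLLM; add_box/set_material/add_light are
    hypothetical helper names that are only parsed, never executed."""
    if stage == "geometry":
        return "\n".join(f"add_box({e.name!r}, {e.x}, {e.y})" for e in elements)
    if stage == "materials" and feedback is None:
        return "set_material('bed', 'oak'"  # deliberately unbalanced paren
    if stage == "materials":
        return "set_material('bed', 'oak')"  # repaired after feedback
    return "add_light('ceiling', 2.6)"


room = run_harness([SceneElement("bed", 1.0, 2.0, "against north wall")],
                   toy_synthesize)
print(list(room.records))  # ['geometry', 'materials', 'lighting']
```

Bounding retries per stage, rather than running one open-ended loop, localizes a failure to a named stage and guarantees termination. The abstract does not state whether Code-as-Room retries at all, so this policy is only one plausible reading of "structured execution harness."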