Function2Scene: 3D Indoor Scene Layout from Functional Specifications

May 29, 2026
Authors: Ruiqi Wang, Qimin Chen, Daniel Ritchie, Angel X. Chang, Manolis Savva, Kai Wang, Hao Zhang
cs.AI

Abstract

Most text-driven 3D indoor scene synthesis methods generate rooms from object-centric prompts, asking what furniture should be placed rather than how the space is used. Yet in real interior design, a layout is judged by how well it supports its occupants, e.g., their activities and physical needs. We introduce Function2Scene, a framework for generating 3D indoor layouts from functional specifications, i.e., natural-language design briefs describing who will use a room and what they need to do there. Given such a specification, our system parses occupant personas and activities, derives a customized set of functional design constraints from a taxonomy of 17 criteria spanning spatial, ergonomic, activity, and environmental considerations, and uses these constraints to guide layout generation. Rather than relying on an LLM to directly produce a final scene, Function2Scene performs iterative evaluation and refinement through a tool-augmented check-and-repair loop, combining geometric measurements, LLM-based contextual reasoning, and VLM-based visual assessment. Experiments on 30 professionally written interior-design cases show that Function2Scene produces layouts that better satisfy functional requirements than recent LLM-based scene synthesis baselines, with our results preferred in 94.3% of pairwise comparisons. Our work reframes text-driven indoor scene synthesis from placing plausible objects to designing spaces that support human use.
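To make the check-and-repair idea concrete, here is a minimal sketch of such a loop for a single geometric constraint. It is not the authors' implementation: the abstract describes a loop that also uses LLM-based reasoning and VLM-based visual assessment, none of which is modelled here. The `Item` class, the `check`, `repair`, and `check_and_repair` functions, the one-dimensional layout along a wall, and the 0.9 m clearance value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Furniture footprint along one wall (hypothetical 1D representation)."""
    name: str
    start: float  # position of the left edge along the wall, in metres
    width: float  # extent along the wall, in metres


def check(items, min_gap):
    """Return (left, right, shortfall) for each adjacent pair closer than min_gap."""
    ordered = sorted(items, key=lambda it: it.start)
    violations = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - (left.start + left.width)
        if gap < min_gap - 1e-9:  # small tolerance for floating-point error
            violations.append((left, right, min_gap - gap))
    return violations


def repair(violation):
    """Fix one violation by sliding the right-hand item away from its neighbour."""
    _, right, shortfall = violation
    right.start += shortfall


def check_and_repair(items, min_gap, max_rounds=10):
    """Alternate checking and repairing; return True if the constraint ends up satisfied."""
    for _ in range(max_rounds):
        violations = check(items, min_gap)
        if not violations:
            return True
        repair(violations[0])  # repair one violation, then re-check everything
    return not check(items, min_gap)


# Illustrative layout and clearance value (assumed, not taken from the paper).
items = [Item("bed", 0.0, 2.0), Item("desk", 2.3, 1.2), Item("shelf", 3.6, 0.8)]
ok = check_and_repair(items, min_gap=0.9)
```

Re-running the full check after every repair matters because a fix can create a new violation: here, sliding the desk away from the bed pushes it into the shelf, which the next round then moves. This sketch ignores room boundaries, so a fuller version would also need to reject repairs that push an item past the end of the wall.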