Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation
October 17, 2025
Authors: Xiaoming Zhu, Xu Huang, Qinghongbing Xie, Zhi Deng, Junsheng Yu, Yirui Guan, Zhongyuan Liu, Lin Zhu, Qijun Zhao, Ligang Liu, Long Zeng
cs.AI
Abstract
Generating artistic and coherent 3D scene layouts is crucial in digital
content creation. Traditional optimization-based methods are often constrained
by cumbersome manual rules, while deep generative models face challenges in
producing content with richness and diversity. Furthermore, approaches that
utilize large language models frequently lack robustness and fail to accurately
capture complex spatial relationships. To address these challenges, this paper
presents a novel vision-guided 3D layout generation system. We first construct
a high-quality asset library containing 2,037 scene assets and 147 3D scene
layouts. Subsequently, we employ an image generation model to expand prompt
representations into images, fine-tuning it to align with our asset library. We
then develop a robust image parsing module to recover the 3D layout of scenes
based on visual semantics and geometric information. Finally, we optimize the
scene layout using scene graphs and overall visual semantics to ensure logical
coherence and alignment with the images. Extensive user testing demonstrates
that our algorithm significantly outperforms existing methods in terms of
layout richness and quality. The code and dataset will be available at
https://github.com/HiHiAllen/Imaginarium.
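
The abstract does not say how the scene-graph optimization is carried out. As a rough illustration only, the sketch below shows one simple way a "supported-by" relation in a scene graph could be used to make a layout physically coherent: each child object is rested on its parent's top surface and kept within the parent's footprint. The names `Box` and `enforce_support`, the axis-aligned box representation, and the z-up convention are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Box:
    """Hypothetical axis-aligned object: center [x, y, z], full size (w, d, h), z up."""
    name: str
    center: List[float]
    size: Tuple[float, float, float]

    def top(self) -> float:
        # Height of the object's upper face.
        return self.center[2] + self.size[2] / 2


def enforce_support(objects: Dict[str, Box],
                    edges: List[Tuple[str, str]]) -> Dict[str, Box]:
    """Apply (child, parent) 'supported-by' edges, listed parent-first.

    Illustrative assumption, not the paper's optimizer: each child is
    placed so its bottom face touches the parent's top face, and its
    x/y center is clamped so its footprint stays on the parent.
    """
    for child_name, parent_name in edges:
        child, parent = objects[child_name], objects[parent_name]
        # Rest the child's bottom face on the parent's top face.
        child.center[2] = parent.top() + child.size[2] / 2
        # Clamp the child's footprint inside the parent's footprint.
        for axis in (0, 1):
            free = max(0.0, (parent.size[axis] - child.size[axis]) / 2)
            lo = parent.center[axis] - free
            hi = parent.center[axis] + free
            child.center[axis] = min(max(child.center[axis], lo), hi)
    return objects
```

For instance, a lamp that an image parser placed floating beside a table would be pulled onto the tabletop by an edge `("lamp", "table")`. A real system would presumably also handle rotations, collisions between siblings, and relations other than support.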