Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation

October 17, 2025
Authors: Xiaoming Zhu, Xu Huang, Qinghongbing Xie, Zhi Deng, Junsheng Yu, Yirui Guan, Zhongyuan Liu, Lin Zhu, Qijun Zhao, Ligang Liu, Long Zeng
cs.AI

Abstract

Generating artistic and coherent 3D scene layouts is crucial in digital content creation. Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models struggle to produce rich and diverse content. Furthermore, approaches that rely on large language models often lack robustness and fail to capture complex spatial relationships accurately. To address these challenges, this paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. We then employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. Next, we develop a robust image parsing module that recovers the 3D layout of a scene from visual semantics and geometric information. Finally, we optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images. Extensive user testing demonstrates that our algorithm significantly outperforms existing methods in layout richness and quality. The code and dataset will be available at https://github.com/HiHiAllen/Imaginarium.
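The abstract does not specify how the image parsing and layout optimization stages are formulated, so the following is only a toy sketch of the general idea under stated assumptions, not the paper's method. It assumes (1) a pinhole camera with known intrinsics and a per-object depth estimate, so an object's image center can be lifted to 3D, and (2) a quadratic energy over floor-plane positions `(x, z)` that keeps each object near its image-derived position while a hypothetical scene-graph "supported-by" relation pulls a child object (e.g. a lamp) over its supporter (e.g. a table). The names `Intrinsics`, `back_project`, and `refine_layout`, and the energy itself, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u, v, depth, K):
    """Lift pixel (u, v) with metric depth into camera coordinates (x, y, z)."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


def refine_layout(observed, supports, weight=1.0, lr=0.05, steps=500):
    """Gradient descent on a toy energy over floor-plane positions (x, z):

        E = sum_i ||p_i - o_i||^2 + weight * sum_(child, parent) ||p_child - p_parent||^2

    The first term keeps each object near its image-derived position o_i;
    the second (a hypothetical "supported-by" relation) pulls a child over its parent.
    """
    pos = {name: list(p) for name, p in observed.items()}
    for _ in range(steps):
        # Gradient of the image-fidelity term for every object.
        grad = {name: [2.0 * (pos[name][k] - observed[name][k]) for k in range(2)]
                for name in pos}
        # Gradient of each pairwise relation term, applied to both endpoints.
        for child, parent in supports:
            for k in range(2):
                d = pos[child][k] - pos[parent][k]
                grad[child][k] += 2.0 * weight * d
                grad[parent][k] -= 2.0 * weight * d
        for name in pos:
            for k in range(2):
                pos[name][k] -= lr * grad[name][k]
    return {name: tuple(p) for name, p in pos.items()}


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(back_project(570.0, 240.0, 2.0, K))  # (1.0, 0.0, 2.0)
    layout = refine_layout({"table": (1.0, 2.0), "lamp": (1.2, 2.1)},
                           [("lamp", "table")])
    print(layout)
```

With `weight=1.0`, the lamp-table offset in each coordinate shrinks to one third of its observed value while their midpoint stays fixed, illustrating the trade-off between fidelity to the image and scene-graph consistency. A real system would add collision, wall-contact, and orientation terms; the fixed step size `lr` here is chosen small enough to converge for small graphs only.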