Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation

May 5, 2025
Authors: Lu Ling, Chen-Hsuan Lin, Tsung-Yi Lin, Yifan Ding, Yu Zeng, Yichen Sheng, Yunhao Ge, Ming-Yu Liu, Aniket Bera, Zhaoshuo Li
cs.AI

Abstract

Synthesizing interactive 3D scenes from text is essential for gaming, virtual reality, and embodied AI. However, existing methods face several challenges. Learning-based approaches depend on small-scale indoor datasets, which limits scene diversity and layout complexity. While large language models (LLMs) can leverage diverse text-domain knowledge, they struggle with spatial realism, often producing unnatural object placements that fail to respect common sense. Our key insight is that visual perception can bridge this gap by providing realistic spatial guidance that LLMs lack. To this end, we introduce Scenethesis, a training-free agentic framework that integrates LLM-based scene planning with vision-guided layout refinement. Given a text prompt, Scenethesis first employs an LLM to draft a coarse layout. A vision module then refines it by generating image guidance and extracting scene structure to capture inter-object relations. Next, an optimization module iteratively enforces accurate pose alignment and physical plausibility, preventing artifacts such as object penetration and instability. Finally, a judge module verifies spatial coherence. Comprehensive experiments show that Scenethesis generates diverse, realistic, and physically plausible 3D interactive scenes, making it valuable for virtual content creation, simulation environments, and embodied AI research.
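
To make the notion of "object penetration" concrete, here is a minimal sketch of one simple way such a check could be written. It is not the paper's optimization module: the abstract does not say how penetration is detected, so the axis-aligned bounding box representation, the `Box` class, and the function names below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical object proxy: an axis-aligned box given by its center and full size."""
    name: str
    center: Tuple[float, float, float]
    size: Tuple[float, float, float]


def penetration_depth(a: Box, b: Box) -> Tuple[float, float, float]:
    """Per-axis overlap of two boxes; a value <= 0 means the boxes are separated on that axis."""
    return tuple(
        (a.size[i] + b.size[i]) / 2 - abs(a.center[i] - b.center[i])
        for i in range(3)
    )


def penetrates(a: Box, b: Box, eps: float = 1e-9) -> bool:
    """Two boxes interpenetrate only if they overlap on all three axes.

    eps keeps boxes that merely touch (e.g. a lamp resting on a table)
    from being reported as penetrating.
    """
    return all(d > eps for d in penetration_depth(a, b))


def find_penetrations(boxes: List[Box]) -> List[Tuple[str, str]]:
    """Return the names of every pair of boxes that interpenetrate."""
    return [
        (a.name, b.name)
        for i, a in enumerate(boxes)
        for b in boxes[i + 1:]
        if penetrates(a, b)
    ]


# Assumed toy layout (y is up): the lamp's base sits 0.1 below the table top.
table = Box("table", (0.0, 0.4, 0.0), (1.0, 0.8, 1.0))
lamp = Box("lamp", (0.0, 0.9, 0.0), (0.2, 0.4, 0.2))
chair = Box("chair", (2.0, 0.45, 0.0), (0.5, 0.9, 0.5))

bad_pairs = find_penetrations([table, lamp, chair])

# Raising the lamp so its base rests exactly on the table top removes the violation.
lamp_fixed = Box("lamp", (0.0, 1.0, 0.0), (0.2, 0.4, 0.2))
fixed_pairs = find_penetrations([table, lamp_fixed, chair])
```

In this toy layout `bad_pairs` contains only the table and lamp pair, and `fixed_pairs` is empty. A real system would likely need finer geometry than bounding boxes, plus a separate stability test, neither of which is attempted here.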
