LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
September 5, 2025
Authors: Yinglin Duan, Zhengxia Zou, Tongwei Gu, Wei Jia, Zhan Zhao, Luyi Xu, Xinzhu Liu, Hao Jiang, Kang Chen, Shuang Qiu
cs.AI
Abstract
Recent research has increasingly focused on developing 3D world models
that simulate complex real-world scenarios. World models have found broad
applications across various domains, including embodied AI, autonomous driving,
and entertainment. A more realistic simulation with accurate physics will
effectively narrow the sim-to-real gap and allow us to gather rich information
about the real world conveniently. While traditional manual modeling has
enabled the creation of virtual 3D scenes, modern approaches have leveraged
advanced machine learning algorithms for 3D world generation, with most recent
advances focusing on generative methods that can create virtual worlds based on
user instructions. This work explores such a research direction by proposing
LatticeWorld, a simple yet effective 3D world generation framework that
streamlines the industrial production pipeline of 3D environments. LatticeWorld
leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering
engine (e.g., Unreal Engine 5) to generate a dynamic environment. Our proposed
framework accepts textual descriptions and visual instructions as multimodal
inputs and creates large-scale 3D interactive worlds with dynamic agents,
featuring competitive multi-agent interaction, high-fidelity physics
simulation, and real-time rendering. We conduct comprehensive experiments to
evaluate LatticeWorld, showing that it achieves superior accuracy in scene
layout generation and visual fidelity. Moreover, LatticeWorld achieves over a
90× increase in industrial production efficiency while maintaining high
creative quality compared with traditional manual production methods. Our demo
video is available at https://youtu.be/8VWZXpERR18