LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

September 5, 2025
Authors: Yinglin Duan, Zhengxia Zou, Tongwei Gu, Wei Jia, Zhan Zhao, Luyi Xu, Xinzhu Liu, Hao Jiang, Kang Chen, Shuang Qiu
cs.AI

Abstract

Recent research has increasingly focused on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across domains including embodied AI, autonomous driving, and entertainment. A more realistic simulation with accurate physics narrows the sim-to-real gap and makes it convenient to gather rich information about the real world. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches leverage advanced machine learning algorithms for 3D world generation, with the most recent advances focusing on generative methods that create virtual worlds from user instructions. This work explores that research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate dynamic environments. The proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale interactive 3D worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. Comprehensive experiments show that LatticeWorld achieves superior accuracy in scene layout generation and visual fidelity. Moreover, compared with traditional manual production methods, LatticeWorld achieves over a 90× increase in industrial production efficiency while maintaining high creative quality. Our demo video is available at https://youtu.be/8VWZXpERR18