LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation

Abstract

Recent research has increasingly focused on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across domains including embodied AI, autonomous driving, and entertainment. A more realistic simulation with accurate physics effectively narrows the sim-to-real gap and makes it convenient to gather rich information about the real world. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches leverage advanced machine learning algorithms for 3D world generation, and the most recent advances focus on generative methods that create virtual worlds from user instructions. This work explores that research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages a lightweight LLM (LLaMA-2-7B) alongside an industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. The proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld, showing that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, compared with traditional manual production methods, LatticeWorld achieves an over 90× increase in industrial production efficiency while maintaining high creative quality. Our demo video is available at https://youtu.be/8VWZXpERR18.
