3D and 4D World Modeling: A Survey

Abstract

World modeling has become a cornerstone of AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. Prior surveys largely emphasize generative methods for 2D image and video data and overlook the rapidly growing body of work that leverages native 3D and 4D representations, such as RGB-D imagery, occupancy grids, and LiDAR point clouds, for large-scale scene modeling. At the same time, the absence of a standardized definition and taxonomy for "world models" has led to fragmented and sometimes inconsistent claims in the literature. This survey addresses these gaps by presenting the first comprehensive review explicitly dedicated to 3D and 4D world modeling and generation. We establish precise definitions, introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches, and systematically summarize datasets and evaluation metrics tailored to 3D/4D settings. We further discuss practical applications, identify open challenges, and highlight promising research directions, aiming to provide a coherent and foundational reference for advancing the field. A systematic summary of existing literature is available at https://github.com/worldbench/survey.
