Matrix-3D: Omnidirectional Explorable 3D World Generation
August 11, 2025
Authors: Zhongqi Yang, Wenhang Ge, Yuqi Li, Jiaqi Chen, Haoyuan Li, Mengyin An, Fei Kang, Hua Xue, Baixin Xu, Yuyang Yin, Eric Li, Yang Liu, Yikai Wang, Hao-Xiang Guo, Yahui Zhou
cs.AI
Abstract
Explorable 3D world generation from a single image or text prompt forms a
cornerstone of spatial intelligence. Recent works use video models to
achieve wide-scope and generalizable 3D world generation. However, existing
approaches often suffer from a limited scope in the generated scenes. In this
work, we propose Matrix-3D, a framework that uses a panoramic representation
for wide-coverage, omnidirectional, explorable 3D world generation by combining
conditional video generation and panoramic 3D reconstruction. We first train a
trajectory-guided panoramic video diffusion model that is conditioned on scene
mesh renders, enabling high-quality and geometrically consistent scene
video generation. To lift the panoramic scene video to a 3D world, we propose two
separate methods: (1) a feed-forward large panorama reconstruction model for
rapid 3D scene reconstruction and (2) an optimization-based pipeline for
accurate and detailed 3D scene reconstruction. To facilitate effective
training, we also introduce the Matrix-Pano dataset, the first large-scale
synthetic collection comprising 116K high-quality static panoramic video
sequences with depth and trajectory annotations. Extensive experiments
demonstrate that our proposed framework achieves state-of-the-art performance
in panoramic video generation and 3D world generation. See more at
https://matrix-3d.github.io.
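
To make the panoramic representation concrete, the following is a minimal sketch of how an equirectangular panorama with per-pixel depth (the kind of annotation the abstract attributes to Matrix-Pano) could be lifted to 3D points. It is an illustration only and not the authors' pipeline: the function names `pixel_to_ray` and `unproject_panorama`, the axis convention (y up, z forward at the image center), and the assumption that depth is measured along the viewing ray are all hypothetical choices made here.

```python
import math


def pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel to a unit viewing direction.

    Assumed convention (not taken from the paper): longitude spans
    [-pi, pi) left to right, latitude spans [pi/2, -pi/2] top to bottom,
    y points up and z points forward at the image center.
    """
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    cos_lat = math.cos(lat)
    return (cos_lat * math.sin(lon), math.sin(lat), cos_lat * math.cos(lon))


def unproject_panorama(depth):
    """Lift a panoramic depth map (list of rows) to a list of 3D points.

    Depth is assumed to be the distance along each viewing ray, so each
    point is simply depth * unit_direction.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v in range(height):
        for u in range(width):
            dx, dy, dz = pixel_to_ray(u, v, width, height)
            d = depth[v][u]
            points.append((d * dx, d * dy, d * dz))
    return points


if __name__ == "__main__":
    # A tiny 8x4 panorama in which every ray hits a surface 2 units away.
    depth = [[2.0] * 8 for _ in range(4)]
    cloud = unproject_panorama(depth)
    print(len(cloud), "points")
```

Because every pixel of a panorama maps to a direction on the full sphere, a single frame with depth already covers all viewing directions around the camera, which is the property the abstract relies on for omnidirectional exploration.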