Matrix-3D: Omnidirectional Explorable 3D World Generation

August 11, 2025
Authors: Zhongqi Yang, Wenhang Ge, Yuqi Li, Jiaqi Chen, Haoyuan Li, Mengyin An, Fei Kang, Hua Xue, Baixin Xu, Yuyang Yin, Eric Li, Yang Liu, Yikai Wang, Hao-Xiang Guo, Yahui Zhou
cs.AI

Abstract

Explorable 3D world generation from a single image or text prompt forms a cornerstone of spatial intelligence. Recent works use video models to achieve wide-scope and generalizable 3D world generation. However, existing approaches often suffer from a limited scope in the generated scenes. In this work, we propose Matrix-3D, a framework that uses a panoramic representation for wide-coverage, omnidirectional, explorable 3D world generation by combining conditional video generation with panoramic 3D reconstruction. We first train a trajectory-guided panoramic video diffusion model that takes scene mesh renders as its condition, enabling high-quality and geometrically consistent scene video generation. To lift the panoramic scene video to a 3D world, we propose two separate methods: (1) a feed-forward large panorama reconstruction model for rapid 3D scene reconstruction and (2) an optimization-based pipeline for accurate and detailed 3D scene reconstruction. To facilitate effective training, we also introduce the Matrix-Pano dataset, the first large-scale synthetic collection comprising 116K high-quality static panoramic video sequences with depth and trajectory annotations. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance in panoramic video generation and 3D world generation. See more at https://matrix-3d.github.io.