HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
July 29, 2025
Authors: HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, Junta Wu, Zixiao Gu, Haoyuan Wang, Xuhui Zuo, Tianyu Huang, Wenhuan Li, Sheng Zhang, Yihang Lian, Yulin Tsai, Lifu Wang, Sicong Liu, Puhua Jiang, Xianghui Yang, Dongyuan Guo, Yixuan Tang, Xinyue Mao, Jiaao Yu, Junlin Yu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Chao Zhang, Yonghao Tan, Hao Zhang, Zheng Ye, Peng He, Runzhou Wu, Minghui Chen, Zhan Li, Wangchen Qin, Lei Wang, Yifu Sun, Lin Niu, Xiang Yuan, Xiaofeng Yang, Yingping He, Jie Xiao, Yangyu Tao, Jianchen Zhu, Jinbao Xue, Kai Liu, Chongqing Zhao, Xinming Wu, Tian Liu, Peng Chen, Di Wang, Yuhong Liu, Linus, Jie Jiang, Tengfei Wang, Chunchao Guo
cs.AI
Abstract
Creating immersive and playable 3D worlds from texts or images remains a
fundamental challenge in computer vision and graphics. Existing world
generation approaches typically fall into two categories: video-based methods
that offer rich diversity but lack 3D consistency and rendering efficiency, and
3D-based methods that provide geometric consistency but struggle with limited
training data and memory-inefficient representations. To address these
limitations, we present HunyuanWorld 1.0, a novel framework that combines the
best of both worlds for generating immersive, explorable, and interactive 3D
scenes from text and image conditions. Our approach features three key
advantages: 1) 360° immersive experiences via panoramic world proxies; 2)
mesh export capabilities for seamless compatibility with existing computer
graphics pipelines; 3) disentangled object representations for augmented
interactivity. The core of our framework is a semantically layered 3D mesh
representation that leverages panoramic images as 360° world proxies for
semantic-aware world decomposition and reconstruction, enabling the generation
of diverse 3D worlds. Extensive experiments demonstrate that our method
achieves state-of-the-art performance in generating coherent, explorable, and
interactive 3D worlds while enabling versatile applications in virtual reality,
physical simulation, game development, and interactive content creation.
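The abstract's central idea, using a panoramic image as a 360° world proxy that is decomposed into layered 3D meshes, rests on a standard geometric step: unprojecting an equirectangular panorama (given per-pixel depth) into 3D points on the sphere of view. The sketch below illustrates only that generic geometry; the function name, the toy depth map, and the optional per-layer mask are illustrative assumptions, not code or parameters from the HunyuanWorld 1.0 release.

```python
import numpy as np

def panorama_to_points(depth, mask=None):
    """Unproject an equirectangular depth map into 3D points.

    Each pixel center maps to spherical angles (longitude, latitude);
    scaling the unit viewing ray by depth yields a 3D point. A boolean
    mask can select one semantic layer (e.g. foreground objects),
    mimicking a layered decomposition. Names here are illustrative.
    """
    h, w = depth.shape
    # Pixel centers -> longitude in [-pi, pi), latitude in [-pi/2, pi/2]
    lon = (np.arange(w) + 0.5) / w * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both (h, w)
    # Unit ray directions on the viewing sphere (y up)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    pts = np.stack([x, y, z], axis=-1) * depth[..., None]
    if mask is not None:
        pts = pts[mask]  # keep only this layer's pixels
    return pts

# Toy example: a constant-depth panorama 2 m away on all sides
depth = np.full((64, 128), 2.0)
pts = panorama_to_points(depth)          # shape (64, 128, 3)
radii = np.linalg.norm(pts, axis=-1)     # all radii equal the depth, 2.0
```

Connecting neighboring unprojected points into triangles is what turns such a proxy into an exportable mesh, which is why a panoramic representation composes naturally with standard graphics pipelines.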