

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

July 29, 2025

作者: HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, Junta Wu, Zixiao Gu, Haoyuan Wang, Xuhui Zuo, Tianyu Huang, Wenhuan Li, Sheng Zhang, Yihang Lian, Yulin Tsai, Lifu Wang, Sicong Liu, Puhua Jiang, Xianghui Yang, Dongyuan Guo, Yixuan Tang, Xinyue Mao, Jiaao Yu, Junlin Yu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Chao Zhang, Yonghao Tan, Hao Zhang, Zheng Ye, Peng He, Runzhou Wu, Minghui Chen, Zhan Li, Wangchen Qin, Lei Wang, Yifu Sun, Lin Niu, Xiang Yuan, Xiaofeng Yang, Yingping He, Jie Xiao, Yangyu Tao, Jianchen Zhu, Jinbao Xue, Kai Liu, Chongqing Zhao, Xinming Wu, Tian Liu, Peng Chen, Di Wang, Yuhong Liu, Linus, Jie Jiang, Tengfei Wang, Chunchao Guo
cs.AI

Abstract

Creating immersive and playable 3D worlds from texts or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity. The core of our framework is a semantically layered 3D mesh representation that leverages panoramic images as 360° world proxies for semantic-aware world decomposition and reconstruction, enabling the generation of diverse 3D worlds. Extensive experiments demonstrate that our method achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds while enabling versatile applications in virtual reality, physical simulation, game development, and interactive content creation.
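The abstract describes a pipeline in which a 360° panorama acts as a world proxy that is decomposed into semantic layers, with foreground objects kept disentangled for interactivity and every layer exportable as a mesh. The toy sketch below illustrates that data flow only; all names (`WorldLayer`, `decompose_panorama`, the `object_` label convention) are hypothetical and not part of the paper's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the layered-world representation described in the
# abstract: panorama -> semantic layers -> exportable meshes.

@dataclass
class WorldLayer:
    name: str          # e.g. "sky", "terrain", "object_car"
    depth_order: int   # rendering order, far (0) to near
    interactive: bool  # disentangled objects can be manipulated independently

@dataclass
class World3D:
    layers: list[WorldLayer] = field(default_factory=list)

    def exportable_meshes(self) -> list[str]:
        # Mesh export keeps compatibility with standard graphics pipelines.
        return [f"{layer.name}.obj" for layer in self.layers]

def decompose_panorama(semantic_labels: list[str]) -> World3D:
    """Toy stand-in for semantic-aware decomposition of a 360° panorama:
    each semantic region becomes its own layer, and labels with the
    (assumed) "object_" prefix are treated as disentangled, interactive
    foreground objects."""
    world = World3D()
    for order, label in enumerate(semantic_labels):
        world.layers.append(
            WorldLayer(name=label, depth_order=order,
                       interactive=label.startswith("object_"))
        )
    return world

world = decompose_panorama(["sky", "terrain", "object_car"])
print(world.exportable_meshes())  # ['sky.obj', 'terrain.obj', 'object_car.obj']
```

The per-layer structure is what distinguishes this from a monolithic scene mesh: background layers stay static while object layers can be moved or simulated independently, which is what the abstract means by "disentangled object representations for augmented interactivity."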
PDF · July 30, 2025