

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

July 29, 2025

作者: HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, Junta Wu, Zixiao Gu, Haoyuan Wang, Xuhui Zuo, Tianyu Huang, Wenhuan Li, Sheng Zhang, Yihang Lian, Yulin Tsai, Lifu Wang, Sicong Liu, Puhua Jiang, Xianghui Yang, Dongyuan Guo, Yixuan Tang, Xinyue Mao, Jiaao Yu, Junlin Yu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Chao Zhang, Yonghao Tan, Hao Zhang, Zheng Ye, Peng He, Runzhou Wu, Minghui Chen, Zhan Li, Wangchen Qin, Lei Wang, Yifu Sun, Lin Niu, Xiang Yuan, Xiaofeng Yang, Yingping He, Jie Xiao, Yangyu Tao, Jianchen Zhu, Jinbao Xue, Kai Liu, Chongqing Zhao, Xinming Wu, Tian Liu, Peng Chen, Di Wang, Yuhong Liu, Linus, Jie Jiang, Tengfei Wang, Chunchao Guo
cs.AI

Abstract

Creating immersive and playable 3D worlds from texts or images remains a fundamental challenge in computer vision and graphics. Existing world generation approaches typically fall into two categories: video-based methods that offer rich diversity but lack 3D consistency and rendering efficiency, and 3D-based methods that provide geometric consistency but struggle with limited training data and memory-inefficient representations. To address these limitations, we present HunyuanWorld 1.0, a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity. The core of our framework is a semantically layered 3D mesh representation that leverages panoramic images as 360° world proxies for semantic-aware world decomposition and reconstruction, enabling the generation of diverse 3D worlds. Extensive experiments demonstrate that our method achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds while enabling versatile applications in virtual reality, physical simulation, game development, and interactive content creation.
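The abstract describes a pipeline in which a 360° panorama acts as a world proxy that is decomposed into semantic layers, with foreground objects kept disentangled for interactivity and every layer exportable as a mesh. The toy sketch below illustrates that data flow only; all names (`WorldLayer`, `decompose_panorama`, the `object_` label convention) are hypothetical and not part of the paper's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the layered-world representation described in the
# abstract: panorama -> semantic layers -> exportable meshes.

@dataclass
class WorldLayer:
    name: str          # e.g. "sky", "terrain", "object_car"
    depth_order: int   # rendering order, far (0) to near
    interactive: bool  # disentangled objects can be manipulated independently

@dataclass
class World3D:
    layers: list[WorldLayer] = field(default_factory=list)

    def exportable_meshes(self) -> list[str]:
        # Mesh export keeps compatibility with standard graphics pipelines.
        return [f"{layer.name}.obj" for layer in self.layers]

def decompose_panorama(semantic_labels: list[str]) -> World3D:
    """Toy stand-in for semantic-aware decomposition of a 360° panorama:
    each semantic region becomes its own layer, and labels with the
    (assumed) "object_" prefix are treated as disentangled, interactive
    foreground objects."""
    world = World3D()
    for order, label in enumerate(semantic_labels):
        world.layers.append(
            WorldLayer(name=label, depth_order=order,
                       interactive=label.startswith("object_"))
        )
    return world

world = decompose_panorama(["sky", "terrain", "object_car"])
print(world.exportable_meshes())  # ['sky.obj', 'terrain.obj', 'object_car.obj']
```

The per-layer structure is what distinguishes this from a monolithic scene mesh: background layers stay static while object layers can be moved or simulated independently, which is what the abstract means by "disentangled object representations for augmented interactivity."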
PDF · July 30, 2025