HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

April 15, 2026
Authors: Team HY-World, Chenjie Cao, Xuhui Zuo, Zhenwei Wang, Yisu Zhang, Junta Wu, Zhenyang Liu, Yuning Gong, Yang Liu, Bo Yuan, Chao Zhang, Coopers Li, Dongyuan Guo, Fan Yang, Haiyu Zhang, Hang Cao, Jianchen Zhu, Jiaxin Lin, Jie Xiao, Jihong Zhang, Junlin Yu, Lei Wang, Lifu Wang, Lilin Wang, Linus, Minghui Chen, Peng He, Penghao Zhao, Qi Chen, Rui Chen, Rui Shao, Sicong Liu, Wangchen Qin, Xiaochuan Niu, Xiang Yuan, Yi Sun, Yifei Tang, Yifu Sun, Yihang Lian, Yonghao Tan, Yuhong Liu, Yuyang Yin, Zhiyuan Min, Tengfei Wang, Chunchao Guo
cs.AI

Abstract

We introduce HY-World 2.0, a multi-modal world model framework that advances our prior project HY-World 1.0. HY-World 2.0 accommodates diverse input modalities, including text prompts, single-view images, multi-view images, and videos, and produces 3D world representations. With text or single-view image inputs, the model performs world generation, synthesizing high-fidelity, navigable 3D Gaussian Splatting (3DGS) scenes. This is achieved through a four-stage method: a) Panorama Generation with HY-Pano 2.0, b) Trajectory Planning with WorldNav, c) World Expansion with WorldStereo 2.0, and d) World Composition with WorldMirror 2.0. Specifically, we introduce key innovations to enhance panorama fidelity, enable 3D scene understanding and planning, and upgrade WorldStereo, our keyframe-based view generation model with consistent memory. We also upgrade WorldMirror, a feed-forward model for universal 3D prediction, by refining its model architecture and learning strategy, enabling world reconstruction from multi-view images or videos. We further introduce WorldLens, a high-performance 3DGS rendering platform featuring a flexible engine-agnostic architecture, automatic image-based lighting (IBL), efficient collision detection, and training-rendering co-design, enabling interactive exploration of 3D worlds with character support. Extensive experiments demonstrate that HY-World 2.0 achieves state-of-the-art performance on several benchmarks among open-source approaches, delivering results comparable to the closed-source model Marble. We release all model weights, code, and technical details to facilitate reproducibility and support further research on 3D world models.
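The routing described in the abstract (text or single-view inputs go through the four-stage generation path, while multi-view images or videos go to reconstruction) can be sketched roughly as follows. This is only an illustrative sketch, not the released HY-World 2.0 code: the function name `plan_stages`, the modality strings, and the "World Reconstruction" stage label are hypothetical, and only the stage and model names in `GENERATION_STAGES` are taken from the abstract.

```python
from typing import List, Tuple

# (stage, model) pairs in the order listed in the abstract.
GENERATION_STAGES: List[Tuple[str, str]] = [
    ("Panorama Generation", "HY-Pano 2.0"),
    ("Trajectory Planning", "WorldNav"),
    ("World Expansion", "WorldStereo 2.0"),
    ("World Composition", "WorldMirror 2.0"),
]

# Assumption: reconstruction from multi-view images or video is handled
# by the upgraded WorldMirror; the stage label here is hypothetical.
RECONSTRUCTION_STAGES: List[Tuple[str, str]] = [
    ("World Reconstruction", "WorldMirror 2.0"),
]


def plan_stages(modality: str) -> List[Tuple[str, str]]:
    """Return the ordered (stage, model) list for an input modality.

    The modality strings are hypothetical labels chosen for this sketch.
    """
    if modality in {"text", "single_view_image"}:
        return list(GENERATION_STAGES)
    if modality in {"multi_view_images", "video"}:
        return list(RECONSTRUCTION_STAGES)
    raise ValueError(f"unsupported modality: {modality!r}")


if __name__ == "__main__":
    for stage, model in plan_stages("text"):
        print(f"{stage}: {model}")
```

Keeping the stage list as plain data makes the ordering explicit and leaves the actual models, whose interfaces the abstract does not describe, out of the sketch.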