
GigaWorld-0: World Models as Data Engine to Empower Embodied AI

November 25, 2025
Authors: GigaWorld Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jiagang Zhu, Kerui Li, Mengyuan Xu, Qiuping Deng, Siting Wang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yankai Wang, Yu Cao, Yifan Chang, Yuan Xu, Yun Ye, Yang Wang, Yukun Zhou, Zhengyuan Zhang, Zhehao Dong, Zheng Zhu
cs.AI

Abstract

World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism. Their joint optimization enables the scalable synthesis of embodied interaction data that is visually compelling, spatially coherent, physically plausible, and instruction-aligned. Training at scale is made feasible by our efficient GigaTrain framework, which uses FP8 precision and sparse attention to drastically reduce memory and compute requirements. Comprehensive evaluations show that GigaWorld-0 generates high-quality, diverse, and controllable data across multiple dimensions. Critically, VLA models (e.g., GigaBrain-0) trained on GigaWorld-0-generated data achieve strong real-world performance, significantly improving generalization and task success on physical robots without any real-world interaction during training.