GigaWorld-0: World Models as Data Engine to Empower Embodied AI

November 25, 2025
Authors: GigaWorld Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jiagang Zhu, Kerui Li, Mengyuan Xu, Qiuping Deng, Siting Wang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yankai Wang, Yu Cao, Yifan Chang, Yuan Xu, Yun Ye, Yang Wang, Yukun Zhou, Zhengyuan Zhang, Zhehao Dong, Zheng Zhu
cs.AI

Abstract

World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism. Their joint optimization enables the scalable synthesis of embodied interaction data that is visually compelling, spatially coherent, physically plausible, and instruction-aligned. Training at scale is made feasible through our efficient GigaTrain framework, which exploits FP8 precision and sparse attention to drastically reduce memory and compute requirements. We conduct comprehensive evaluations showing that GigaWorld-0 generates high-quality, diverse, and controllable data across multiple dimensions. Critically, VLA models (e.g., GigaBrain-0) trained on GigaWorld-0-generated data achieve strong real-world performance, significantly improving generalization and task success on physical robots without any real-world interaction during training.