GigaBrain-0: A World Model-Powered Vision-Language-Action Model

October 22, 2025
Authors: GigaBrain Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jie Li, Jiagang Zhu, Lv Feng, Peng Li, Qiuping Deng, Runqi Ouyang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yilong Li, Yiran Ding, Yuan Xu, Yun Ye, Yukun Zhou, Zhehao Dong, Zhenan Wang, Zhichao Liu, Zheng Zhu
cs.AI

Abstract

Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, and sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placement, and camera viewpoint. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.