GigaBrain-0.5M*: A VLA That Learns from World Model-Based Reinforcement Learning
February 12, 2026
作者: GigaBrain Team, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Hao Li, Jie Li, Jindi Lv, Jingyu Liu, Lv Feng, Mingming Yu, Peng Li, Qiuping Deng, Tianze Liu, Xinyu Zhou, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yifei Nie, Yilong Li, Yukun Zhou, Yun Ye, Zhichao Liu, Zheng Zhu
cs.AI
Abstract
Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future anticipation. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. We therefore propose GigaBrain-0.5M*, a VLA model trained via world model-based reinforcement learning. It builds on GigaBrain-0.5, which is pre-trained on over 10,000 hours of robotic manipulation data and whose intermediate version currently ranks first on the international RoboChallenge benchmark. GigaBrain-0.5M* further integrates world model-based reinforcement learning via RAMP (Reinforcement leArning via world Model-conditioned Policy) to enable robust cross-task adaptation. Empirical results demonstrate that RAMP achieves substantial performance gains over the RECAP baseline, yielding improvements of approximately 30% on challenging tasks including Laundry Folding, Box Packing, and Espresso Preparation. Critically, GigaBrain-0.5M* exhibits reliable long-horizon execution, consistently accomplishing complex manipulation tasks without failure, as validated by real-world deployment videos on our project page: https://gigabrain05m.github.io.
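The abstract names the core recipe but not its implementation: RAMP optimizes the policy against a pretrained world model rather than only against real-robot experience. To make that idea concrete, below is a minimal, illustrative PyTorch sketch of policy optimization inside a learned differentiable world model ("backprop through imagination"). Every module name, dimension, and the simple imagined-return objective here are our own assumptions for illustration, not GigaBrain's RAMP implementation.

```python
# Minimal sketch (not the authors' code): optimize a policy by rolling it out
# inside a frozen, differentiable world model and maximizing predicted reward.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HORIZON = 64, 8, 16  # hypothetical latent/action sizes


class WorldModel(nn.Module):
    """Stand-in for a pretrained world model: predicts the next latent
    observation and a scalar reward from (obs, action)."""
    def __init__(self):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, OBS_DIM))
        self.reward = nn.Linear(OBS_DIM + ACT_DIM, 1)

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        return self.dynamics(x), self.reward(x).squeeze(-1)


class Policy(nn.Module):
    """Stand-in for the VLA policy head: maps an observation to a
    Gaussian distribution over the next action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU())
        self.mu = nn.Linear(128, ACT_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))

    def forward(self, obs):
        h = self.net(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())


world_model, policy = WorldModel(), Policy()
for p in world_model.parameters():  # world model stays frozen; only the
    p.requires_grad_(False)         # policy is updated
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(100):
    obs = torch.randn(32, OBS_DIM)  # batch of (latent) start observations
    total_reward = torch.zeros(32)
    # Roll the policy forward inside the world model ("imagination");
    # gradients flow through the rollout back into the policy.
    for t in range(HORIZON):
        dist = policy(obs)
        act = dist.rsample()        # reparameterized sample keeps gradients
        obs, r = world_model(obs, act)
        total_reward = total_reward + r
    loss = -total_reward.mean()     # maximize imagined return
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The design point this toy loop captures is the one the abstract argues for: because the world model supplies future predictions, the policy can be trained on long imagined horizons without collecting new robot data for every task.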