GigaBrain-0.5M*: A VLA That Learns From World Model-Based Reinforcement Learning

February 12, 2026
Authors: GigaBrain Team, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Hao Li, Jie Li, Jindi Lv, Jingyu Liu, Lv Feng, Mingming Yu, Peng Li, Qiuping Deng, Tianze Liu, Xinyu Zhou, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yifei Nie, Yilong Li, Yukun Zhou, Yun Ye, Zhichao Liu, Zheng Zhu
cs.AI

Abstract

Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations are inherently limited by constrained scene understanding and weak anticipation of future states. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. We therefore propose GigaBrain-0.5M*, a VLA model trained via world model-based reinforcement learning. It builds on GigaBrain-0.5, which is pre-trained on over 10,000 hours of robotic manipulation data and whose intermediate version currently ranks first on the international RoboChallenge benchmark. GigaBrain-0.5M* further integrates world model-based reinforcement learning via RAMP (Reinforcement leArning via world Model-conditioned Policy) to enable robust cross-task adaptation. Empirical results show that RAMP achieves substantial performance gains over the RECAP baseline, yielding improvements of approximately 30% on challenging tasks including Laundry Folding, Box Packing, and Espresso Preparation. Critically, GigaBrain-0.5M* exhibits reliable long-horizon execution, consistently accomplishing complex manipulation tasks without failure, as validated by real-world deployment videos on our project page (https://gigabrain05m.github.io).