GigaBrain-0: A World Model-Powered Vision-Language-Action Model

October 22, 2025
Authors: GigaBrain Team, Angen Ye, Boyuan Wang, Chaojun Ni, Guan Huang, Guosheng Zhao, Haoyun Li, Jie Li, Jiagang Zhu, Lv Feng, Peng Li, Qiuping Deng, Runqi Ouyang, Wenkang Qin, Xinze Chen, Xiaofeng Wang, Yang Wang, Yifan Li, Yilong Li, Yiran Ding, Yuan Xu, Yun Ye, Yukun Zhou, Zhehao Dong, Zhenan Wang, Zhichao Liu, Zheng Zhu
cs.AI

Abstract

Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearance (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.