Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack
June 12, 2026
Authors: He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang, Wanjia He, Ling Chen, Kai Huang, Jiahao Chen, Sichang Su, Xumin Yu, Ziyi Wang, Chengwei Zhu, Xiao Teng, Yuchun Guo, Yufeng Zhang, Yuandong Liu, Rui Wang, Zisheng Lu, Han Hu, Zhengyou Zhang
cs.AI
Abstract
In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.