Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack
June 12, 2026
Authors: He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang, Wanjia He, Ling Chen, Kai Huang, Jiahao Chen, Sichang Su, Xumin Yu, Ziyi Wang, Chengwei Zhu, Xiao Teng, Yuchun Guo, Yufeng Zhang, Yuandong Liu, Rui Wang, Zisheng Lu, Han Hu, Zhengyou Zhang
cs.AI
Abstract
In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.