Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

June 12, 2026
Authors: He Zhang, Lingzhu Xiang, Haitao Lin, Zeyu Huang, Minghui Wang, Dingyan Zhong, Yubo Dong, Yihao Wu, Yongming Rao, Dongsheng Zhang, Wanjia He, Ling Chen, Kai Huang, Jiahao Chen, Sichang Su, Xumin Yu, Ziyi Wang, Chengwei Zhu, Xiao Teng, Yuchun Guo, Yufeng Zhang, Yuandong Liu, Rui Wang, Zisheng Lu, Han Hu, Zhengyou Zhang
cs.AI

Abstract

In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, reinforcement learning (RL) post-training, and real-world deployment. Each component serves a distinct role in this stack.