Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Abstract

In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.