Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Abstract

In this report, we present Hy-Embodied-0.5-VLA, abbreviated as HyVLA-0.5, an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, RL post-training, and real-world deployment. Each component serves a distinct role in this stack.