Hy-Embodied-0.5-VLA: From Vision-Language-Action Models to a Real-World Robot Learning Stack

Abstract

In this report, we present Hy-Embodied-0.5-VLA (abbreviated HyVLA-0.5), an end-to-end system that spans the full robot learning stack: data collection, model design, continued pre-training and supervised fine-tuning, reinforcement learning (RL) post-training, and real-world deployment. Each component serves a distinct role in this stack.