OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

Abstract

Dual-system vision-language-action (VLA) architectures have become a hot topic in embodied intelligence research, but existing open-source work is insufficient to support further performance analysis and optimization. To address this gap, this paper summarizes and compares the structural designs of existing dual-system architectures and systematically evaluates their core design elements through empirical experiments. It also provides a low-cost open-source model for further exploration. The project will continue to be updated with additional experimental conclusions and better-performing open-source models. Project page: https://openhelix-robot.github.io/.