

VLA-0: Building State-of-the-Art VLAs with Zero Modification

October 15, 2025
Authors: Ankit Goyal, Hugo Hadfield, Xuning Yang, Valts Blukis, Fabio Ramos
cs.AI

Abstract

Vision-Language-Action models (VLAs) hold immense promise for enabling generalist robot manipulation. However, the best way to build them remains an open question. Current approaches often add complexity, such as modifying the existing vocabulary of a Vision-Language Model (VLM) with action tokens or introducing special action heads. Curiously, the simplest strategy of representing actions directly as text has remained largely unexplored. This work introduces VLA-0 to investigate this idea. We find that VLA-0 is not only effective; it is surprisingly powerful. With the right design, VLA-0 outperforms more involved models. On LIBERO, a popular benchmark for evaluating VLAs, VLA-0 outperforms all existing methods trained on the same robotic data, including pi_0.5-KI, OpenVLA-OFT and SmolVLA. Furthermore, without large-scale robotics-specific training, it outperforms methods trained on large-scale robotic data, like pi_0.5-KI, pi_0, GR00T-N1 and MolmoAct. These findings also translate to the real world, where VLA-0 outperforms SmolVLA, a VLA model pre-trained on large-scale real data. This paper summarizes our unexpected findings and spells out the specific techniques required to unlock the high performance of this simple yet potent VLA design. Visual results, code, and trained models are provided here: https://vla0.github.io/.
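The core idea described above, having the VLM emit actions directly as plain text instead of through added action tokens or special action heads, can be sketched minimally as follows. The output format and 7-DoF action layout here are assumptions for illustration; the actual format is defined in the authors' released code at https://vla0.github.io/.

```python
# Hypothetical sketch of the actions-as-text idea: the VLM's text output
# is parsed straight into a numeric action vector, so the model's
# vocabulary and architecture need zero modification.

def parse_action_text(text: str) -> list[float]:
    """Parse a whitespace-separated numeric string into an action vector."""
    return [float(tok) for tok in text.split()]

# Assumed example: a 7-DoF action (x, y, z, roll, pitch, yaw, gripper)
# emitted by the VLM as ordinary text.
vlm_output = "0.12 -0.05 0.30 0.0 0.0 1.57 1.0"
action = parse_action_text(vlm_output)
print(action)  # [0.12, -0.05, 0.3, 0.0, 0.0, 1.57, 1.0]
```

Because the action is ordinary text, training reduces to standard supervised fine-tuning of the VLM on (observation, instruction, action-string) pairs.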