VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
October 2, 2025
Authors: Angen Ye, Zeyu Zhang, Boyuan Wang, Xiaofeng Wang, Dapeng Zhang, Zheng Zhu
cs.AI
Abstract
Vision-Language-Action (VLA) models aim to unify perception, language understanding, and action generation, offering strong cross-task and cross-scene generalization with broad impact on embodied AI. However, current VLA models often lack explicit step-by-step reasoning, instead emitting final actions without considering affordance constraints or geometric relations. Their post-training pipelines also rarely reinforce reasoning quality, relying primarily on supervised fine-tuning with weak reward design. To address these challenges, we present VLA-R1, a reasoning-enhanced VLA that integrates Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO) to systematically optimize both reasoning and execution. Specifically, we design an RLVR-based post-training strategy with verifiable rewards for region alignment, trajectory consistency, and output formatting, thereby strengthening reasoning robustness and execution accuracy. Moreover, we develop VLA-CoT-13K, a high-quality dataset that provides chain-of-thought supervision explicitly aligned with affordance and trajectory annotations. Furthermore, extensive evaluations on in-domain, out-of-domain, simulation, and real-robot platforms demonstrate that VLA-R1 achieves superior generalization and real-world performance compared to prior VLA methods. We plan to release the model, code, and dataset following the publication of this work. Code: https://github.com/GigaAI-research/VLA-R1. Website: https://gigaai-research.github.io/VLA-R1.
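
To make the training recipe sketched in the abstract concrete, the following is a minimal, hypothetical illustration of how verifiable rewards for region alignment, trajectory consistency, and output formatting could be combined and fed into a GRPO-style group-relative advantage. The function names, reward definitions, weights, and output tags here are illustrative assumptions for exposition, not the VLA-R1 implementation.

```python
# Hypothetical sketch: verifiable rewards + GRPO-style group-relative advantages.
# All names, weights, and reward definitions below are assumptions, not the paper's code.
import numpy as np

def region_alignment_reward(pred_box, gt_box):
    """IoU between a predicted and an annotated affordance region (boxes as x1, y1, x2, y2)."""
    x1, y1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    x2, y2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_g = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_p + area_g - inter
    return inter / union if union > 0 else 0.0

def trajectory_consistency_reward(pred_traj, gt_traj):
    """Mean point-wise distance between (N, 2) trajectories, mapped to (0, 1]."""
    dist = np.linalg.norm(np.asarray(pred_traj) - np.asarray(gt_traj), axis=-1).mean()
    return float(np.exp(-dist))

def format_reward(output_text):
    """1.0 if the response contains the expected reasoning/answer tags (assumed tags), else 0.0."""
    return 1.0 if "<think>" in output_text and "<answer>" in output_text else 0.0

def total_reward(sample, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three verifiable reward terms; weights are assumed, not from the paper."""
    w_r, w_t, w_f = weights
    return (w_r * region_alignment_reward(sample["pred_box"], sample["gt_box"])
            + w_t * trajectory_consistency_reward(sample["pred_traj"], sample["gt_traj"])
            + w_f * format_reward(sample["output_text"]))

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards across rollouts sampled for the
    same prompt, so no learned value critic is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

In this sketch, each prompt is rolled out several times, every rollout is scored with `total_reward`, and `grpo_advantages` converts the group's scores into advantages that weight the policy-gradient update, mirroring the group-relative structure of GRPO described in the abstract.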