VLA-R1: Enhancing Reasoning in Vision-Language-Action Models
October 2, 2025
Authors: Angen Ye, Zeyu Zhang, Boyuan Wang, Xiaofeng Wang, Dapeng Zhang, Zheng Zhu
cs.AI
Abstract
Vision-Language-Action (VLA) models aim to unify perception, language understanding, and action generation, offering strong cross-task and cross-scene generalization with broad impact on embodied AI. However, current VLA models often lack explicit step-by-step reasoning, instead emitting final actions without considering affordance constraints or geometric relations. Their post-training pipelines also rarely reinforce reasoning quality, relying primarily on supervised fine-tuning with weak reward design. To address these challenges, we present VLA-R1, a reasoning-enhanced VLA that integrates Reinforcement Learning from Verifiable Rewards (RLVR) with Group Relative Policy Optimization (GRPO) to systematically optimize both reasoning and execution. Specifically, we design an RLVR-based post-training strategy with verifiable rewards for region alignment, trajectory consistency, and output formatting, thereby strengthening reasoning robustness and execution accuracy. Moreover, we develop VLA-CoT-13K, a high-quality dataset that provides chain-of-thought supervision explicitly aligned with affordance and trajectory annotations. Furthermore, extensive evaluations on in-domain, out-of-domain, simulation, and real-robot platforms demonstrate that VLA-R1 achieves superior generalization and real-world performance compared to prior VLA methods. We plan to release the model, code, and dataset following the publication of this work. Code: https://github.com/GigaAI-research/VLA-R1. Website: https://gigaai-research.github.io/VLA-R1.
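
To make the training recipe sketched in the abstract concrete, the following is a minimal, hypothetical illustration of how verifiable rewards for region alignment, trajectory consistency, and output formatting could be combined and fed into a GRPO-style group-relative advantage. The function names, reward definitions, weights, and output tags here are illustrative assumptions for exposition, not the VLA-R1 implementation.

```python
# Hypothetical sketch: verifiable rewards + GRPO-style group-relative advantages.
# All names, weights, and reward definitions below are assumptions, not the paper's code.
import numpy as np

def region_alignment_reward(pred_box, gt_box):
    """IoU between a predicted and an annotated affordance region (boxes as x1, y1, x2, y2)."""
    x1, y1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    x2, y2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_g = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_p + area_g - inter
    return inter / union if union > 0 else 0.0

def trajectory_consistency_reward(pred_traj, gt_traj):
    """Mean point-wise distance between (N, 2) trajectories, mapped to (0, 1]."""
    dist = np.linalg.norm(np.asarray(pred_traj) - np.asarray(gt_traj), axis=-1).mean()
    return float(np.exp(-dist))

def format_reward(output_text):
    """1.0 if the response contains the expected reasoning/answer tags (assumed tags), else 0.0."""
    return 1.0 if "<think>" in output_text and "<answer>" in output_text else 0.0

def total_reward(sample, weights=(1.0, 1.0, 0.5)):
    """Weighted sum of the three verifiable reward terms; weights are assumed, not from the paper."""
    w_r, w_t, w_f = weights
    return (w_r * region_alignment_reward(sample["pred_box"], sample["gt_box"])
            + w_t * trajectory_consistency_reward(sample["pred_traj"], sample["gt_traj"])
            + w_f * format_reward(sample["output_text"]))

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: standardize rewards across rollouts sampled for the
    same prompt, so no learned value critic is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

In this sketch, each prompt is rolled out several times, every rollout is scored with `total_reward`, and `grpo_advantages` converts the group's scores into advantages that weight the policy-gradient update, mirroring the group-relative structure of GRPO described in the abstract.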