Vero: An Open RL Recipe for General Visual Reasoning
April 6, 2026
作者: Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu
cs.AI
Abstract
What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.7-5.5 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero outperforms Qwen3-VL-8B-Thinking on 23 of 30 benchmarks without additional proprietary thinking data. When trained from the same base model, Vero-600K exceeds existing RL datasets across task categories. Systematic ablations reveal that different task categories elicit qualitatively distinct reasoning patterns that transfer poorly in isolation, suggesting that broad data coverage is the primary driver of strong RL scaling. All data, code, and models are released.
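The abstract's "task-routed rewards" can be pictured as a dispatcher that sends each sample's prediction to a verifier matched to its task category, so heterogeneous answer formats (option letters, numbers, free text) are scored consistently. The following is a minimal illustrative sketch; the category names, reward functions, and tolerance are hypothetical stand-ins, not the paper's actual implementation (which, for open-ended tasks in particular, would likely use a learned judge rather than string containment).

```python
def reward_choice(pred: str, gold: str) -> float:
    # Multiple-choice: compare selected option letters, case-insensitively.
    return float(pred.strip().upper() == gold.strip().upper())

def reward_numeric(pred: str, gold: str, rel_tol: float = 1e-3) -> float:
    # Numeric answers (e.g., chart reading): accept small relative error.
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return 0.0
    denom = max(abs(g), 1e-9)  # guard against division by zero
    return float(abs(p - g) / denom <= rel_tol)

def reward_freeform(pred: str, gold: str) -> float:
    # Open-ended tasks: crude containment check standing in for a judge.
    return float(gold.strip().lower() in pred.strip().lower())

# Router: task category -> reward function (categories are illustrative).
REWARD_ROUTER = {
    "multiple_choice": reward_choice,
    "numeric": reward_numeric,
    "open_ended": reward_freeform,
}

def route_reward(category: str, pred: str, gold: str) -> float:
    """Dispatch a (prediction, gold) pair to its category's verifier."""
    return REWARD_ROUTER[category](pred, gold)
```

Routing keeps each verifier simple: adding a new task category means registering one new reward function rather than complicating a single monolithic scorer.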