Vero: An Open RL Recipe for General Visual Reasoning

April 6, 2026
Authors: Gabriel Sarch, Linrong Cai, Qunzhong Wang, Haoyang Wu, Danqi Chen, Zhuang Liu
cs.AI

Abstract

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) show such broad visual reasoning is within reach, but the recipe behind them remains unclear, locked behind proprietary reinforcement learning (RL) pipelines with non-public data. We introduce Vero, a family of fully open VLMs that matches or exceeds existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answer formats. Vero achieves state-of-the-art performance, improving over four base models by 3.7-5.5 points on average across VeroEval, our suite of 30 challenging benchmarks. Starting from Qwen3-VL-8B-Instruct, Vero outperforms Qwen3-VL-8B-Thinking on 23 of 30 benchmarks without additional proprietary thinking data. When trained from the same base model, Vero-600K exceeds existing RL datasets across task categories. Systematic ablations reveal that different task categories elicit qualitatively distinct reasoning patterns that transfer poorly in isolation, suggesting that broad data coverage is the primary driver of strong RL scaling. All data, code, and models are released.
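The abstract mentions "task-routed rewards that handle heterogeneous answer formats" as a core design element. The following is a minimal sketch of what such routing could look like: each task category registers its own verifier so answers in different formats (option letters, numbers, free text) all map to a single scalar reward. All function names, category names, and tolerances here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of task-routed rewards: one verifier per task
# category, dispatched by task type. Illustrative only.

def reward_multiple_choice(pred: str, gold: str) -> float:
    # Compare the chosen option letter, ignoring case and whitespace.
    return float(pred.strip().upper() == gold.strip().upper())

def reward_numeric(pred: str, gold: str, rel_tol: float = 0.01) -> float:
    # Chart/science answers: accept a small relative error.
    try:
        p, g = float(pred), float(gold)
    except ValueError:
        return 0.0
    denom = max(abs(g), 1e-8)
    return float(abs(p - g) / denom <= rel_tol)

def reward_free_text(pred: str, gold: str) -> float:
    # Open-ended answers: crude normalized exact match as a placeholder
    # (a real pipeline might use a model-based judge here instead).
    norm = lambda s: " ".join(s.lower().split())
    return float(norm(pred) == norm(gold))

# Route each sample to the verifier registered for its task category.
ROUTER = {
    "multiple_choice": reward_multiple_choice,
    "numeric": reward_numeric,
    "free_text": reward_free_text,
}

def task_routed_reward(task_type: str, pred: str, gold: str) -> float:
    return ROUTER[task_type](pred, gold)
```

The dispatch-table design keeps verifiers independent, so adding a new task category only requires registering one more function.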