We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
August 14, 2025
Authors: Runqi Qiao, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive
capabilities across various tasks, but still struggle with complex mathematical
reasoning. Existing research primarily focuses on dataset construction and
method optimization, often overlooking two critical aspects: comprehensive
knowledge-driven design and model-centric data space modeling. In this paper,
we introduce We-Math 2.0, a unified system that integrates a structured
mathematical knowledge system, model-centric data space modeling, and a
reinforcement learning (RL)-based training paradigm to comprehensively enhance
the mathematical reasoning abilities of MLLMs. The key contributions of We-Math
2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level
hierarchical system encompassing 491 knowledge points and 1,819 fundamental
principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a
dataset that ensures broad conceptual coverage and flexibility through dual
expansion. Additionally, we define a three-dimensional difficulty space and
generate 7 progressive variants per problem to build MathBook-Pro, a
challenging dataset for robust training. (3) MathBook-RL: We propose a
two-stage RL framework comprising (i) Cold-Start Fine-tuning, which aligns the
model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive
Alignment RL, leveraging average-reward learning and dynamic data scheduling to
achieve progressive alignment across difficulty levels. (4) MathBookEval: We
introduce a comprehensive benchmark covering all 491 knowledge points with
diverse reasoning step distributions. Experimental results show that
MathBook-RL performs competitively with existing baselines on four widely used
benchmarks and achieves strong results on MathBookEval, suggesting promising
generalization in mathematical reasoning.
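To make the training recipe in contribution (3) more concrete, below is a minimal Python sketch of how a dynamic data scheduler with average-reward gating over the 7-variant difficulty chain could be organized. Everything here (the class names `Problem` and `DifficultyScheduler`, the replay ratio, the promotion threshold, and the training-step snippet) is a hypothetical illustration under stated assumptions, not the authors' released implementation.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Problem:
    """A seed problem plus its chain of progressively harder variants.

    We-Math 2.0 expands each seed along a three-dimensional difficulty
    space into 7 progressive variants (MathBook-Pro); index 0 here
    stands for the original MathBook-Standard item.
    """
    seed_id: str
    variants: List[str]  # 8 entries: the seed plus 7 progressive variants

class DifficultyScheduler:
    """Hypothetical dynamic data scheduler for Progressive Alignment RL.

    Keeps a sliding window of rewards at the current frontier difficulty
    level and advances the curriculum only once the average reward clears
    a threshold. The gating rule and all constants are assumptions made
    for illustration.
    """

    def __init__(self, num_levels: int = 8, threshold: float = 0.7,
                 window: int = 256, replay_ratio: float = 0.2):
        self.level = 0                      # current frontier difficulty level
        self.num_levels = num_levels
        self.threshold = threshold
        self.window = window
        self.replay_ratio = replay_ratio
        self.recent_rewards: List[float] = []

    def sample_level(self) -> int:
        """Mostly train on the frontier; occasionally replay easier levels."""
        if self.level > 0 and random.random() < self.replay_ratio:
            return random.randrange(self.level)
        return self.level

    def update(self, reward: float) -> None:
        """Record a frontier-level reward and promote once mastered."""
        self.recent_rewards.append(reward)
        if len(self.recent_rewards) > self.window:
            self.recent_rewards.pop(0)
        avg = sum(self.recent_rewards) / len(self.recent_rewards)
        if avg >= self.threshold and self.level < self.num_levels - 1:
            self.level += 1
            self.recent_rewards.clear()

# Sketch of one training step (the rollout-and-scoring call is assumed
# to live in the surrounding RL trainer and return a scalar reward):
scheduler = DifficultyScheduler()
problem = Problem("seed-001", [f"variant-{i}" for i in range(8)])
prompt = problem.variants[scheduler.sample_level()]
scheduler.update(reward=1.0)  # e.g., 1.0 for a correct final answer
```

The design choice illustrated here is that promotion is driven by the running average reward rather than per-sample success, which matches the abstract's description of average-reward learning paired with dynamic data scheduling to align the model level by level rather than mixing all difficulties at once.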