We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning
August 14, 2025
Authors: Runqi Qiao, Qiuna Tan, Peiqing Yang, Yanzi Wang, Xiaowan Wang, Enhui Wan, Sitong Zhou, Guanting Dong, Yuchen Zeng, Yida Xu, Jie Wang, Chong Sun, Chen Li, Honggang Zhang
cs.AI
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated impressive
capabilities across various tasks, but still struggle with complex mathematical
reasoning. Existing research primarily focuses on dataset construction and
method optimization, often overlooking two critical aspects: comprehensive
knowledge-driven design and model-centric data space modeling. In this paper,
we introduce We-Math 2.0, a unified system that integrates a structured
mathematical knowledge system, model-centric data space modeling, and a
reinforcement learning (RL)-based training paradigm to comprehensively enhance
the mathematical reasoning abilities of MLLMs. The key contributions of We-Math
2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level
hierarchical system encompassing 491 knowledge points and 1,819 fundamental
principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a
dataset that ensures broad conceptual coverage and flexibility through dual
expansion. Additionally, we define a three-dimensional difficulty space and
generate 7 progressive variants per problem to build MathBook-Pro, a
challenging dataset for robust training. (3) MathBook-RL: We propose a
two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the
model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive
Alignment RL, leveraging average-reward learning and dynamic data scheduling to
achieve progressive alignment across difficulty levels. (4) MathBookEval: We
introduce a comprehensive benchmark covering all 491 knowledge points with
diverse reasoning step distributions. Experimental results show that
MathBook-RL performs competitively with existing baselines on four widely-used
benchmarks and achieves strong results on MathBookEval, suggesting promising
generalization in mathematical reasoning.
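
Contribution (1) describes the MathBook Knowledge System only at a high level: five hierarchy levels, 491 knowledge points, and 1,819 fundamental principles. Below is a minimal Python sketch of one way such a hierarchy could be represented; the `KnowledgeNode` class, its field names, and the example entries are illustrative assumptions, not the authors' actual schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

@dataclass
class KnowledgeNode:
    """One node in a five-level knowledge hierarchy: levels 1-4 are
    progressively finer topic areas; level 5 is a leaf knowledge point
    that carries its fundamental principles."""
    name: str
    level: int  # 1 (broad domain) .. 5 (knowledge point)
    principles: List[str] = field(default_factory=list)  # non-empty at leaves
    children: List["KnowledgeNode"] = field(default_factory=list)

    def add_child(self, child: "KnowledgeNode") -> "KnowledgeNode":
        assert child.level == self.level + 1, "children sit one level below"
        self.children.append(child)
        return child

    def knowledge_points(self) -> Iterator["KnowledgeNode"]:
        """Yield every level-5 leaf under this node."""
        if self.level == 5:
            yield self
        for child in self.children:
            yield from child.knowledge_points()

# Illustrative fragment only; the real system spans 491 knowledge points
# and 1,819 principles across the full five-level tree.
root = KnowledgeNode("Mathematics", 1)
geometry = root.add_child(KnowledgeNode("Plane Geometry", 2))
triangles = geometry.add_child(KnowledgeNode("Triangles", 3))
congruence = triangles.add_child(KnowledgeNode("Triangle Congruence", 4))
congruence.add_child(KnowledgeNode(
    "SAS criterion", 5,
    principles=["Two triangles are congruent if two sides and the included "
                "angle of one equal those of the other."]))
print(sum(1 for _ in root.knowledge_points()))  # -> 1 in this toy fragment
```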
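Contribution (2) defines a three-dimensional difficulty space and generates 7 progressive variants per seed problem, but the abstract does not say how the two numbers relate. One natural reading, sketched below purely as an assumption, is that each variant escalates a non-empty subset of the three axes (2^3 - 1 = 7); the axis names and the `progressive_variants` helper are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# Placeholder axis names; the paper defines its own three difficulty dimensions.
DIMENSIONS = ("step_complexity", "visual_complexity", "contextual_complexity")

@dataclass(frozen=True)
class ProblemVariant:
    seed_id: str
    difficulty: Tuple[int, int, int]  # 0 = seed level, 1 = escalated on that axis

def progressive_variants(seed_id: str) -> List[ProblemVariant]:
    """Enumerate the 7 non-seed corners of a {0,1}^3 difficulty cube,
    ordered by total difficulty for progressive scheduling."""
    grid = [v for v in product((0, 1), repeat=3) if any(v)]  # drop (0, 0, 0)
    grid.sort(key=sum)  # single-axis escalations first, all-axes variant last
    return [ProblemVariant(seed_id, v) for v in grid]

for var in progressive_variants("seed-042"):
    print(var.difficulty, dict(zip(DIMENSIONS, var.difficulty)))
# Prints 7 variants: three one-axis, three two-axis, one three-axis escalation.
```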
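Contribution (3) pairs average-reward learning with dynamic data scheduling in Progressive Alignment RL. The toy loop below illustrates the general pattern of curriculum promotion driven by a windowed average reward; it is a sketch of the idea only, with hypothetical names (`progressive_alignment_rl`, `promote_at`), and it omits the actual policy update and the paper's specific objective.

```python
import random
from collections import deque
from typing import Callable, Deque, List

def progressive_alignment_rl(
    tiers: List[List[str]],                  # problems bucketed easy -> hard
    rollout_reward: Callable[[str], float],  # scalar reward for one rollout
    promote_at: float = 0.7,                 # windowed avg needed to advance
    window: int = 256,                       # running-average window size
    steps: int = 2_000,
) -> None:
    """Train on one difficulty tier at a time; advance when the windowed
    average reward clears a threshold. The policy update itself is omitted."""
    tier = 0
    recent: Deque[float] = deque(maxlen=window)
    for step in range(steps):
        problem = random.choice(tiers[tier])
        recent.append(rollout_reward(problem))  # policy update would go here
        avg = sum(recent) / len(recent)
        if len(recent) == window and avg >= promote_at and tier + 1 < len(tiers):
            print(f"step {step}: tier {tier} -> {tier + 1} (avg reward {avg:.2f})")
            tier += 1
            recent.clear()  # re-estimate the average on the new tier

if __name__ == "__main__":
    toy_tiers = [[f"tier{t}-problem{i}" for i in range(8)] for t in range(4)]
    # Dummy verifier: succeeds 80% of the time, independent of the problem.
    progressive_alignment_rl(toy_tiers, lambda p: float(random.random() < 0.8))
```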