We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing research primarily focuses on dataset construction and method optimization, often overlooking two critical aspects: comprehensive knowledge-driven design and model-centric data space modeling. In this paper, we introduce We-Math 2.0, a unified system that integrates a structured mathematical knowledge system, model-centric data space modeling, and a reinforcement learning (RL)-based training paradigm to comprehensively enhance the mathematical reasoning abilities of MLLMs. The key contributions of We-Math 2.0 are fourfold: (1) MathBook Knowledge System: We construct a five-level hierarchical system encompassing 491 knowledge points and 1,819 fundamental principles. (2) MathBook-Standard & Pro: We develop MathBook-Standard, a dataset that ensures broad conceptual coverage and flexibility through dual expansion. Additionally, we define a three-dimensional difficulty space and generate 7 progressive variants per problem to build MathBook-Pro, a challenging dataset for robust training. (3) MathBook-RL: We propose a two-stage RL framework comprising: (i) Cold-Start Fine-tuning, which aligns the model with knowledge-oriented chain-of-thought reasoning; and (ii) Progressive Alignment RL, leveraging average-reward learning and dynamic data scheduling to achieve progressive alignment across difficulty levels. (4) MathBookEval: We introduce a comprehensive benchmark covering all 491 knowledge points with diverse reasoning step distributions. Experimental results show that MathBook-RL performs competitively with existing baselines on four widely-used benchmarks and achieves strong results on MathBookEval, suggesting promising generalization in mathematical reasoning.
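
The abstract pairs a three-dimensional difficulty space with 7 progressive variants per problem. One reading that fits these numbers (an assumption, not something the abstract states) is that each variant raises the difficulty along a different non-empty combination of the three dimensions, since 2^3 - 1 = 7. The sketch below only checks that arithmetic; the axis names are hypothetical placeholders, not the authors' terminology.

```python
from itertools import combinations

# Hypothetical axis names: the abstract only says the space has three dimensions.
DIMENSIONS = ("axis_a", "axis_b", "axis_c")


def progressive_variants(dimensions=DIMENSIONS):
    """Return every non-empty subset of dimensions, smallest subsets first.

    Assumption: one variant per subset, where the subset lists the
    dimensions along which the seed problem is made harder.
    """
    variants = []
    for k in range(1, len(dimensions) + 1):
        variants.extend(combinations(dimensions, k))
    return variants


variants = progressive_variants()
print(len(variants))  # 3 single-axis + 3 two-axis + 1 three-axis = 7
```

Ordering the subsets by size gives a natural easy-to-hard progression: single-axis variants first, then two-axis variants, then the one variant that is harder along all three.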
