Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent

Abstract

Large language models (LLMs) struggle with complex mathematical problems, which demand a combination of abilities: parsing the problem statement, associating domain knowledge, performing compound logical reasoning, and integrating intermediate rationales. Tackling all of these at once is arduous for LLMs and can lead to confusion during generation. In this work, we explore the potential of enhancing LLMs with agents through meticulous decomposition and modeling of the mathematical reasoning process. Specifically, we propose a formal description of mathematical problem solving and extend LLMs with an agent-based zero-shot framework named Planner-Reasoner-Executor-Reflector (PRER). We further provide and implement two MathAgents that define the logical forms and inherent relations via pools of actions of different granularities and orientations: MathAgent-M adapts its actions to LLMs, while MathAgent-H aligns with humans. Experiments on miniF2F and MATH demonstrate the effectiveness of PRER and the proposed MathAgents, with gains over GPT-4 of 12.3% (53.9% → 66.2%) on miniF2F, 9.2% (49.8% → 59.0%) on MATH, and 13.2% (23.2% → 35.4%) on level-5 problems of MATH. Further analysis offers additional insight into exploiting the behaviors of LLMs as agents.
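The abstract names the four PRER roles but does not specify their interfaces or prompts. The following is a hypothetical sketch of how a Planner-Reasoner-Executor-Reflector loop could be wired around a single model call. It is not the authors' implementation: the `State` record, the `prer_solve` function, the prompt prefixes, the `DONE` convention, and the scripted `toy_llm` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class State:
    problem: str
    steps: List[str] = field(default_factory=list)  # intermediate results so far
    answer: str = ""


def prer_solve(problem: str, llm: LLM, max_rounds: int = 5) -> State:
    """Assumed control flow: plan, reason, execute, reflect, then repeat."""
    state = State(problem)
    for _ in range(max_rounds):
        context = "\n".join([state.problem, *state.steps])
        action = llm("PLAN\n" + context)                # Planner: pick the next action
        thought = llm(f"REASON[{action}]\n" + context)  # Reasoner: work out what to compute
        result = llm("EXECUTE\n" + thought)             # Executor: carry out the step
        state.steps.append(result)
        verdict = llm("REFLECT\n" + "\n".join(state.steps))  # Reflector: check progress
        if verdict.startswith("DONE"):
            state.answer = verdict[len("DONE"):].strip()
            break
    return state


def toy_llm(prompt: str) -> str:
    """Scripted replies so the sketch runs without a real model."""
    if prompt.startswith("PLAN"):
        return "compute"
    if prompt.startswith("REASON"):
        return "2 + 3"
    if prompt.startswith("EXECUTE"):
        return "5"
    return "DONE 5"


final = prer_solve("What is 2 + 3?", toy_llm)
print(final.answer)
```

Splitting the loop into four separate calls mirrors the decomposition the abstract argues for: each call handles one narrow task instead of asking the model to parse, plan, compute, and verify in a single generation.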

