LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Abstract

Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches focus primarily on improving the quality of correct training data, for example by distilling high-quality correct solutions from advanced models. They neglect the value contained in error data, which may hinder the model's reflective ability. Although some studies attempt to leverage error data, they often involve complex mechanisms, such as using Monte Carlo Tree Search (MCTS) to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability through Learning from Errors for Mathematical Advancement (LEMMA). LEMMA constructs fine-tuning data in which each example consists of an incorrect solution containing an erroneous step, followed by a reflection connection to a correct solution. Specifically, we systematically analyze the error types that models generate and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are obtained either by fixing the errors or by generating a fresh solution from scratch. A model-aware, smooth reflection connection then links the erroneous solution to the correct one. By fine-tuning on the constructed dataset, the model learns to self-correct errors autonomously within the generation process, without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.
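The data layout described in the abstract (an incorrect solution, a reflection connection, then a correct solution) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ReflectionExample` class, its field names, the `build_training_text` helper, the newline-joined format, and the sample arithmetic problem are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ReflectionExample:
    """Hypothetical container for one training example (names are assumptions)."""
    question: str
    incorrect_steps: list[str]  # solution prefix that ends in the erroneous step
    reflection: str             # connecting text that flags the error
    correct_steps: list[str]    # fixed continuation, or a fresh restart


def build_training_text(ex: ReflectionExample) -> str:
    """Join the erroneous steps, the reflection, and the correct steps into one target string."""
    parts = [*ex.incorrect_steps, ex.reflection, *ex.correct_steps]
    return "\n".join(parts)


# Made-up example: the second step contains a calculation error (12 * 5 = 50).
example = ReflectionExample(
    question="What is 12 * 15?",
    incorrect_steps=["12 * 15 = 12 * 10 + 12 * 5", "= 120 + 50 = 170"],
    reflection="Wait, 12 * 5 is 60, not 50. Let me redo that step.",
    correct_steps=["= 120 + 60 = 180", "The answer is 180."],
)

print(build_training_text(example))
```

In this sketch the fine-tuning target would be the joined string, so the model sees the mistake, the reflection, and the correction within a single generation. How the actual method selects errors and phrases the reflection is not specified here.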
