Entropy-Based Adaptive Weighting for Self-Training

Abstract

The mathematical problem-solving capabilities of large language models have become a focal point of research, with growing interest in leveraging self-generated reasoning paths as a promising way to refine and enhance these models. These paths capture step-by-step logical processes while requiring only the correct answer for supervision. Self-training has been shown to be effective in reasoning tasks while eliminating the need for external models and manual annotations. However, how best to use self-generated data for model training remains an open challenge. In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data where the model exhibits greater uncertainty. This approach guides the model to focus on more informative and challenging examples, thereby enhancing its reasoning ability. We evaluate our approach on the GSM8K and MATH benchmarks. Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves around a 1% gain over the backbone model. On GSM8K, EAST attains a further 1-2% performance boost compared to the vanilla method.
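
The weighting idea described above can be illustrated with a minimal sketch. The abstract does not specify the exact mapping function or how uncertainty is estimated, so everything below is an assumption made for illustration only: uncertainty is taken to be the entropy of the final answers across several sampled responses per question, the mapping is a simple power function, and the names `answer_entropy`, `entropy_weights`, and `sharpness` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of sampled final answers.

    Assumption: uncertainty is measured over the final answers of several
    sampled responses to the same question.
    """
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_weights(entropies, sharpness=1.0, eps=1e-8):
    """Map per-question entropies to training weights with mean 1.

    Assumption: a power mapping, weight proportional to (entropy + eps) ** sharpness.
    A larger `sharpness` concentrates weight on high-entropy questions;
    `sharpness=0` gives uniform weights (no reweighting).
    """
    raw = [(h + eps) ** sharpness for h in entropies]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


# Hypothetical sampled answers for three questions (illustrative data only).
samples = {
    "q1": ["4", "4", "4", "4"],  # model is consistent: low uncertainty
    "q2": ["7", "8", "7", "9"],  # partial disagreement
    "q3": ["1", "2", "3", "4"],  # maximal disagreement: high uncertainty
}
entropies = [answer_entropy(a) for a in samples.values()]
weights = entropy_weights(entropies, sharpness=1.0)
```

In a sketch like this, each weight would scale the training loss of the examples drawn from the corresponding question, so that questions on which the model disagrees with itself contribute more to the update. Normalizing the weights to a mean of 1 keeps the overall loss scale comparable to unweighted training.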