MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Abstract

Tree search has become a representative framework for test-time reasoning with large language models (LLMs), exemplified by methods such as Tree-of-Thought and Monte Carlo Tree Search that explore multiple reasoning paths. However, it remains difficult to assess the quality of intermediate reasoning steps quickly and reliably, and extensive path exploration is computationally costly. To address this, we propose Mutual Information Tree Search (MITS), a novel framework that guides reasoning with information-theoretic principles. MITS introduces an effective scoring function based on pointwise mutual information (PMI), which enables step-wise evaluation of reasoning paths and search tree expansion via beam search without expensive look-ahead simulations. This yields superior reasoning performance while maintaining computational efficiency. The framework is complemented by an entropy-based dynamic sampling strategy that adaptively allocates computational resources to uncertain reasoning steps, where exploration is most beneficial. For final prediction, MITS employs a weighted voting scheme that combines PMI scores with prediction consensus. In comprehensive experiments on diverse reasoning benchmarks, MITS consistently surpasses baseline methods, establishing a principled and efficient framework for LLM reasoning.
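
The three components named in the abstract (PMI scoring, entropy-based sampling, and PMI-weighted voting) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the score uses the standard definition PMI(q; s) = log p(s | q) - log p(s), the sampling budget grows linearly with normalized entropy, and votes are weighted by a softmax over PMI scores. The function names `pmi_score`, `num_samples`, and `weighted_vote`, and the parameters `min_k` and `max_k`, are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def pmi_score(logp_path_given_question, logp_path_alone):
    """Standard PMI between a question and a reasoning path:
    log p(path | question) - log p(path).
    Both inputs are log-probabilities, e.g. summed token log-probs
    from a language model (how they are obtained is assumed here)."""
    return logp_path_given_question - logp_path_alone


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def num_samples(probs, min_k=2, max_k=8):
    """Assumed rule: sample more candidate steps when the model is uncertain.
    The budget scales linearly with entropy normalized to [0, 1]."""
    if len(probs) < 2:
        return min_k
    h = entropy(probs) / math.log(len(probs))
    return min_k + round(h * (max_k - min_k))


def weighted_vote(candidates):
    """candidates: list of (answer, pmi) pairs, one per finished path.
    Assumed scheme: each path votes with weight exp(pmi - max_pmi),
    so both high PMI and agreement between paths raise an answer's total."""
    best = max(score for _, score in candidates)
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += math.exp(score - best)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(pmi_score(-2.0, -5.0))
    print(num_samples([0.25, 0.25, 0.25, 0.25]))
    print(weighted_vote([("A", 3.0), ("B", 2.5), ("B", 2.4)]))
```

In this sketch, two paths that agree on an answer with slightly lower PMI can outvote a single higher-scoring path, which is one way to read "combines PMI scores with prediction consensus"; the actual weighting used by MITS may differ.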
