MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
October 4, 2025
Authors: Jiaxi Li, Yucheng Shi, Jin Lu, Ninghao Liu
cs.AI
Abstract
Tree search has become a representative framework for test-time reasoning
with large language models (LLMs), exemplified by methods such as
Tree-of-Thought and Monte Carlo Tree Search that explore multiple reasoning
paths. However, it remains difficult to provide instant and reliable
quantitative assessments of intermediate reasoning step quality, and extensive
path exploration is computationally costly. To address this, we propose Mutual
Information Tree Search (MITS), a novel framework that guides reasoning with
information-theoretic principles. MITS introduces an effective scoring function
based on pointwise mutual information (PMI), which enables step-wise evaluation
of reasoning paths and search tree expansion via beam search without expensive
look-ahead simulations, achieving superior reasoning performance while
maintaining computational efficiency. The framework is complemented by an
entropy-based dynamic sampling strategy that adaptively allocates computational
resources to uncertain reasoning steps where exploration is most beneficial.
For final prediction, MITS employs a weighted voting scheme that combines PMI
scores with prediction consensus. Through comprehensive experiments on diverse
reasoning benchmarks, MITS consistently surpasses baseline methods,
establishing a principled and efficient framework for LLM reasoning.
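The three ingredients the abstract names can be sketched in a few lines. The following is an illustrative sketch, not the paper's implementation: in practice the log-probabilities would come from the LLM, and the allocation rule `dynamic_sample_count` (with its `base`/`extra` parameters) and the exponential PMI weighting in `weighted_vote` are assumptions made here for illustration.

```python
import math

def pmi_score(logp_step_given_context, logp_step_unconditional):
    """Pointwise mutual information of a candidate reasoning step:
    PMI(step; context) = log p(step | context) - log p(step).
    Both log-probs would be obtained from the LLM (placeholder floats here)."""
    return logp_step_given_context - logp_step_unconditional

def entropy(probs):
    """Shannon entropy of a distribution over candidate next steps;
    high entropy signals an uncertain step worth exploring more."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dynamic_sample_count(probs, base=2, extra=6):
    """Hypothetical allocation rule (an assumption, not the paper's exact
    formula): scale the per-step sample budget with normalized entropy,
    so uncertain steps receive more of the computational budget."""
    max_entropy = math.log(len(probs))
    h = entropy(probs) / max_entropy if max_entropy > 0 else 0.0
    return base + round(extra * h)

def weighted_vote(candidates):
    """Combine PMI scores with prediction consensus: each completed
    reasoning path votes for its final answer, weighted here by
    exp(PMI score) so higher-scoring paths count more."""
    tally = {}
    for answer, pmi in candidates:
        tally[answer] = tally.get(answer, 0.0) + math.exp(pmi)
    return max(tally, key=tally.get)
```

Under this sketch, beam search would keep the top-scoring partial paths by accumulated `pmi_score` at each depth, with no look-ahead rollouts, and `weighted_vote` would pick the final answer once the beams terminate.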