MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Abstract

Tree search has emerged as a representative framework for test-time reasoning with large language models (LLMs), exemplified by methods such as Tree-of-Thought and Monte Carlo Tree Search that explore multiple reasoning paths. However, it remains difficult to provide instant and reliable quantitative assessments of the quality of intermediate reasoning steps, and extensive path exploration is computationally costly. To address this, we propose Mutual Information Tree Search (MITS), a novel framework that guides reasoning with information-theoretic principles. MITS introduces an effective scoring function based on pointwise mutual information (PMI), which enables step-wise evaluation of reasoning paths and expansion of the search tree via beam search without expensive look-ahead simulations, achieving superior reasoning performance while maintaining computational efficiency. The framework is complemented by an entropy-based dynamic sampling strategy that adaptively allocates computational resources to uncertain reasoning steps, where exploration is most beneficial. For the final prediction, MITS employs a weighted voting scheme that combines PMI scores with prediction consensus. In comprehensive experiments on diverse reasoning benchmarks, MITS consistently surpasses baseline methods, establishing a principled and efficient framework for LLM reasoning.
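The abstract names three ingredients (PMI scoring, entropy-based sampling, weighted voting) without giving formulas. The sketch below illustrates one plausible reading of each and is not the authors' implementation. It uses the standard definition PMI(q; s) = log p(s | q) - log p(s); treating q as the question and s as a reasoning step is an assumption. The sample-count rule in `num_samples`, the softmax-style weighting in `weighted_vote`, and all function and parameter names are hypothetical.

```python
import math
from collections import defaultdict


def pmi_score(logp_given_question: float, logp_unconditional: float) -> float:
    """Standard PMI: log p(s | q) - log p(s).

    Assumption: both log-probabilities come from the same LLM, scored once
    with the question in context and once without it.
    """
    return logp_given_question - logp_unconditional


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def num_samples(probs, min_k=1, max_k=4):
    """Hypothetical allocation rule: more samples when entropy is high.

    Entropy is normalized by its maximum, log(len(probs)), so the result
    always lies between min_k and max_k.
    """
    if len(probs) < 2:
        return min_k
    h_norm = entropy(probs) / math.log(len(probs))
    return min_k + round(h_norm * (max_k - min_k))


def weighted_vote(candidates):
    """Pick the answer with the largest total weight.

    candidates: list of (answer, pmi_score) pairs, one per reasoning path.
    Assumption: each path contributes exp(score - max_score), so answers
    gain weight both from high PMI and from being reached by many paths.
    """
    best = max(score for _, score in candidates)
    totals = defaultdict(float)
    for answer, score in candidates:
        totals[answer] += math.exp(score - best)
    return max(totals, key=totals.get)
```

Under these assumptions, a step whose likelihood rises sharply once the question is in context gets a high score, a flat next-step distribution triggers the maximum number of samples, and two moderately scored paths that agree on an answer can outvote a single higher-scored path.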
