Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Abstract

This paper introduces rStar, a self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. Trajectories that both models agree on are considered mutually consistent and are thus more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct. Code will be available at https://github.com/zhentingqi/rStar.
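The generation-discrimination split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the discriminator verifies a trajectory, so the sketch assumes that the second model sees only a prefix of the reasoning steps and that a trajectory counts as mutually consistent when the discriminator reaches the same final answer. The names `Trajectory`, `Discriminator`, `mutually_consistent`, `keep_fraction`, and `toy_discriminator` are all hypothetical, and the MCTS generation step is omitted entirely.

```python
from typing import Callable, List, NamedTuple


class Trajectory(NamedTuple):
    steps: List[str]  # intermediate reasoning steps from the generator
    answer: str       # final answer the generator reached


# Hypothetical interface: the discriminator receives the question and a
# prefix of the steps, and returns the final answer it arrives at.
Discriminator = Callable[[str, List[str]], str]


def mutually_consistent(
    question: str,
    candidates: List[Trajectory],
    discriminator: Discriminator,
    keep_fraction: float = 0.5,
) -> List[Trajectory]:
    """Keep only the trajectories whose answer the discriminator reproduces."""
    agreed = []
    for traj in candidates:
        # Assumption: reveal only the first part of the trajectory as a hint.
        cut = max(1, int(len(traj.steps) * keep_fraction))
        completed = discriminator(question, traj.steps[:cut])
        if completed.strip() == traj.answer.strip():
            agreed.append(traj)
    return agreed


def toy_discriminator(question: str, prefix: List[str]) -> str:
    # Stand-in for a second SLM: it reaches the answer "5" only when the
    # prefix begins with the correct first step of the toy problem.
    return "5" if prefix and prefix[0] == "2 + 3" else "unknown"


candidates = [
    Trajectory(steps=["2 + 3", "= 5"], answer="5"),
    Trajectory(steps=["2 * 3", "= 6"], answer="6"),
]
kept = mutually_consistent("What is 2 + 3?", candidates, toy_discriminator)
```

In this toy run only the first candidate survives, because the stand-in discriminator reproduces its answer from the revealed prefix. In a real setting `discriminator` would wrap a call to a second language model.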

