Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Abstract

This paper introduces the MCT Self-Refine (MCTSr) algorithm, which integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) to improve performance on complex mathematical reasoning tasks. To address the accuracy and reliability problems of LLMs, particularly in strategic and mathematical reasoning, MCTSr uses systematic exploration and heuristic self-refine mechanisms to improve decision-making within LLMs. The algorithm builds a Monte Carlo search tree through iterative Selection, self-refine, self-evaluation, and Backpropagation steps, and uses an improved Upper Confidence Bound (UCB) formula to balance exploration and exploitation. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets: GSM8K, GSM Hard, MATH, and the Olympiad-level benchmarks Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs to complex reasoning tasks and lays a foundation for future AI integration by improving decision-making accuracy and reliability in LLM-driven applications.
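The four-step loop named in the abstract (Selection, self-refine, self-evaluation, Backpropagation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `Node`, `ucb1`, `select`, `backpropagate`, and `mctsr_sketch` are hypothetical, the score is the textbook UCB1 rather than the paper's improved formula, and the `refine` and `evaluate` callbacks are toy numeric stand-ins for what would be LLM calls in the actual method.

```python
import math


class Node:
    """One candidate answer in the search tree (hypothetical structure)."""

    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.q = 0.0       # running mean of rewards backed up through this node
        self.reward = 0.0  # this node's own self-evaluation score


def ucb1(node, c=1.4):
    """Textbook UCB1 score; NOT the improved formula from the paper."""
    if node.visits == 0:
        return float("inf")
    parent_visits = node.parent.visits if node.parent else node.visits
    return node.q + c * math.sqrt(math.log(parent_visits + 1) / node.visits)


def select(root, max_children=2):
    """Selection: descend by UCB1 until a node can still take a new child."""
    node = root
    while len(node.children) >= max_children:
        node = max(node.children, key=ucb1)
    return node


def backpropagate(node, reward):
    """Backpropagation: update visit counts and mean reward up to the root."""
    while node is not None:
        node.visits += 1
        node.q += (reward - node.q) / node.visits
        node = node.parent


def mctsr_sketch(initial_answer, refine, evaluate, rollouts=8, max_children=2):
    """Run the select / refine / evaluate / backpropagate loop; return best answer."""
    root = Node(initial_answer)
    root.reward = evaluate(initial_answer)
    backpropagate(root, root.reward)
    best = root
    for _ in range(rollouts):
        parent = select(root, max_children)
        # Self-refine: an LLM would rewrite the answer here (toy callback instead).
        child = Node(refine(parent.answer), parent=parent)
        parent.children.append(child)
        # Self-evaluation: an LLM would score the answer here (toy callback instead).
        child.reward = evaluate(child.answer)
        backpropagate(child, child.reward)
        if child.reward > best.reward:
            best = child
    return best.answer


# Toy stand-ins: "refining" a guess for sqrt(2) with one Newton step,
# and "evaluating" it by the negated residual of x^2 = 2.
def newton_refine(x):
    return (x + 2.0 / x) / 2.0


def residual_score(x):
    return -abs(x * x - 2.0)


result = mctsr_sketch(1.0, newton_refine, residual_score)
print(result)
```

In this toy setting each refinement deterministically improves its parent, so the search only shows the control flow; with stochastic LLM refinements the UCB term is what decides whether to revisit a promising answer or explore a less-visited one.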
