

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

June 11, 2024
Authors: Di Zhang, Jiatong Li, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
cs.AI

Abstract

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, Self-Refine, Self-Evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks such as Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.
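The four-phase loop the abstract names (Selection, Self-Refine, Self-Evaluation, Backpropagation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `self_refine` and `self_evaluate` functions are hypothetical stubs standing in for prompts to LLaMa-3 8B, and a standard UCB1 formula is used in place of the paper's modified UCB.

```python
import math
import random

# Hypothetical stand-ins for the LLM calls; in the paper these are
# critique-and-rewrite / scoring prompts sent to LLaMa-3 8B.
def self_refine(answer):
    """Produce a revised answer from a critique (stubbed)."""
    return answer + " (refined)"

def self_evaluate(answer):
    """Score an answer; the paper samples reward scores from the model (stubbed)."""
    return random.uniform(-100, 100)

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.q = 0.0  # running quality estimate

def ucb(node, c=1.41):
    """Standard UCB1 exploration bonus; the paper uses a modified formula."""
    if node.visits == 0:
        return float("inf")
    return node.q + c * math.sqrt(
        math.log(node.parent.visits + 1) / node.visits)

def mctsr(initial_answer, rollouts=8):
    root = Node(initial_answer)
    for _ in range(rollouts):
        # Selection: descend to a leaf by maximizing UCB.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # Self-Refine: expand the leaf with an improved answer.
        child = Node(self_refine(node.answer), parent=node)
        node.children.append(child)
        # Self-Evaluation: score the refined answer.
        reward = self_evaluate(child.answer)
        # Backpropagation: update visit counts and Q up to the root.
        while child is not None:
            child.visits += 1
            child.q += (reward - child.q) / child.visits
            child = child.parent
    # Return the best-scored refinement found.
    best = max(root.children, key=lambda n: n.q, default=root)
    return best.answer
```

With real model-backed `self_refine`/`self_evaluate` functions, the tree balances re-refining promising answers (exploitation) against trying fresh refinements of less-visited ones (exploration).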

