

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

June 11, 2024
Authors: Di Zhang, Jiatong Li, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
cs.AI

Abstract

This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance on complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refinement mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through an iterative process of selection, self-refinement, self-evaluation, and backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks such as Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.
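The four-phase loop the abstract names (selection, self-refinement, self-evaluation, backpropagation) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `llm_refine` and `llm_score` are hypothetical stand-ins for the LLM calls that rewrite an answer and grade it, the exploration constant `C` is an assumed value, and the UCB variant shown is the standard UCT formula with a small epsilon for unvisited nodes rather than the paper's exact "improved UCB".

```python
import math

C = 1.4         # exploration constant (assumed, not from the paper)
EPSILON = 1e-6  # keeps UCB finite for rarely visited nodes

class Node:
    """One candidate answer in the Monte Carlo search tree."""
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_q = 0.0  # accumulated self-evaluation reward

    def ucb(self):
        # Exploitation term: mean reward of this node so far.
        exploit = self.total_q / (self.visits + EPSILON)
        # Exploration term: favors nodes visited less often than siblings.
        explore = C * math.sqrt(
            math.log(self.parent.visits + 1) / (self.visits + EPSILON)
        )
        return exploit + explore

def mctsr(root_answer, rollouts, llm_refine, llm_score):
    """Run `rollouts` iterations of the selection/refine/evaluate/backprop loop."""
    root = Node(root_answer)
    for _ in range(rollouts):
        # Selection: descend to a leaf, picking the highest-UCB child each step.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Self-refinement: ask the model for an improved answer at the leaf.
        child = Node(llm_refine(node.answer), parent=node)
        node.children.append(child)
        # Self-evaluation: ask the model to score the refined answer.
        reward = llm_score(child.answer)
        # Backpropagation: update visit counts and rewards up to the root.
        while child is not None:
            child.visits += 1
            child.total_q += reward
            child = child.parent
    # Return the answer with the best mean reward found anywhere in the tree.
    best = max(_all_nodes(root),
               key=lambda n: n.total_q / (n.visits + EPSILON))
    return best.answer

def _all_nodes(node):
    yield node
    for c in node.children:
        yield from _all_nodes(c)
```

With a toy `llm_refine` that appends a character and an `llm_score` that rewards longer answers, the loop repeatedly deepens the most promising branch and returns the most-refined answer, which is the exploration-exploitation behavior the abstract attributes to the UCB formula.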