A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

October 17, 2024

Authors: Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu

cs.AI

Abstract

Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn considerable attention from researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields diminishing performance gains at a heavy computational cost. Recently, OpenAI's o1 model has shown that inference strategies (i.e., Test-time Compute methods) can also significantly enhance the reasoning capabilities of LLMs. However, the mechanisms behind these methods remain unexplored. In this work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (Best-of-N (BoN), Step-wise BoN, Agent Workflow, and Self-Refine), using OpenAI's GPT-4o as the backbone, on general reasoning benchmarks in three domains (math, coding, and commonsense reasoning). Our findings are as follows. First, the o1 model achieves the best performance on most datasets. Second, for methods that search for diverse responses (e.g., BoN), both the capability of the reward model and the search space limit the upper bound of performance. Third, among methods that break a problem into sub-problems, Agent Workflow outperforms Step-wise BoN because its domain-specific system prompts help it plan better reasoning processes. Fourth, we summarize six reasoning patterns of o1 and provide a detailed analysis on several reasoning benchmarks.
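The second finding (that BoN is capped by both the reward model and the search space) can be illustrated with a minimal Python sketch of Best-of-N selection. It is not the authors' implementation: the candidate answers, the gold answer, and the reward scores are hypothetical toy values, and `best_of_n`, `oracle_upper_bound`, `weak_reward`, and `good_reward` are illustrative names. The sketch shows the two ceilings separately. If no sampled candidate is correct, even a perfect reward model cannot recover. If a correct candidate is sampled, a weak reward model can still rank a wrong one first.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward: Callable[[str], float]) -> str:
    """Best-of-N: return the candidate the reward model scores highest.

    Ties are broken by keeping the earliest candidate (behaviour of max()).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


def oracle_upper_bound(candidates: Sequence[str], is_correct: Callable[[str], bool]) -> bool:
    """BoN can only succeed if at least one sampled candidate is correct."""
    return any(is_correct(c) for c in candidates)


# Hypothetical toy setup: the gold answer to some question is "42".
GOLD = "42"


def is_correct(ans: str) -> bool:
    return ans == GOLD


def good_reward(ans: str) -> float:
    # A hypothetical "perfect" reward model: scores only the gold answer highly.
    return 1.0 if ans == GOLD else 0.0


def weak_reward(ans: str) -> float:
    # A hypothetical weak reward model that prefers a wrong answer.
    return {"40": 0.9, "42": 0.5, "43": 0.1}[ans]


# Ceiling 1: the search space is too narrow, so no candidate is correct.
samples_small = ["40", "41", "43"]
# Ceiling 2: the correct answer is sampled, but the reward model misranks it.
samples_rich = ["40", "42", "43"]

if __name__ == "__main__":
    print(oracle_upper_bound(samples_small, is_correct))  # False
    print(best_of_n(samples_small, good_reward))          # 40 (still wrong)
    print(best_of_n(samples_rich, weak_reward))           # 40 (misranked)
    print(best_of_n(samples_rich, good_reward))           # 42
```

Increasing N widens the search space and raises the oracle ceiling, but the answer actually returned still depends on how well the reward model ranks the candidates.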