Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models

February 6, 2025
作者: Xiao-Wen Yang, Xuan-Yi Zhu, Wen-Da Wei, Ding-Chu Zhang, Jie-Jing Shao, Zhi Zhou, Lan-Zhe Guo, Yu-Feng Li
cs.AI

Abstract

The integration of slow-thinking mechanisms into large language models (LLMs) offers a promising path toward achieving Level 2 AGI Reasoners, as exemplified by systems like OpenAI's o1. However, several significant challenges remain, including inefficient overthinking and an overreliance on auxiliary reward models. We point out that these limitations stem from LLMs' inability to internalize the search process, a key component of effective reasoning. A critical step toward addressing this issue is enabling LLMs to autonomously determine when and where to backtrack, a fundamental operation in traditional search algorithms. To this end, we propose a self-backtracking mechanism that equips LLMs with the ability to backtrack during both training and inference. This mechanism not only enhances reasoning ability but also improves efficiency by transforming slow-thinking processes into fast-thinking ones through self-improvement. Empirical evaluations demonstrate that our proposal significantly enhances the reasoning capabilities of LLMs, achieving a performance gain of over 40 percent compared to the optimal-path supervised fine-tuning method. We believe this study introduces a novel and promising pathway for developing more advanced and robust Reasoners.
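
To make the backtracking operation concrete, below is a minimal, self-contained sketch of an inference-time decoding loop in which the model itself emits a backtrack signal, rather than deferring to an external reward model. The `<backtrack>` and `<answer>` control tokens, the scripted token stream, and the function names are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Minimal sketch of inference-time self-backtracking. The <backtrack> and
# <answer> control tokens, the scripted token stream, and the function
# names below are illustrative assumptions, not the paper's implementation.

BACKTRACK = "<backtrack>"
ANSWER = "<answer>"

# Scripted stand-in for model outputs so the control flow is easy to follow:
# the model tries "step 3a", judges that branch unpromising, emits
# <backtrack>, and then continues down an alternative branch.
SCRIPTED_OUTPUT = ["step 1", "step 2", "step 3a", BACKTRACK,
                   "step 3b", "step 4", ANSWER]

def decode_with_backtracking(token_stream, max_backtracks=3):
    """Decoding loop in which the model itself decides when and where
    to backtrack, instead of relying on an auxiliary reward model."""
    steps = []
    backtracks = 0
    for token in token_stream:
        if token == BACKTRACK:
            # Rewind: discard the most recent step and let the model retry,
            # subject to a fixed backtracking budget.
            if steps and backtracks < max_backtracks:
                steps.pop()
                backtracks += 1
            continue
        steps.append(token)
        if token == ANSWER:
            break  # final answer reached; stop decoding
    return steps

if __name__ == "__main__":
    # Prints: ['step 1', 'step 2', 'step 3b', 'step 4', '<answer>']
    print(decode_with_backtracking(SCRIPTED_OUTPUT))
```

In the actual method, the decision to emit such a backtrack signal would presumably be learned during training, which is what allows the self-improvement loop described in the abstract to gradually compress slow, search-like reasoning into faster direct reasoning.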
