DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
October 4, 2024
作者: Murong Yue, Wenlin Yao, Haitao Mi, Dian Yu, Ziyu Yao, Dong Yu
cs.AI
Abstract
Enhancing the capability of large language models (LLMs) in reasoning has
gained significant attention in recent years. Previous studies have
demonstrated the effectiveness of various prompting strategies in aiding LLMs
in reasoning (called "reasoning actions"), such as step-by-step thinking,
reflecting before answering, solving with programs, and their combinations.
However, these approaches often apply static, predefined reasoning actions
uniformly to all questions, without considering the specific characteristics of
each question or the capability of the task-solving LLM. In this paper, we
propose DOTS, an approach enabling LLMs to reason dynamically via optimal
reasoning trajectory search, tailored to the specific characteristics of each
question and the inherent capability of the task-solving LLM. Our approach
involves three key steps: i) defining atomic reasoning action modules that can
be composed into various reasoning action trajectories; ii) searching for the
optimal action trajectory for each training question through iterative
exploration and evaluation for the specific task-solving LLM; and iii) using
the collected optimal trajectories to train an LLM to plan for the reasoning
trajectories of unseen questions. In particular, we propose two learning
paradigms, i.e., fine-tuning an external LLM as a planner to guide the
task-solving LLM, or directly fine-tuning the task-solving LLM with an
internalized capability for reasoning action planning. Our experiments across
eight reasoning tasks show that our method consistently outperforms static
reasoning techniques and the vanilla instruction tuning approach. Further
analysis reveals that our method enables LLMs to adjust their computation based
on problem complexity, allocating deeper thinking and reasoning to harder
problems.
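To make the three-step pipeline concrete, below is a minimal Python sketch of the search-and-collect stage. It is illustrative only: the action module names, the layered grouping, and the brute-force enumeration with repeated sampling are assumptions standing in for the paper's atomic action modules and its iterative exploration-and-evaluation search, and `toy_llm` is a stub for an actual task-solving LLM.

```python
import itertools
import random

# Hypothetical atomic reasoning action modules grouped into layers.
# The concrete names are illustrative, not taken from the paper.
ACTION_LAYERS = {
    "analysis": ["none", "rewrite_query", "decompose"],
    "solution": ["chain_of_thought", "program_of_thought"],
    "verification": ["none", "self_verify"],
}

def solve(question, trajectory, llm):
    """Ask the task-solving LLM to follow a given action trajectory."""
    prompt = (
        f"Question: {question}\n"
        f"Follow these reasoning actions in order: {', '.join(trajectory)}\n"
        "Answer:"
    )
    return llm(prompt)

def search_optimal_trajectory(question, answer, llm, n_samples=3):
    """Evaluate candidate trajectories and keep the one with the highest
    success rate for this question and this task-solving LLM."""
    best, best_score = None, -1.0
    for combo in itertools.product(*ACTION_LAYERS.values()):
        trajectory = [a for a in combo if a != "none"]
        hits = sum(solve(question, trajectory, llm) == answer
                   for _ in range(n_samples))
        score = hits / n_samples
        if score > best_score:
            best, best_score = trajectory, score
    return best, best_score

def collect_planner_data(train_set, llm):
    """Build (question -> optimal trajectory) pairs, later used to fine-tune
    either an external planner LLM or the task-solving LLM itself."""
    data = []
    for question, answer in train_set:
        trajectory, score = search_optimal_trajectory(question, answer, llm)
        if score > 0:  # keep only questions solvable under some trajectory
            data.append({"question": question, "trajectory": trajectory})
    return data

if __name__ == "__main__":
    # Toy stand-in for a task-solving LLM, just to make the sketch runnable.
    def toy_llm(prompt):
        return random.choice(["42", "7"])

    train = [("What is 6 * 7?", "42")]
    print(collect_planner_data(train, toy_llm))
```

In the external-planner paradigm, the collected pairs would be used to fine-tune a separate LLM that maps a question to a trajectory before the task-solving LLM executes it; in the internalized paradigm, the same pairs would be folded into the task-solving LLM's own fine-tuning data so it plans and executes in one pass.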