Efficient Tool Use with Chain-of-Abstraction Reasoning
January 30, 2024
Authors: Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang
cs.AI
Abstract
To achieve faithful reasoning that aligns with human expectations, large
language models (LLMs) need to ground their reasoning to real-world knowledge
(e.g., web facts, math and physical rules). Tools help LLMs access this
external knowledge, but challenges remain for fine-tuning LLM agents
(e.g., Toolformer) to invoke tools in multi-step reasoning problems, where
inter-connected tool calls require holistic and efficient tool usage planning.
In this work, we propose a new method for LLMs to better leverage tools in
multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to
first decode reasoning chains with abstract placeholders, and then call domain
tools to reify each reasoning chain by filling in specific knowledge. This
planning with abstract chains enables LLMs to learn more general reasoning
strategies, which are robust to shifts of domain knowledge (e.g., math results)
relevant to different reasoning questions. It also allows LLMs to perform
decoding and calling of external tools in parallel, which avoids the inference
delay caused by waiting for tool responses. In mathematical reasoning and Wiki
QA domains, we show that our method consistently outperforms previous
chain-of-thought and tool-augmented baselines on both in-distribution and
out-of-distribution test sets, with an average ~6% absolute QA accuracy
improvement. LLM agents trained with our method also show more efficient tool
use, with inference speed being on average ~1.4x faster than baseline
tool-augmented LLMs.
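The two-stage pipeline the abstract describes — first decode a reasoning chain with abstract placeholders, then call domain tools to reify it with concrete values — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder format `[expr = yN]` and the use of arithmetic evaluation as a stand-in "calculator tool" are assumptions made for this example.

```python
import re

def reify_chain(abstract_chain: str) -> str:
    """Reify a CoA-style abstract reasoning chain by evaluating each
    bracketed expression with a toy calculator 'tool' and substituting
    the result for its placeholder. The `[expr = yN]` syntax is a
    hypothetical format chosen for this sketch."""
    values = {}  # placeholder name -> resolved numeric value

    def resolve(match):
        expr, name = match.group(1), match.group(2)
        # Substitute previously resolved placeholders into the expression,
        # so later steps can depend on earlier tool results.
        for var, val in values.items():
            expr = expr.replace(var, str(val))
        result = eval(expr)  # stand-in for a domain tool call
        values[name] = result
        return str(result)

    # Match spans like "[20 + 10 = y1]" and replace them with tool results.
    return re.sub(r"\[([^\[\]=]+)=\s*(y\d+)\]", resolve, abstract_chain)

chain = ("The store had [20 + 10 = y1] apples in total, "
         "then sold 12, leaving [y1 - 12 = y2] apples.")
print(reify_chain(chain))
# → The store had 30 apples in total, then sold 12, leaving 18 apples.
```

Because the abstract chain contains no concrete intermediate results, the same chain template transfers across questions with different numbers, which is the robustness property the abstract attributes to planning over abstract placeholders.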