Efficient Tool Use with Chain-of-Abstraction Reasoning
January 30, 2024
Authors: Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang
cs.AI
Abstract
To achieve faithful reasoning that aligns with human expectations, large
language models (LLMs) need to ground their reasoning to real-world knowledge
(e.g., web facts, math and physical rules). Tools help LLMs access this
external knowledge, but challenges remain for fine-tuning LLM agents
(e.g., Toolformer) to invoke tools in multi-step reasoning problems, where
inter-connected tool calls require holistic and efficient tool usage planning.
In this work, we propose a new method for LLMs to better leverage tools in
multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to
first decode reasoning chains with abstract placeholders, and then call domain
tools to reify each reasoning chain by filling in specific knowledge. This
planning with abstract chains enables LLMs to learn more general reasoning
strategies, which are robust to shifts of domain knowledge (e.g., math results)
relevant to different reasoning questions. It also allows LLMs to perform
decoding and calling of external tools in parallel, which avoids the inference
delay caused by waiting for tool responses. In mathematical reasoning and Wiki
QA domains, we show that our method consistently outperforms previous
chain-of-thought and tool-augmented baselines on both in-distribution and
out-of-distribution test sets, with an average ~6% absolute QA accuracy
improvement. LLM agents trained with our method also show more efficient tool
use, with inference speed being on average ~1.4x faster than baseline
tool-augmented LLMs.
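The two-stage pipeline the abstract describes — first decode a reasoning chain with abstract placeholders, then call domain tools to reify it with concrete values — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the placeholder format `[expr = yN]` and the use of arithmetic evaluation as a stand-in "calculator tool" are assumptions made for this example.

```python
import re

def reify_chain(abstract_chain: str) -> str:
    """Reify a CoA-style abstract reasoning chain by evaluating each
    bracketed expression with a toy calculator 'tool' and substituting
    the result for its placeholder. The `[expr = yN]` syntax is a
    hypothetical format chosen for this sketch."""
    values = {}  # placeholder name -> resolved numeric value

    def resolve(match):
        expr, name = match.group(1), match.group(2)
        # Substitute previously resolved placeholders into the expression,
        # so later steps can depend on earlier tool results.
        for var, val in values.items():
            expr = expr.replace(var, str(val))
        result = eval(expr)  # stand-in for a domain tool call
        values[name] = result
        return str(result)

    # Match spans like "[20 + 10 = y1]" and replace them with tool results.
    return re.sub(r"\[([^\[\]=]+)=\s*(y\d+)\]", resolve, abstract_chain)

chain = ("The store had [20 + 10 = y1] apples in total, "
         "then sold 12, leaving [y1 - 12 = y2] apples.")
print(reify_chain(chain))
# → The store had 30 apples in total, then sold 12, leaving 18 apples.
```

Because the abstract chain contains no concrete intermediate results, the same chain template transfers across questions with different numbers, which is the robustness property the abstract attributes to planning over abstract placeholders.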