Efficient Tool Use with Chain-of-Abstraction Reasoning
January 30, 2024
Authors: Silin Gao, Jane Dwivedi-Yu, Ping Yu, Xiaoqing Ellen Tan, Ramakanth Pasunuru, Olga Golovneva, Koustuv Sinha, Asli Celikyilmaz, Antoine Bosselut, Tianlu Wang
cs.AI
Abstract
To achieve faithful reasoning that aligns with human expectations, large
language models (LLMs) need to ground their reasoning to real-world knowledge
(e.g., web facts, math and physical rules). Tools help LLMs access this
external knowledge, but challenges remain in fine-tuning LLM agents
(e.g., Toolformer) to invoke tools for multi-step reasoning problems, where
interconnected tool calls require holistic and efficient tool-usage planning.
In this work, we propose a new method for LLMs to better leverage tools in
multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to
first decode reasoning chains with abstract placeholders, and then call domain
tools to reify each reasoning chain by filling in specific knowledge. This
planning with abstract chains enables LLMs to learn more general reasoning
strategies, which are robust to shifts of domain knowledge (e.g., math results)
relevant to different reasoning questions. It also allows LLMs to perform
decoding and calling of external tools in parallel, which avoids the inference
delay caused by waiting for tool responses. In mathematical reasoning and Wiki
QA domains, we show that our method consistently outperforms previous
chain-of-thought and tool-augmented baselines on both in-distribution and
out-of-distribution test sets, with an average ~6% absolute QA accuracy
improvement. LLM agents trained with our method also show more efficient tool
use, with inference speed being on average ~1.4x faster than baseline
tool-augmented LLMs.
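The two-stage pipeline described in the abstract (first decode a reasoning chain with abstract placeholders, then call a domain tool to reify each placeholder with concrete knowledge) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bracketed placeholder format `[expr = y1]`, the `calculator` tool, and the `reify` helper are all assumptions made for the example.

```python
import re

def calculator(expr: str) -> float:
    """Hypothetical domain tool: evaluates a basic arithmetic expression."""
    return eval(expr, {"__builtins__": {}})  # illustration only; not safe for untrusted input

def reify(abstract_chain: str) -> str:
    """Fill abstract placeholders (y1, y2, ...) by calling the tool in order.

    Later placeholders may reference earlier results (e.g., y1 inside y2's
    expression), so results are substituted before each tool call.
    """
    values: dict[str, float] = {}

    def fill(match: re.Match) -> str:
        expr, name = match.group(1), match.group(2)
        # Substitute previously computed results into the expression.
        for var, val in values.items():
            expr = expr.replace(var, str(val))
        values[name] = calculator(expr)
        return f"[{expr.strip()} = {values[name]}]"

    # Match placeholders shaped like "[<expression> = yN]".
    return re.sub(r"\[([^=\]]+)=\s*(y\d+)\]", fill, abstract_chain)

# An abstract chain as a CoA-style model might decode it (placeholders y1, y2):
chain = "Anna has [20 + 35 = y1] apples; after giving half away she keeps [y1 / 2 = y2]."
print(reify(chain))
# → Anna has [20 + 35 = 55] apples; after giving half away she keeps [55 / 2 = 27.5].
```

Because the abstract chain is fully decoded before the tool calls run, the model's decoding of the next query and the tool's execution can overlap in a batched serving setup, which is the source of the inference-speed gain the abstract reports.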