Efficient Tool Use with Chain-of-Abstraction Reasoning

Abstract

To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning in real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but challenges remain in fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where interconnected tool calls require holistic and efficient tool-usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. Planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts in the domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to decode and call external tools in parallel, avoiding the inference delay caused by waiting for tool responses. In the mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average absolute QA accuracy improvement of ~6%. LLM agents trained with our method also use tools more efficiently, with inference on average ~1.4x faster than baseline tool-augmented LLMs.
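To make the reification step concrete, here is a minimal Python sketch of filling abstract placeholders with tool results. It is an illustration under stated assumptions, not the authors' implementation: the span format "[<expression> = yN]", the names calculator and reify, and the use of a small arithmetic evaluator as a stand-in for a domain tool are all hypothetical choices made for this example.

```python
import ast
import operator
import re

# Hypothetical stand-in for a domain tool (e.g. a calculator): safely evaluates
# + - * / expressions whose names refer to already-filled placeholders.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str, env: dict) -> float:
    """Evaluate an arithmetic expression, looking up placeholder names in env."""

    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]  # value produced by an earlier tool call
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return ev(ast.parse(expr, mode="eval").body)


# Assumed (hypothetical) span format for one abstract step: "[<expression> = yN]".
_SPAN = re.compile(r"\[([^\[\]=]+)=\s*(y\d+)\s*\]")
_PLACEHOLDER = re.compile(r"\by\d+\b")


def _fmt(value) -> str:
    """Render integral results without a trailing '.0'."""
    return str(int(value)) if float(value).is_integer() else str(value)


def reify(abstract_chain: str) -> tuple[str, dict]:
    """Fill the placeholders of an abstract chain left to right via tool calls."""
    env: dict = {}
    for expr, name in _SPAN.findall(abstract_chain):
        env[name] = calculator(expr.strip(), env)
    # Drop the brackets, then substitute every known placeholder with its value.
    text = _SPAN.sub(lambda m: f"{m.group(1).strip()} = {m.group(2)}", abstract_chain)
    text = _PLACEHOLDER.sub(
        lambda m: _fmt(env[m.group(0)]) if m.group(0) in env else m.group(0), text
    )
    return text, env


if __name__ == "__main__":
    chain = ("Tom has [20 + 35 = y1] apples, so after doubling he has "
             "[y1 * 2 = y2]. The answer is y2.")
    print(reify(chain)[0])
```

Running the example prints "Tom has 20 + 35 = 55 apples, so after doubling he has 55 * 2 = 110. The answer is 110." The loop issues the tool calls sequentially for clarity. Because the whole abstract chain exists before any tool is invoked, a real system could instead overlap these calls with decoding of the next chain, which is the kind of parallelism the abstract describes.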