Distilling LLM Agent into Small Models with Retrieval and Code Tools

Abstract

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, which limits their practical deployment. To address this, recent work has focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise computation, where sLMs often hallucinate due to limited capability. In this work, we propose Agent Distillation, a framework for transferring not only reasoning capability but full task-solving behavior from LLM-based agents into sLMs equipped with retrieval and code tools. We improve agent distillation along two complementary axes: (1) we introduce a prompting method called first-thought prefix to enhance the quality of teacher-generated trajectories; and (2) we propose self-consistent action generation to improve the test-time robustness of small agents. We evaluate our method on eight reasoning tasks across factual and mathematical domains, covering both in-domain and out-of-domain generalization. Our results show that sLMs with 0.5B, 1.5B, and 3B parameters can achieve performance competitive with the next-tier larger 1.5B, 3B, and 7B models fine-tuned using CoT distillation, demonstrating the potential of agent distillation for building practical, tool-using small agents. Our code is available at https://github.com/Nardien/agent-distillation.
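
The abstract names self-consistent action generation without describing its mechanics, so the Python sketch below is only one plausible reading, not the authors' implementation. It assumes the small agent samples several candidate code actions for the same step, discards those that fail to execute, and keeps an action whose observation agrees with the majority. The names `self_consistent_action` and `toy_execute` are hypothetical and do not come from the linked repository; the toy executor stands in for a real code tool.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def self_consistent_action(
    candidates: Sequence[str],
    execute: Callable[[str], str],
) -> Optional[tuple[str, str]]:
    """Return one (action, observation) pair chosen by majority vote.

    Assumption: candidates are alternative actions sampled for the same
    agent step, and `execute` raises an exception when an action is invalid.
    """
    results: list[tuple[str, str]] = []
    for action in candidates:
        try:
            results.append((action, execute(action)))
        except Exception:
            # Drop actions that cannot be parsed or executed.
            continue
    if not results:
        return None
    # Vote on observations; ties go to the observation seen first.
    winner, _ = Counter(obs for _, obs in results).most_common(1)[0]
    for action, obs in results:
        if obs == winner:
            return action, obs
    return None


def toy_execute(action: str) -> str:
    """Hypothetical stand-in for a code tool: evaluates an arithmetic string."""
    return str(eval(action, {"__builtins__": {}}, {}))


if __name__ == "__main__":
    sampled = ["17 * 23", "23 * 17", "17 * 23 +", "17 * 32"]
    print(self_consistent_action(sampled, toy_execute))
```

In this toy run, the malformed candidate is dropped, two candidates agree on the same observation, and the outlier is outvoted. A real agent would replace `toy_execute` with a sandboxed interpreter or retrieval call.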
