Distilling LLM Agent into Small Models with Retrieval and Code Tools
May 23, 2025
Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang
cs.AI
Abstract
Large language models (LLMs) excel at complex reasoning tasks but remain
computationally expensive, limiting their practical deployment. To address
this, recent works have focused on distilling reasoning capabilities into
smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher
LLMs. However, this approach struggles in scenarios requiring rare factual
knowledge or precise computation, where sLMs often hallucinate due to limited
capability. In this work, we propose Agent Distillation, a framework for
transferring not only reasoning capability but full task-solving behavior from
LLM-based agents into sLMs with retrieval and code tools. We improve agent
distillation along two complementary axes: (1) we introduce a prompting method
called first-thought prefix to enhance the quality of teacher-generated
trajectories; and (2) we propose self-consistent action generation to improve
the test-time robustness of small agents. We evaluate our method on eight
reasoning tasks across factual and mathematical domains, covering both
in-domain and out-of-domain generalization. Our results show that sLMs as small
as 0.5B, 1.5B, and 3B parameters can achieve performance competitive with
next-tier larger models (1.5B, 3B, and 7B) fine-tuned using CoT distillation,
demonstrating the
potential of agent distillation for building practical, tool-using small
agents. Our code is available at https://github.com/Nardien/agent-distillation.
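To make the two proposed components concrete, below is a minimal Python sketch of how they might look in practice. This is an illustration under stated assumptions, not the paper's implementation: `generate_fn` is a hypothetical sampling interface for the small agent, the prompt format in `first_thought_prefix` is invented for illustration, and candidate actions are filtered by Python-syntax validity rather than by full tool execution.

```python
import ast
from collections import Counter


def first_thought_prefix(question: str, cot_first_step: str) -> str:
    """Hypothetical helper: seed the teacher agent's prompt with the first
    reasoning step of a plain CoT solution, steering it toward
    higher-quality trajectories for distillation."""
    return f"Question: {question}\nThought: {cot_first_step}"


def self_consistent_action(generate_fn, prompt: str,
                           n_samples: int = 5,
                           temperature: float = 0.7) -> str:
    """Sample several candidate code actions from the small agent, drop
    candidates that are not even syntactically valid Python, and return
    the most frequent survivor (majority vote)."""
    candidates = [generate_fn(prompt, temperature=temperature)
                  for _ in range(n_samples)]

    valid = []
    for action in candidates:
        try:
            ast.parse(action)  # cheap validity check before voting
            valid.append(action.strip())
        except SyntaxError:
            continue

    if not valid:
        # Fall back to greedy decoding if every sample is malformed.
        return generate_fn(prompt, temperature=0.0)

    return Counter(valid).most_common(1)[0][0]
```

In the actual method, the agent's actions are interleaved with retrieval and code execution, so filtering and voting operate on richer trajectories than this sketch shows; see the repository linked above for the authors' implementation.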