
Distilling LLM Agent into Small Models with Retrieval and Code Tools

May 23, 2025
Authors: Minki Kang, Jongwon Jeong, Seanie Lee, Jaewoong Cho, Sung Ju Hwang
cs.AI

Abstract

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, limiting their practical deployment. To address this, recent work has focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios requiring rare factual knowledge or precise computation, where sLMs often hallucinate due to their limited capability. In this work, we propose Agent Distillation, a framework for transferring not only reasoning capability but full task-solving behavior from LLM-based agents into sLMs equipped with retrieval and code tools. We improve agent distillation along two complementary axes: (1) we introduce a prompting method called first-thought prefix to enhance the quality of teacher-generated trajectories; and (2) we propose self-consistent action generation to improve the test-time robustness of small agents. We evaluate our method on eight reasoning tasks across factual and mathematical domains, covering both in-domain and out-of-domain generalization. Our results show that sLMs with as few as 0.5B, 1.5B, and 3B parameters can achieve performance competitive with next-tier larger 1.5B, 3B, and 7B models fine-tuned using CoT distillation, demonstrating the potential of agent distillation for building practical, tool-using small agents. Our code is available at https://github.com/Nardien/agent-distillation.
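As a rough illustration of the two techniques the abstract names, here is a minimal Python sketch. It is not the released implementation from the linked repository: the prompt template in first_thought_prefix, the function names, and the generate_fn sampling interface are all assumptions made for this example.

```python
from collections import Counter

def first_thought_prefix(question: str, first_thought: str) -> str:
    # First-thought prefix (sketch): seed the agent-style trajectory
    # with the teacher's first chain-of-thought step, so the rest of
    # the trajectory continues from a strong reasoning path.
    # The template below is a hypothetical stand-in, not the paper's.
    return f"Question: {question}\nThought: {first_thought}\nThought:"

def self_consistent_action(generate_fn, prompt: str,
                           n_samples: int = 5,
                           temperature: float = 0.7):
    # Self-consistent action generation (sketch): sample several
    # candidate actions from the small agent and keep the
    # majority-vote choice. `generate_fn` is any caller-supplied
    # sampler around a small LM, e.g. generate_fn(prompt, temperature).
    candidates = []
    for _ in range(n_samples):
        raw = generate_fn(prompt, temperature=temperature)
        action = raw.strip()
        if action:
            candidates.append(action)
    if not candidates:
        return None  # caller can fall back to greedy decoding
    action, _count = Counter(candidates).most_common(1)[0]
    return action

# Usage with a stub sampler (replace with a real sLM):
if __name__ == "__main__":
    import random
    stub = lambda prompt, temperature: random.choice(
        ["search('Burkina Faso capital')", "search('capital of Burkina Faso')"]
    )
    prompt = first_thought_prefix("What is the capital of Burkina Faso?",
                                  "I should look this up rather than recall it.")
    print(self_consistent_action(stub, prompt))
```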
