Distilling LLM Agent into Small Models with Retrieval and Code Tools

Abstract

Large language models (LLMs) excel at complex reasoning tasks but remain computationally expensive, which limits their practical deployment. To address this, recent work has focused on distilling reasoning capabilities into smaller language models (sLMs) using chain-of-thought (CoT) traces from teacher LLMs. However, this approach struggles in scenarios that require rare factual knowledge or precise computation, where sLMs often hallucinate because of their limited capability. In this work, we propose Agent Distillation, a framework for transferring not only reasoning capability but full task-solving behavior from LLM-based agents into sLMs equipped with retrieval and code tools. We improve agent distillation along two complementary axes: (1) we introduce a prompting method called first-thought prefix to enhance the quality of teacher-generated trajectories; and (2) we propose self-consistent action generation to improve the test-time robustness of small agents. We evaluate our method on eight reasoning tasks across factual and mathematical domains, covering both in-domain and out-of-domain generalization. Our results show that sLMs with 0.5B, 1.5B, and 3B parameters can achieve performance competitive with the next-tier larger 1.5B, 3B, and 7B models fine-tuned using CoT distillation, demonstrating the potential of agent distillation for building practical, tool-using small agents. Our code is available at https://github.com/Nardien/agent-distillation.
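
The abstract names self-consistent action generation but does not describe the procedure. The sketch below is one plausible reading, not the authors' implementation: it assumes the small agent samples several candidate code actions for a step, drops candidates that fail to execute, and keeps one whose execution outcome agrees with the majority. The names `run_action` and `self_consistent_action`, and the convention that each action stores its answer in a variable called `result`, are hypothetical and introduced only for illustration.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def run_action(code: str) -> Optional[str]:
    """Execute a candidate code action; return its `result` as text, or None on failure.

    Hypothetical helper: assumes each action assigns its answer to `result`.
    """
    scope: dict = {}
    try:
        exec(code, scope)  # illustration only: real agents need a sandbox here
    except Exception:
        return None  # actions that raise (or do not parse) are discarded
    if "result" not in scope:
        return None
    return repr(scope["result"])


def self_consistent_action(
    candidates: Sequence[str],
    execute: Callable[[str], Optional[str]] = run_action,
) -> Optional[str]:
    """Return the first candidate whose outcome matches the majority outcome."""
    outcomes = [(code, execute(code)) for code in candidates]
    valid = [(code, out) for code, out in outcomes if out is not None]
    if not valid:
        return None  # every sampled action failed
    majority, _ = Counter(out for _, out in valid).most_common(1)[0]
    return next(code for code, out in valid if out == majority)


# Stand-ins for actions sampled from a small agent (hard-coded for the example).
sampled = [
    "result = 6 * 7",
    "result = 6 * 7 + 0",
    "result = 6 + 7",
    "result = 1 / 0",
    "result = int('x')",
]
chosen = self_consistent_action(sampled)
print(chosen)
```

Voting on execution outcomes rather than on the action text lets differently written actions that compute the same value support each other, which is one reason a scheme like this could make a small agent less sensitive to a single faulty sample.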
