Harnessing LLM Agents with Skill Programs

Abstract

Equipping LLM agents with reusable skills derived from past experience has become a popular and successful approach for tackling complex, long-horizon tasks. However, such lessons are often encoded as textual guidance that remains largely advisory and lacks explicit mechanisms for when and how to intervene in the agent loop. To bridge this gap, we introduce HASP (Harnessing LLM Agents with Skill Programs), a new framework that upgrades skills into executable Program Functions (PFs). Rather than offering passive advice, PFs act as executable guardrails that activate on failure-prone states and either modify the next action or inject corrective context. HASP is highly modular: it can be applied at inference time for direct agent-loop intervention, during post-training to provide structured supervision, or for self-improvement by evolving validated, teacher-reviewed PFs. Empirically, HASP yields substantial gains over both training-free and training-based methods on web-search, math reasoning, and coding tasks. For example, on web-search reasoning, inference-time PFs alone improve average performance by 25% over a (multi-loop) ReAct agent, while post-training and controlled evolution achieve a 30.4% gain over Search-R1. To provide deeper insight into HASP, our mechanism analysis reveals how PFs trigger and intervene, how skills are internalized, and why stable skill-library evolution is needed.
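
To make the guardrail idea concrete, the following is a minimal Python sketch of how a PF might sit inside an agent loop. It is an illustration under stated assumptions, not the HASP implementation: the names `AgentState`, `ProgramFunction`, `apply_pfs`, and the example repeated-query PF are all hypothetical, and the abstract does not specify how triggers or interventions are actually represented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Hypothetical snapshot of the agent loop at one step."""
    question: str
    history: List[str] = field(default_factory=list)  # actions already taken
    context: List[str] = field(default_factory=list)  # notes visible to the agent


@dataclass
class ProgramFunction:
    """Hypothetical PF: a trigger predicate paired with an intervention."""
    name: str
    # Returns True when the state/action pair looks failure-prone.
    trigger: Callable[[AgentState, str], bool]
    # Returns the action to execute; may also append to state.context.
    intervene: Callable[[AgentState, str], str]


def apply_pfs(state: AgentState, proposed: str, pfs: List[ProgramFunction]) -> str:
    """Pass the proposed action through every PF whose trigger fires."""
    action = proposed
    for pf in pfs:
        if pf.trigger(state, action):
            action = pf.intervene(state, action)
    return action


# Assumed example PF: catch an exact repeat of an earlier search query.
def is_repeated_search(state: AgentState, action: str) -> bool:
    return action.startswith("search:") and action in state.history


def redirect_repeated_search(state: AgentState, action: str) -> str:
    # Inject corrective context and replace the next action.
    state.context.append("This query was already issued; reformulate it.")
    return "think: reformulate the query"


no_repeat = ProgramFunction("no_repeat_search", is_repeated_search, redirect_repeated_search)

state = AgentState(question="Who founded X?", history=["search: founder of X"])
repeated = apply_pfs(state, "search: founder of X", [no_repeat])  # PF fires
fresh = apply_pfs(state, "search: X company history", [no_repeat])  # PF stays silent
```

In this sketch the PF both modifies the next action and injects corrective context, which are the two intervention modes the abstract names; a real PF could do only one of them. Actions that do not match the trigger pass through unchanged, so the guardrail is inactive outside failure-prone states.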