Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Abstract

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks, yet manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable, trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheets, VisionQA, and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution (OOD) settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills that require no parameter updates and no external retrieval modules, and that can be produced with open-source models as small as 35B parameters.
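The two-stage shape described in the abstract (parallel per-trajectory analysis, then hierarchical consolidation) can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: `extract_lessons`, `merge_lessons`, `consolidate`, and `build_skill` are hypothetical names, and the string-based extraction and deduplicating merge are trivial stand-ins for what the framework describes as LLM sub-agents and inductive reasoning. The abstract does not specify the actual interfaces or merge strategy.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Lesson = str


def extract_lessons(trajectory: str) -> List[Lesson]:
    """Hypothetical stand-in for one sub-agent: treat each non-empty line as a lesson."""
    return [line.strip() for line in trajectory.splitlines() if line.strip()]


def merge_lessons(a: List[Lesson], b: List[Lesson]) -> List[Lesson]:
    """Hypothetical stand-in for consolidation: ordered union that drops duplicates."""
    return list(dict.fromkeys(a + b))


def consolidate(groups: List[List[Lesson]]) -> List[Lesson]:
    """Merge lesson lists pairwise, level by level, until a single list remains."""
    if not groups:
        return []
    while len(groups) > 1:
        next_level = []
        for i in range(0, len(groups), 2):
            pair = groups[i:i + 2]
            # An unpaired list at the end of a level is carried up unchanged.
            next_level.append(merge_lessons(*pair) if len(pair) == 2 else pair[0])
        groups = next_level
    return groups[0]


def build_skill(trajectories: List[str], max_workers: int = 4) -> List[Lesson]:
    """Analyze all trajectories in parallel, then consolidate into one lesson list."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_trajectory = list(pool.map(extract_lessons, trajectories))
    return consolidate(per_trajectory)
```

The point of the sketch is the control flow: every trajectory is analyzed independently of the others, and merging happens only afterwards over the whole pool, in contrast to updating a skill after each trajectory in sequence.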
