SkillsVote: Lifecycle Governance of Agent Skills from Collection and Recommendation to Evolution

Abstract



Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable procedural guidance. Yet open skill ecosystems contain redundant, uneven, and environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills that spans collection, recommendation, and evolution. SkillsVote profiles a million-scale open-source corpus for environment requirements, quality, and verifiability, then synthesizes tasks for the verifiable skills. Before execution, SkillsVote performs agentic library search over a structured skill library to expose instructional skill context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill use, agent exploration, environment, and result signals, and admits only successful, reusable discoveries to evidence-gated updates. In our evaluation, offline evolution improves GPT-5.2 on Terminal-Bench 2.0 by up to 7.9 percentage points (pp), while online evolution improves SWE-Bench Pro by up to 2.6 pp. Overall, governed external skill libraries can improve frozen agents without model updates when the system controls exposure, credit assignment, and preservation.
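To make the idea of an evidence-gated update concrete, the following is a minimal sketch under stated assumptions, not the SkillsVote implementation. It assumes each skill-linked subtask has already been attributed to one of the four signal sources named above, and that a "reusable discovery" is a successful, reusable subtask attributed to agent exploration. The `SubtaskRecord` type, the `admit_updates` function, and the `min_evidence` threshold are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtaskRecord:
    # Hypothetical record for one skill-linked subtask of a trajectory.
    skill_id: str
    attribution: str  # assumed labels: "skill_use", "agent_exploration", "environment", "result"
    succeeded: bool
    reusable: bool
    note: str  # candidate guidance distilled from the subtask


def admit_updates(records, min_evidence=2):
    """Return {skill_id: [notes]} for skills with at least min_evidence admitted notes."""
    evidence = defaultdict(list)
    for r in records:
        # Gate 1 (assumed): only successful, reusable discoveries from agent
        # exploration are candidates; failures and environment-driven outcomes
        # never reach the library.
        if r.succeeded and r.reusable and r.attribution == "agent_exploration":
            evidence[r.skill_id].append(r.note)
    # Gate 2 (assumed): require repeated evidence before modifying a skill.
    return {sid: notes for sid, notes in evidence.items() if len(notes) >= min_evidence}


# Illustrative, made-up records (not data from the paper).
records = [
    SubtaskRecord("pytest-fix", "agent_exploration", True, True, "pin pytest<8"),
    SubtaskRecord("pytest-fix", "agent_exploration", True, True, "clear .pytest_cache"),
    SubtaskRecord("pytest-fix", "environment", True, True, "retry on network error"),
    SubtaskRecord("docker-build", "agent_exploration", True, True, "use --no-cache"),
    SubtaskRecord("docker-build", "agent_exploration", False, True, "skip tests"),
]

print(admit_updates(records))
# {'pytest-fix': ['pin pytest<8', 'clear .pytest_cache']}
```

Here `docker-build` is left untouched because it has only one admitted discovery, and the environment-attributed `pytest-fix` subtask is excluded even though it succeeded. Separating attribution (gate 1) from the evidence threshold (gate 2) is one plausible way to keep noisy or environment-specific outcomes from polluting future context.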