MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Abstract

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.
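
To make the lifecycle concrete, the following is a minimal illustrative sketch, not the MUSE-Autoskill implementation. It assumes a skill can be modeled as a plain Python callable bundled with its unit tests and an experience log. The names `Skill`, `SkillLibrary`, `create`, `select`, `evaluate`, `record`, and `refine` are hypothetical, and the word-overlap selection rule is a stand-in for whatever retrieval the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Skill:
    """Hypothetical skill record: code, unit tests, and skill-level memory."""
    name: str
    description: str
    fn: Callable[..., object]
    tests: List[Tuple[tuple, object]] = field(default_factory=list)  # (args, expected)
    memory: List[str] = field(default_factory=list)  # experience notes across tasks
    version: int = 1


class SkillLibrary:
    """Hypothetical store covering the five lifecycle stages named in the abstract."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def create(self, skill: Skill) -> None:
        # Creation: register a new skill on demand.
        self.skills[skill.name] = skill

    def select(self, task: str) -> Optional[Skill]:
        # Management: choose the skill whose description shares the most
        # words with the task (an assumed, deliberately naive rule).
        words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self.skills.values():
            score = len(words & set(skill.description.lower().split()))
            if score > best_score:
                best, best_score = skill, score
        return best

    def evaluate(self, name: str) -> bool:
        # Evaluation: run the unit tests; an exception counts as a failure.
        skill = self.skills[name]
        for args, expected in skill.tests:
            try:
                if skill.fn(*args) != expected:
                    return False
            except Exception:
                return False
        return True

    def record(self, name: str, note: str) -> None:
        # Memory: append runtime feedback to the skill's own experience log.
        self.skills[name].memory.append(note)

    def refine(self, name: str, new_fn: Callable[..., object]) -> bool:
        # Refinement: keep a new implementation only if it passes the tests.
        skill = self.skills[name]
        old_fn = skill.fn
        skill.fn = new_fn
        if self.evaluate(name):
            skill.version += 1
            return True
        skill.fn = old_fn  # roll back on failure
        return False


# Usage: a skill that fails one of its tests, then gets refined.
library = SkillLibrary()
library.create(Skill(
    name="mean",
    description="compute the mean of a list of numbers",
    fn=lambda xs: sum(xs) / len(xs),
    tests=[(([1, 2, 3],), 2.0), (([],), 0.0)],
))
passed_before = library.evaluate("mean")  # the empty-list test raises
library.record("mean", "failed on empty input")
refined = library.refine("mean", lambda xs: sum(xs) / len(xs) if xs else 0.0)
chosen = library.select("find the mean of these numbers")
```

In this sketch the unit tests act as the gate for refinement: a candidate implementation replaces the old one only when every test passes, and the version counter and memory log travel with the skill rather than with any single task.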