MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Abstract

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated, static artifacts, which limits their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability through a unified skill lifecycle: creation, memory, management, evaluation, and refinement. The framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory, which accumulates experience for each skill across tasks and enables more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.
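The lifecycle described in the abstract can be illustrated with a minimal Python sketch. It is not the MUSE-Autoskill implementation: the names `Skill`, `SkillLibrary`, `create`, `select`, `run`, and `refine` are hypothetical, and the assumption that a skill is a plain callable gated by input/output unit tests is ours. The sketch only shows how creation, skill-level memory, management, evaluation, and refinement could fit together in one data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Skill:
    """Hypothetical skill record: code, its unit tests, and per-skill memory."""
    name: str
    func: Callable[..., object]
    # Each test is (positional args, expected result).
    tests: List[Tuple[tuple, object]] = field(default_factory=list)
    # Skill-level memory: notes accumulated every time the skill is used.
    memory: List[str] = field(default_factory=list)
    successes: int = 0
    failures: int = 0

    def passes_tests(self) -> bool:
        """Evaluation: True only if every unit test returns its expected value."""
        for args, expected in self.tests:
            try:
                if self.func(*args) != expected:
                    return False
            except Exception:
                return False
        return True


class SkillLibrary:
    """Hypothetical store that manages skills by name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def create(self, skill: Skill) -> bool:
        """Creation: admit a skill only if its own unit tests pass."""
        if not skill.passes_tests():
            return False
        self._skills[skill.name] = skill
        return True

    def select(self, name: str) -> Optional[Skill]:
        """Management: look up a stored skill (exact-name match in this sketch)."""
        return self._skills.get(name)

    def run(self, name: str, *args: object, task: str = "") -> object:
        """Reuse: execute a skill and record runtime feedback in its memory."""
        skill = self._skills[name]
        try:
            result = skill.func(*args)
        except Exception as exc:
            skill.failures += 1
            skill.memory.append(f"{task}: failed with {type(exc).__name__}")
            raise
        skill.successes += 1
        skill.memory.append(f"{task}: ok")
        return result

    def refine(self, name: str, new_func: Callable[..., object]) -> bool:
        """Refinement: swap the implementation only if existing tests still pass."""
        old = self._skills[name]
        candidate = Skill(name, new_func, old.tests, old.memory,
                          old.successes, old.failures)
        if not candidate.passes_tests():
            return False
        self._skills[name] = candidate
        return True
```

In this sketch a skill that fails its tests is never stored, and a refinement that breaks an existing test is rejected, so the library only holds implementations that pass their current tests. A real system would presumably select skills by task description rather than by exact name; that part is left out here.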