SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
February 9, 2026
Authors: Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, Huaxiu Yao
cs.AI
Abstract
Large Language Model (LLM) agents have demonstrated impressive performance on complex tasks, yet they often operate in isolation, failing to learn from past experience. Existing memory-based methods primarily store raw trajectories, which are often redundant and noisy, preventing agents from extracting the high-level, reusable behavioral patterns essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces three innovations: an experience-based distillation mechanism that builds a hierarchical skill library, SkillBank; an adaptive retrieval strategy covering both general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning. Together, these designs significantly reduce the token footprint while enhancing reasoning utility. Experiments on ALFWorld, WebShop, and seven search-augmented tasks demonstrate that SkillRL achieves state-of-the-art performance, outperforming strong baselines by over 15.3% and remaining robust as task complexity increases. Code is available at https://github.com/aiming-lab/SkillRL.
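The abstract names three components: distilling experience into a hierarchical skill library (SkillBank), adaptive retrieval of general and task-specific heuristics, and recursive co-evolution of the library with the RL policy. Below is a minimal, hypothetical Python sketch of how such a library might be organized; the class and method names (`Skill`, `SkillBank.retrieve`, `SkillBank.evolve`), the utility-update rule, and the pruning threshold are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A distilled, reusable behavior pattern (hypothetical structure)."""
    name: str
    description: str      # natural-language heuristic injected into the agent prompt
    task_specific: bool   # False = general heuristic, True = tied to one task type
    utility: float = 0.0  # running estimate of how much the skill helps

class SkillBank:
    """Hierarchical skill library: general skills plus per-task shelves (sketch)."""

    def __init__(self) -> None:
        self.general: list[Skill] = []
        self.by_task: dict[str, list[Skill]] = {}

    def add(self, skill: Skill, task: str | None = None) -> None:
        """Store a newly distilled skill in the matching tier."""
        if skill.task_specific and task is not None:
            self.by_task.setdefault(task, []).append(skill)
        else:
            self.general.append(skill)

    def retrieve(self, task: str, k: int = 4) -> list[Skill]:
        """Adaptive retrieval: pool general and task-specific heuristics,
        then keep the k highest-utility ones for the current task."""
        candidates = self.general + self.by_task.get(task, [])
        return sorted(candidates, key=lambda s: s.utility, reverse=True)[:k]

    def evolve(self, used: list[Skill], reward: float, lr: float = 0.1) -> None:
        """Recursive evolution: nudge each used skill's utility toward the
        episode reward so the library co-evolves with the policy; skills
        whose utility decays below a threshold are pruned."""
        for skill in used:
            skill.utility += lr * (reward - skill.utility)
        self.general = [s for s in self.general if s.utility > -0.5]
        self.by_task = {t: [s for s in shelf if s.utility > -0.5]
                        for t, shelf in self.by_task.items()}

# Example usage during a (hypothetical) RL episode:
bank = SkillBank()
bank.add(Skill("check-inventory", "List held items before searching.", True),
         task="alfworld")
skills = bank.retrieve("alfworld")  # heuristics to prepend to the prompt
bank.evolve(skills, reward=1.0)     # reinforce the used skills after a success
```

The exponential-moving-average update and utility-ranked retrieval are only one plausible instantiation; the paper's actual distillation and evolution procedures may differ.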