Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Abstract

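Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills fall into two kinds: general skills, which support broad cognitive transfer, and task-specific skills, which guide dynamic execution. However, existing skill-based reinforcement learning (RL) methods typically force a rigid choice between full externalization, which incurs high context overhead, and full internalization, which risks overfitting and knowledge conflicts. To resolve this dilemma, we propose Skill0.5, an agentic RL framework that treats the two kinds of skills differently: it internalizes general skills and utilizes task-specific skills. A dynamic, difficulty-aware router assigns each task to a mastery tier, and each tier receives its own optimization strategy. On hard tasks, Skill0.5 internalizes general skills through privileged distillation to build a cognitive foundation; on easy tasks, it uses diagnostic probing to penalize shortcut behavior and enforce the use of task-specific skills. Experiments on ALFWorld and WebShop show that Skill0.5 outperforms both memory-based and skill-based RL baselines, improving performance in both in-distribution and out-of-distribution settings.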
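To make the routing idea concrete, here is a minimal Python sketch of one way a dynamic, difficulty-aware router could split tasks into tiers. It assumes difficulty is estimated from each task's success rate over a sliding window of recent rollouts. The class name `DifficultyRouter`, the window size, the 0.5 threshold, and the rule that unseen tasks start in the hard tier are all hypothetical choices for illustration, not details taken from Skill0.5; the returned tier label only indicates which strategy (privileged distillation or diagnostic probing) would be applied downstream.

```python
from collections import defaultdict, deque


class DifficultyRouter:
    """Hypothetical difficulty-aware router (illustrative sketch only).

    Assumption: a task's mastery tier is decided by its success rate over
    the last `window` rollouts. "hard" tasks would receive privileged
    distillation of general skills; "easy" tasks would receive diagnostic
    probing. Window size and threshold are assumed values.
    """

    def __init__(self, window: int = 8, easy_threshold: float = 0.5):
        self.easy_threshold = easy_threshold
        # Per-task sliding window of recent outcomes: 1 = success, 0 = failure.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, task_id: str, success: bool) -> None:
        """Store the outcome of one rollout; old outcomes fall out of the window."""
        self.history[task_id].append(1 if success else 0)

    def success_rate(self, task_id: str) -> float:
        """Fraction of successes in the window; 0.0 for unseen tasks (assumption)."""
        outcomes = self.history[task_id]
        if not outcomes:
            return 0.0
        return sum(outcomes) / len(outcomes)

    def route(self, task_id: str) -> str:
        """Return the tier label for a task based on its recent success rate."""
        if self.success_rate(task_id) >= self.easy_threshold:
            return "easy"  # candidate for diagnostic probing
        return "hard"  # candidate for privileged distillation


router = DifficultyRouter(window=4, easy_threshold=0.5)
for outcome in [True, True, False, True]:
    router.record("alfworld-42", outcome)
print(router.route("alfworld-42"))  # success rate 0.75 -> "easy"
print(router.route("webshop-7"))  # no history -> "hard"
```

Using a sliding window rather than a cumulative average lets a task move between tiers as the policy improves or regresses, which is one plausible reading of "dynamic" in the description above.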
