ARISE: Agent Reasoning with Intrinsic Skill Evolution in Hierarchical Reinforcement Learning

Abstract

The dominant paradigm for improving mathematical reasoning in language models relies on reinforcement learning with verifiable rewards. Yet existing methods treat each problem instance in isolation and do not leverage the reusable strategies that emerge and accumulate during training. To address this, we introduce ARISE (Agent Reasoning via Intrinsic Skill Evolution), a hierarchical reinforcement learning framework in which a shared policy both manages skills at the high level, as a Skills Manager, and generates responses at the low level, as a Worker. The Manager maintains a tiered skill library through a dedicated skill generation rollout that performs structured summarization of successful solution traces (after execution), and it employs a policy-driven selection mechanism to retrieve relevant skills that condition future rollouts (before execution). A hierarchical reward design guides the co-evolution of reasoning ability and library quality. Experiments on two base models and seven benchmarks spanning competition mathematics and Omni-MATH show that ARISE consistently outperforms GRPO-family algorithms and memory-augmented baselines, with particularly notable gains on out-of-distribution tasks. Ablation studies confirm that each component contributes to the observed improvements and that library quality and reasoning performance improve in tandem throughout training. Code is available at https://github.com/Skylanding/ARISE.
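To make the before/after-execution loop described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. All names (`Skill`, `SkillLibrary`, `add_from_trace`, `select`, `record_outcome`) are hypothetical, the two-tier promotion rule is an assumption, and the learned policy-driven selection is replaced here by a plain word-overlap score.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    summary: str
    uses: int = 0
    wins: int = 0

    @property
    def tier(self) -> str:
        # Assumed tiering rule (not from the paper): promote after two successes.
        return "core" if self.wins >= 2 else "candidate"


@dataclass
class SkillLibrary:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add_from_trace(self, name: str, summary: str) -> None:
        """After execution: store a summary of a successful solution trace."""
        self.skills.setdefault(name, Skill(name, summary))

    def select(self, problem: str, k: int = 1) -> list[Skill]:
        """Before execution: pick skills to condition the next rollout.

        Stand-in scorer: word overlap, not the learned selection policy.
        """
        words = set(problem.lower().split())

        def score(skill: Skill) -> int:
            return len(words & set(skill.summary.lower().split()))

        ranked = sorted(self.skills.values(), key=lambda s: (-score(s), s.name))
        return [s for s in ranked if score(s) > 0][:k]

    def record_outcome(self, skill: Skill, solved: bool) -> None:
        """Track how often a selected skill led to a verified solution."""
        skill.uses += 1
        skill.wins += int(solved)


library = SkillLibrary()
library.add_from_trace("complete_square", "rewrite a quadratic by completing the square")
library.add_from_trace("pigeonhole", "apply the pigeonhole principle to counting arguments")

chosen = library.select("find the minimum of the quadratic x^2 - 4x + 7")
for skill in chosen:
    library.record_outcome(skill, solved=True)
    library.record_outcome(skill, solved=True)
```

In this toy version the library statistics are updated by hand. In the framework described above, the same shared policy would be trained to perform both the summarization and the selection, with the hierarchical reward tying library quality to reasoning performance.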
