HeavySkill: Heavy Thinking as the Inner Skill at the Core of Agentic Systems

Abstract

Recent agentic harnesses, i.e., orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success on complex reasoning tasks. However, the underlying mechanism that truly drives their performance remains obscured by intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as the minimal execution unit of an orchestration harness but also as an inner skill, internalized in the model's parameters, that drives the orchestrator to solve complex tasks. We characterize this skill as a two-stage pipeline, parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be scaled further via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
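
To make the two-stage pipeline concrete, the following is a minimal sketch rather than the paper's implementation. It contrasts parallel reasoning followed by summarization with a Best-of-N baseline and a Pass@N oracle check. Every name in it (`heavy_skill`, `best_of_n`, `pass_at_n`, and the `toy_*` stand-ins) is hypothetical. In particular, the toy summarizer uses a simple majority vote as a placeholder for the model-written summary described above, and the toy scorer is deliberately naive.

```python
from collections import Counter
from typing import Callable, List


def heavy_skill(question: str,
                reason: Callable[[str, int], str],
                summarize: Callable[[str, List[str]], str],
                width: int = 4) -> str:
    """Stage 1: draw `width` independent reasoning traces.
    Stage 2: pass all traces to a summarizer that writes one final answer."""
    traces = [reason(question, seed) for seed in range(width)]
    return summarize(question, traces)


def best_of_n(question: str,
              reason: Callable[[str, int], str],
              score: Callable[[str], float],
              n: int = 4) -> str:
    """Baseline: keep the single trace that a scorer ranks highest."""
    traces = [reason(question, seed) for seed in range(n)]
    return max(traces, key=score)


def pass_at_n(question: str,
              reason: Callable[[str, int], str],
              is_correct: Callable[[str], bool],
              n: int = 4) -> bool:
    """Oracle check: True if at least one of the n traces is correct."""
    return any(is_correct(reason(question, seed)) for seed in range(n))


# --- Toy stand-ins (assumptions, used only to make the sketch runnable) ---

def toy_reason(question: str, seed: int) -> str:
    # Pretends to be a sampled model answer; a real system would call an LLM.
    return ["41", "42", "42", "7"][seed % 4]


def toy_summarize(question: str, traces: List[str]) -> str:
    # Placeholder for the summarization stage: return the most common answer.
    return Counter(traces).most_common(1)[0][0]


def toy_score(trace: str) -> float:
    # Deliberately naive scorer: longer answers score higher.
    return float(len(trace))


if __name__ == "__main__":
    q = "What is 6 * 7?"
    print("heavy_skill:", heavy_skill(q, toy_reason, toy_summarize))
    print("best_of_n:  ", best_of_n(q, toy_reason, toy_score))
    print("pass_at_n:  ", pass_at_n(q, toy_reason, lambda t: t == "42"))
```

In this sketch the width of heavy thinking corresponds to the `width` parameter, while depth would correspond to the length of each individual trace, which the toy functions do not model. The structural difference from Best-of-N is that the summarizer sees all traces at once instead of selecting one of them by score.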