**HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness**

Abstract

Agentic harnesses built on orchestration frameworks, which coordinate multiple agents with memory, skills, and tool use, have recently achieved remarkable success on complex reasoning tasks. However, the mechanism that actually drives this performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized in the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
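The two-stage pipeline named in the abstract (parallel reasoning, then summarization) can be sketched roughly as follows, next to a Best-of-N baseline for contrast. This is a minimal illustration, not the authors' implementation: the `generate` callable, the `score` function, the prompt wording, and the `width` parameter are all assumptions introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
Generate = Callable[[str], str]


def heavy_think(question: str, generate: Generate, width: int = 4) -> str:
    """Sketch of parallel reasoning followed by summarization."""
    reasoning_prompt = f"Question: {question}\nReason step by step."

    # Stage 1: sample `width` independent reasoning traces in parallel.
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces: List[str] = list(pool.map(generate, [reasoning_prompt] * width))

    # Stage 2: ask the same model to read all traces and produce one answer.
    joined = "\n\n".join(f"[Trace {i + 1}]\n{t}" for i, t in enumerate(traces))
    summary_prompt = (
        f"Question: {question}\n\n"
        f"Candidate reasoning traces:\n{joined}\n\n"
        "Summarize the traces and give a single final answer."
    )
    return generate(summary_prompt)


def best_of_n(question: str, generate: Generate,
              score: Callable[[str], float], n: int = 4) -> str:
    """Baseline: sample n candidates and keep the one an external scorer prefers."""
    prompt = f"Question: {question}\nReason step by step."
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```

The difference this sketch is meant to highlight: Best-of-N selects one existing candidate using a separate scorer, whereas the summarization stage lets the model combine evidence across all traces into a new answer.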
