HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Abstract

Recent advances in agentic harnesses, in which orchestration frameworks coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success on complex reasoning tasks. However, the underlying mechanism that truly drives performance remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized within the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
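
The two-stage pipeline named in the abstract (parallel reasoning, then summarization) can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' implementation: `generate` is a hypothetical callable standing in for any LLM call, `width` is a hypothetical name for the number of parallel attempts, and the summarization prompt wording is invented. The Best-of-N and Pass@N helpers are included only to show what the abstract compares against.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model interface: maps a prompt string to a completion string.
Generate = Callable[[str], str]


def heavy_think(question: str, generate: Generate, width: int = 4) -> str:
    """Two-stage sketch: sample `width` traces in parallel, then summarize them."""
    # Stage 1: parallel reasoning. Each call is an independent attempt.
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces: List[str] = list(pool.map(generate, [question] * width))

    # Stage 2: summarization. The same model reads all traces and writes one answer.
    # The prompt wording below is an assumption, not taken from the paper.
    numbered = "\n\n".join(
        f"[Attempt {i + 1}]\n{trace}" for i, trace in enumerate(traces)
    )
    summary_prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate solutions:\n{numbered}\n\n"
        "Compare the attempts and give a single final answer."
    )
    return generate(summary_prompt)


def best_of_n(traces: List[str], score: Callable[[str], float]) -> str:
    """Best-of-N baseline: return the single trace an external scorer ranks highest."""
    return max(traces, key=score)


def pass_at_n(traces: List[str], is_correct: Callable[[str], bool]) -> bool:
    """Pass@N: True if any of the N traces is correct (an oracle upper bound)."""
    return any(is_correct(trace) for trace in traces)
```

In this sketch, Best-of-N can only return one of the sampled traces, whereas the summarization stage writes a new answer after reading all of them. That difference is one plausible reading of why the abstract reports results between BoN and Pass@N.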
