HeavySkill: Heavy Thinking as the Inner Skill in Agentic Harness

Abstract

Recent agentic harnesses, built on orchestration frameworks that coordinate multiple agents with memory, skills, and tool use, have achieved remarkable success on complex reasoning tasks. However, the mechanism that actually drives these gains remains obscured behind intricate system designs. In this paper, we propose HeavySkill, a perspective that views heavy thinking not only as a minimal execution unit in an orchestration harness but also as an inner skill, internalized within the model's parameters, that drives the orchestrator to solve complex tasks. We identify this skill as a two-stage pipeline, parallel reasoning followed by summarization, which can operate beneath any agentic harness. We present a systematic empirical study of HeavySkill across diverse domains. Our results show that this inner skill consistently outperforms traditional Best-of-N (BoN) strategies; notably, stronger LLMs can even approach Pass@N performance. Crucially, we demonstrate that the depth and width of heavy thinking, as a learnable skill, can be further scaled via reinforcement learning, offering a promising path toward self-evolving LLMs that internalize complex reasoning without relying on brittle orchestration layers.
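The two-stage pipeline named in the abstract (parallel reasoning, then summarization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `heavy_think`, the `Generate` callable, the `width` parameter, and the prompt wording are all hypothetical names chosen for illustration, and `toy_generate` is a stand-in for a real LLM call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to a completion.
Generate = Callable[[str], str]


def heavy_think(question: str, generate: Generate, width: int = 4) -> str:
    """Sample `width` reasoning traces in parallel, then summarize them."""
    # Stage 1: parallel reasoning. The same question is sent `width` times.
    with ThreadPoolExecutor(max_workers=width) as pool:
        traces: List[str] = list(pool.map(generate, [question] * width))

    # Stage 2: summarization. One more call sees every trace and
    # is asked to produce a single final answer.
    numbered = "\n\n".join(
        f"[Trace {i + 1}]\n{trace}" for i, trace in enumerate(traces)
    )
    summary_prompt = (
        f"Question:\n{question}\n\n"
        f"Candidate reasoning traces:\n{numbered}\n\n"
        "Compare the traces and give one final answer."
    )
    return generate(summary_prompt)


def toy_generate(prompt: str) -> str:
    """Deterministic stand-in for a model call (assumption, for testing only)."""
    if "Candidate reasoning traces" in prompt:
        # Report how many traces the summarizer was shown.
        return "final: " + str(prompt.count("[Trace "))
    return "2 + 2 = 4"


if __name__ == "__main__":
    print(heavy_think("What is 2 + 2?", toy_generate, width=3))
```

In this sketch the summarizer synthesizes an answer from all traces, whereas a Best-of-N baseline would instead select one of the sampled traces using a scorer. Pass@N, by contrast, counts a problem as solved if any of the N samples is correct.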
