Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Abstract

Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.
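To make the idea of interleaving tool calls with reasoning and scoring only the final outcome more concrete, the following is a minimal sketch, not the ARTIST implementation. The tag format (`<tool>`, `<output>`, `<answer>`), the function names, the toy calculator, and the scripted policy standing in for an LLM are all assumptions made for illustration; the abstract does not specify any of them. The reinforcement learning update itself is omitted; the sketch only shows how a multi-turn rollout could be collected and assigned an outcome-based reward with no step-level supervision.

```python
import operator
import re
from typing import Callable, Dict

# Hypothetical tag format; the abstract does not define one.
TOOL_CALL = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.DOTALL)
TOOL_OUTPUT = re.compile(r"<output>(.*?)</output>", re.DOTALL)
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

Policy = Callable[[str], str]
Tool = Callable[[str], str]


def rollout(policy: Policy, tools: Dict[str, Tool], prompt: str,
            max_turns: int = 4) -> str:
    """Interleave policy text with tool outputs until an answer appears."""
    context = prompt
    for _ in range(max_turns):
        step = policy(context)
        context += step
        if ANSWER.search(step):
            break  # the policy committed to a final answer
        call = TOOL_CALL.search(step)
        if call:
            name, argument = call.group(1), call.group(2)
            tool = tools.get(name)
            output = tool(argument) if tool else f"unknown tool: {name}"
            # Feed the tool result back so the next turn can use it.
            context += f"<output>{output}</output>"
    return context


def outcome_reward(trajectory: str, reference: str) -> float:
    """Score only the final answer; intermediate steps get no direct signal."""
    match = ANSWER.search(trajectory)
    return 1.0 if match and match.group(1).strip() == reference else 0.0


# Toy calculator tool (assumption): handles "<int> <op> <int>" only.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calc(expression: str) -> str:
    left, op, right = expression.split()
    return str(OPS[op](int(left), int(right)))


def scripted_policy(context: str) -> str:
    """Stand-in for an LLM: call the tool once, then report its output."""
    outputs = TOOL_OUTPUT.findall(context)
    if not outputs:
        return '<tool name="calc">17 * 23</tool>'
    return f"<answer>{outputs[-1]}</answer>"


if __name__ == "__main__":
    trajectory = rollout(scripted_policy, {"calc": calc}, "What is 17 * 23? ")
    print(trajectory)
    print(outcome_reward(trajectory, "391"))
```

In an actual training setup, the scalar returned by `outcome_reward` would be passed to a policy-optimization step over many sampled trajectories; which algorithm is used for that step is not stated in the abstract.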
