Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Abstract

Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.
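To make the idea of outcome-based reward over a multi-turn, tool-using rollout concrete, here is a minimal sketch. It is an illustration under stated assumptions and not the ARTIST implementation: the tag format (`<tool>`, `<output>`, `<answer>`), the `TOOLS` registry, the `add_tool` helper, and `scripted_policy` (a stand-in for a real LLM) are all hypothetical, and the RL update itself is omitted. The point it shows is that the reward is computed only from the final answer, with no score attached to individual reasoning or tool-call steps.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tool: adds two integers given as "a+b".
def add_tool(expr: str) -> str:
    a, b = expr.split("+")
    return str(int(a) + int(b))

# Hypothetical tool registry; a real system might expose a code interpreter or APIs.
TOOLS: Dict[str, Callable[[str], str]] = {"add": add_tool}

# Assumed tag format for tool calls and final answers (illustrative only).
TOOL_CALL = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.S)
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.S)

def rollout(policy: Callable[[List[str]], str], prompt: str, max_turns: int = 4) -> List[str]:
    """Alternate policy steps and tool outputs until an answer appears."""
    transcript = [prompt]
    for _ in range(max_turns):
        step = policy(transcript)
        transcript.append(step)
        if ANSWER.search(step):
            break  # the policy decided it is done
        call = TOOL_CALL.search(step)
        if call:
            name, arg = call.groups()
            tool = TOOLS.get(name)
            result = tool(arg) if tool else f"error: unknown tool {name}"
            # Tool output is fed back into the context for the next turn.
            transcript.append(f"<output>{result}</output>")
    return transcript

def outcome_reward(transcript: List[str], gold: str) -> float:
    """Reward depends only on the final answer, not on intermediate steps."""
    match = ANSWER.search(transcript[-1])
    return 1.0 if match and match.group(1).strip() == gold else 0.0

# Scripted stand-in for an LLM: call the tool once, then answer with its output.
def scripted_policy(transcript: List[str]) -> str:
    last = transcript[-1]
    if last.startswith("<output>"):
        value = last[len("<output>"):-len("</output>")]
        return f"<answer>{value}</answer>"
    return '<tool name="add">17+25</tool>'

trajectory = rollout(scripted_policy, "What is 17 + 25?")
reward = outcome_reward(trajectory, "42")
```

In a training setup along these lines, `reward` would be the scalar passed to a policy-gradient style update for the whole trajectory, which is what lets the model learn when and how to call tools without step-level labels.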
