

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

April 28, 2025
Authors: Joykirat Singh, Raghav Magazine, Yash Pandya, Akshay Nambi
cs.AI

Abstract

Large language models (LLMs) have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.
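The abstract describes ARTIST's training only at a high level. The sketch below shows one way such a loop could look. Many details are assumptions and are not taken from the paper: the `<tool>`/`<output>`/`<answer>` tags, the toy `calculator` tool, the scripted stand-in policy, and the GRPO-style group normalization. The sketch illustrates the two ideas the abstract states. First, the model decides within a multi-turn chain when to call a tool. Second, the only training signal is a reward on the final outcome, with no step-level labels.

```python
import ast
import operator
import re
import statistics
from typing import Callable, List

# Hypothetical markup (an assumption): tool calls go in <tool>...</tool>,
# the environment replies in <output>...</output>, the final answer in <answer>...</answer>.
TOOL_RE = re.compile(r"<tool>(.*?)</tool>", re.S)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.S)

BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
UNARY_OPS = {ast.USub: operator.neg}

def calculator(expr: str) -> str:
    """Toy tool (hypothetical): safely evaluate a basic arithmetic expression."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    try:
        return str(ev(ast.parse(expr, mode="eval")))
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return f"error: {exc}"

def rollout(policy: Callable[[str], str], prompt: str, max_turns: int = 4) -> str:
    """Interleave model text with tool calls until an answer or the turn limit."""
    transcript = prompt
    for _ in range(max_turns):
        segment = policy(transcript)  # the model decides whether to call a tool
        transcript += segment
        if ANSWER_RE.search(segment):
            break
        call = TOOL_RE.search(segment)
        if call:  # run the tool and feed its result back into the context
            transcript += f"<output>{calculator(call.group(1))}</output>"
    return transcript

def outcome_reward(transcript: str, gold: str) -> float:
    """Outcome-only reward: 1.0 if the final answer matches, else 0.0."""
    m = ANSWER_RE.search(transcript)
    return 1.0 if m and m.group(1).strip() == gold else 0.0

def group_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style normalization within a group of rollouts (an assumption)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]

def scripted_policy(transcript: str) -> str:
    """Stand-in for an LLM: calls the calculator once, then answers."""
    if "<output>" not in transcript:
        return "I should compute this.<tool>17 * 23</tool>"
    result = re.findall(r"<output>(.*?)</output>", transcript)[-1]
    return f"<answer>{result}</answer>"

if __name__ == "__main__":
    t = rollout(scripted_policy, "Q: What is 17 * 23?\n")
    print(outcome_reward(t, "391"))  # 1.0
    print(group_advantages([1.0, 0.0, 0.0, 1.0]))
```

In a real RL setup, sampling from the LLM would replace the scripted policy, and each prompt would get several rollouts scored with `outcome_reward`. The resulting advantages would then weight a policy-gradient update. That training step is not shown here.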

