Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Abstract

Recently, large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). However, leveraging RL algorithms to enable effective multi-tool collaborative reasoning in LLMs remains an open challenge. In this paper, we introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning. Tool-Star integrates six types of tools and incorporates systematic designs in both data synthesis and training. To address the scarcity of tool-use data, we propose a general tool-integrated reasoning data synthesis pipeline that combines tool-integrated prompting with hint-based sampling to generate tool-use trajectories automatically and at scale. A subsequent quality normalization and difficulty-aware classification process filters out low-quality samples and organizes the dataset from easy to hard. Furthermore, we propose a two-stage training framework to enhance multi-tool collaborative reasoning: (1) cold-start fine-tuning, which guides LLMs to explore reasoning patterns via tool-invocation feedback; and (2) a multi-tool self-critic RL algorithm with hierarchical reward design, which reinforces reward understanding and promotes effective tool collaboration. Experimental analyses on more than 10 challenging reasoning benchmarks highlight the effectiveness and efficiency of Tool-Star. The code is available at https://github.com/dongguanting/Tool-Star.
