R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
March 7, 2025
Authors: Huatong Song, Jinhao Jiang, Yingqian Min, Jie Chen, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen
cs.AI
Abstract
Existing Large Reasoning Models (LRMs) have shown the potential of
reinforcement learning (RL) to enhance the complex reasoning capabilities of
Large Language Models (LLMs). While they achieve remarkable performance on
challenging tasks such as mathematics and coding, they often rely on their
internal knowledge to solve problems, which can be inadequate for
time-sensitive or knowledge-intensive questions, leading to inaccuracies and
hallucinations. To address this, we propose R1-Searcher, a novel
two-stage outcome-based RL approach designed to enhance the search capabilities
of LLMs. This method allows LLMs to autonomously invoke external search systems
to access additional knowledge during the reasoning process. Our framework
relies exclusively on RL, without requiring process rewards or distillation for
a cold start, effectively generalizing to out-of-domain datasets and
supporting both Base and Instruct models. Our experiments demonstrate that our
method significantly outperforms previous strong RAG methods, even when
compared to the closed-source GPT-4o-mini.
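The abstract leaves the invocation mechanics implicit. Below is a minimal Python sketch of how an LLM might autonomously call an external search system mid-reasoning, in the spirit the abstract describes; the tag strings, the `llm.generate`/`retriever.search` interfaces, and the call budget are all illustrative assumptions rather than the paper's actual API.

```python
# Minimal sketch (not the paper's confirmed interface) of a reasoning
# loop in which the model autonomously invokes an external search
# system. The tag strings and the llm/retriever objects are assumptions.

QUERY_START, QUERY_END = "<begin_of_query>", "<end_of_query>"
DOCS_START, DOCS_END = "<begin_of_documents>", "<end_of_documents>"

def answer_with_search(llm, retriever, question: str, max_calls: int = 5) -> str:
    """Generate a response, pausing whenever the model emits a search
    query and resuming once the retrieved documents are appended."""
    context = question
    for _ in range(max_calls):
        # Assumed API: generation halts at the stop string or at EOS.
        chunk = llm.generate(context, stop=[QUERY_END])
        context += chunk
        if QUERY_START not in chunk:
            break  # no search requested; the model produced its answer
        query = chunk.split(QUERY_START, 1)[1].strip()
        docs = retriever.search(query)  # external search system
        # Feed the evidence back so the model can keep reasoning.
        context += f"{QUERY_END}\n{DOCS_START}\n{docs}\n{DOCS_END}\n"
    return context
```

In this pattern the search call sits inside the decoding loop, so retrieval happens at the points the model itself chooses rather than once up front as in standard RAG pipelines.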
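"Outcome-based" here means the RL reward is computed only from the final rollout, with no per-step process supervision. The sketch below illustrates one plausible such reward, assuming a format check on hypothetical answer tags plus a token-level F1 score against the gold answer; the paper's exact reward design is not specified in this abstract.

```python
from collections import Counter

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    common = Counter(pred) & Counter(ref)  # per-token minimum counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def outcome_reward(rollout: str, reference: str) -> float:
    """Outcome-based reward: scored only from the finished rollout.
    The <answer> tags and the penalty value are illustrative
    assumptions, not the paper's exact design."""
    # Format term: the rollout must wrap its final answer in tags.
    if "<answer>" not in rollout or "</answer>" not in rollout:
        return -1.0  # penalize malformed outputs
    answer = rollout.split("<answer>", 1)[1].split("</answer>", 1)[0]
    # Answer term: token-level F1 against the gold answer.
    return f1_score(answer, reference)
```

Because the score depends only on the final output, this kind of reward avoids the intermediate-step annotation that process-reward training would require.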