ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation

Abstract


Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, which limits their factual accuracy. Although recent works equip reinforcement learning (RL)-based LRMs with retrieval capabilities, these models suffer from overthinking and lack robustness in reasoning, which reduces their effectiveness on question answering (QA) tasks. To address this, we propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations. Our solution includes a novel data construction framework that imposes an upper bound on the length of the reasoning chain. Specifically, we first leverage an LRM to generate deliberate thinking and then select an action from a predefined action space (Search and Finish). For the Search action, a query is executed against the Retrieval-Augmented Generation (RAG) engine, and the result is returned as an observation that guides subsequent reasoning steps. This process iterates until the Finish action is chosen. Benefiting from ReaRAG's strong reasoning capabilities, our approach outperforms existing baselines on multi-hop QA. Further analysis highlights its strong reflective ability to recognize errors and refine its reasoning trajectory. Our study enhances LRMs' factuality while effectively integrating robust reasoning into RAG.
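The iterative thought/action/observation loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `rearag_loop`, `Step`, `policy`, `rag_engine`, and `max_steps` are hypothetical names. `policy` stands in for the reasoning model (it returns a thought, an action, and that action's argument), `rag_engine` stands in for the retrieval engine, and `max_steps` models the upper bound on reasoning chain length. Applying that bound at inference time, rather than only during data construction, is an assumption made here for illustration. The toy knowledge base and the scripted policy are invented purely to exercise the loop on a two-hop question.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal, Optional, Tuple

ActionType = Literal["search", "finish"]


@dataclass
class Step:
    thought: str
    action: ActionType
    argument: str                      # search query, or final answer for "finish"
    observation: Optional[str] = None  # filled in only for "search" steps


def rearag_loop(
    question: str,
    policy: Callable[[str, List[Step]], Tuple[str, ActionType, str]],
    rag_engine: Callable[[str], str],
    max_steps: int = 10,  # hypothetical upper bound on chain length
) -> Tuple[Optional[str], List[Step]]:
    """Alternate thought -> action -> observation until Finish or max_steps."""
    trajectory: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, trajectory)
        step = Step(thought, action, argument)
        trajectory.append(step)
        if action == "finish":
            return argument, trajectory  # the Finish argument is the answer
        # Search: run the query; the result is fed back as the observation
        step.observation = rag_engine(argument)
    return None, trajectory  # bound reached without a Finish action


# Toy stand-ins (hypothetical) for the retrieval engine and the model.
KB = {"author of Book X": "Alice", "spouse of Alice": "Bob"}


def toy_rag(query: str) -> str:
    return KB.get(query, "no result")


def toy_policy(question: str, trajectory: List[Step]) -> Tuple[str, ActionType, str]:
    if not trajectory:
        return ("First find the author.", "search", "author of Book X")
    if len(trajectory) == 1:
        last = trajectory[-1].observation or ""
        return ("Now find the author's spouse.", "search", f"spouse of {last}")
    return ("I have the answer.", "finish", trajectory[-1].observation or "")


answer, steps = rearag_loop(
    "Who is the spouse of the author of Book X?", toy_policy, toy_rag
)
print(answer, len(steps))  # → Bob 3
```

Returning `None` when the bound is hit, instead of forcing an answer, keeps the "no Finish action within the limit" case distinguishable. A real system might instead prompt the model to answer from the partial trajectory.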
