Chain-of-Retrieval Augmented Generation

Abstract

This paper introduces an approach for training o1-like RAG models that retrieve and reason over relevant information step by step before generating the final answer. Conventional RAG methods usually perform a single retrieval step before generation, which limits their effectiveness on complex queries because retrieval results are imperfect. In contrast, our proposed method, CoRAG (Chain-of-Retrieval Augmented Generation), allows the model to dynamically reformulate the query based on the evolving state. To train CoRAG effectively, we use rejection sampling to automatically generate intermediate retrieval chains, thereby augmenting existing RAG datasets that provide only the correct final answer. At test time, we propose various decoding strategies that scale the model's test-time compute by controlling the length and number of sampled retrieval chains. Experimental results across multiple benchmarks validate the efficacy of CoRAG, particularly on multi-hop question answering tasks, where we observe an improvement of more than 10 points in EM score over strong baselines. On the KILT benchmark, CoRAG establishes new state-of-the-art performance across a diverse range of knowledge-intensive tasks. Furthermore, we offer comprehensive analyses of the scaling behavior of CoRAG, laying the groundwork for future research aimed at developing factual and grounded foundation models.
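
The abstract describes the mechanism only at a high level, so the following is a minimal toy sketch of the general idea rather than the authors' implementation. It illustrates two points: a retrieval chain in which each sub-query depends on what was retrieved earlier, and rejection sampling that keeps only chains whose final answer matches the known correct answer. Everything named here is an assumption made for illustration: the corpus, the `good_policy` and `bad_policy` stand-ins for the model, the rule that the last retrieved passage is the answer, and the exact-match acceptance criterion.

```python
from typing import Callable, Dict, List, Tuple

# One chain = list of (sub_query, retrieved_passage) steps.
Chain = List[Tuple[str, str]]

# Toy corpus (hypothetical data, for illustration only).
CORPUS: Dict[str, str] = {
    "author of Book X": "Alice",
    "birthplace of Alice": "Paris",
}


def retrieve(query: str) -> str:
    """Stand-in retriever: exact-match lookup; empty string on a miss."""
    return CORPUS.get(query, "")


def good_policy(question: str, chain: Chain) -> str:
    """Stand-in for the model: pick the next sub-query from the current state."""
    if not chain:
        return "author of Book X"
    # Reformulate using the passage retrieved in the previous step.
    return "birthplace of " + chain[-1][1]


def bad_policy(question: str, chain: Chain) -> str:
    """A policy that never reformulates; it repeats the original question."""
    return question


def run_chain(question: str, policy: Callable[[str, Chain], str], max_steps: int) -> Chain:
    """Retrieve step by step, letting each sub-query depend on earlier results."""
    chain: Chain = []
    for _ in range(max_steps):
        sub_query = policy(question, chain)
        chain.append((sub_query, retrieve(sub_query)))
    return chain


def final_answer(chain: Chain) -> str:
    """Toy answer rule (an assumption): the last retrieved passage is the answer."""
    return chain[-1][1] if chain else ""


def rejection_sample(chains: List[Chain], gold: str) -> List[Chain]:
    """Keep only chains whose final answer matches the known gold answer."""
    return [c for c in chains if final_answer(c) == gold]


question = "Where was the author of Book X born?"
candidates = [run_chain(question, p, max_steps=2) for p in (good_policy, bad_policy)]
accepted = rejection_sample(candidates, gold="Paris")
```

In this sketch, `max_steps` plays the role of the chain length and the number of entries in `candidates` plays the role of the number of sampled chains, the two quantities the abstract says are controlled at test time. With `max_steps=1` the toy chain stops after the first hop and cannot reach the second-hop answer, which hints at why longer chains can help on multi-hop questions.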