Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

May 4, 2026
作者: Artyom Sorokin, Nazar Buzun, Alexander Anokhin, Oleg Inozemcev, Egor Vedernikov, Petr Anokhin, Mikhail Burtsev, Trushkov Alexey, Yin Wenshuai, Evgeny Burnaev
cs.AI

Abstract
Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Recently, multi-step retrieval approaches have emerged, typically involving the fine-tuning of small LLMs to perform multi-step retrieval. This type of fine-tuning is highly resource-intensive and precludes the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BabiLong and RULER for contexts up to 10M tokens. Code is available at https://github.com/griver/Q-RAG
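To make the multi-step retrieval idea concrete, here is a minimal, purely illustrative sketch. It is not the paper's implementation: it replaces the RL-trained embedder with a toy bag-of-words cosine scorer, and it models "multi-step" retrieval by folding each retrieved chunk back into the query so the next step can follow a multi-hop dependency. All names (`embed`, `multi_step_retrieve`) and the example texts are invented for illustration.

```python
# Illustrative multi-step retrieval loop (toy sketch, not Q-RAG itself).
# A trained embedder would replace `embed`; Q-RAG additionally trains
# that embedder with RL, which this sketch omits entirely.
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' as a sparse count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def score(q_vec, d_vec):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(q_vec[w] * d_vec.get(w, 0) for w in q_vec)
    nq = math.sqrt(sum(v * v for v in q_vec.values()))
    nd = math.sqrt(sum(v * v for v in d_vec.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def multi_step_retrieve(question, chunks, steps=2):
    """Pick one chunk per step; merge each retrieved chunk into the
    query so later steps can chase multi-hop dependencies."""
    query_vec = embed(question)
    doc_vecs = [embed(c) for c in chunks]
    picked = []
    for _ in range(steps):
        candidates = [i for i in range(len(chunks)) if i not in picked]
        if not candidates:
            break
        best = max(candidates, key=lambda i: score(query_vec, doc_vecs[i]))
        picked.append(best)
        query_vec = query_vec + doc_vecs[best]  # condition the next hop
    return [chunks[i] for i in picked]

chunks = [
    "The key is in the garden.",
    "Mary took the key from the kitchen.",
    "Mary went to the garden.",
]
print(multi_step_retrieve("Where is the key Mary took?", chunks))
```

On this toy two-hop question, the first step retrieves the chunk about Mary taking the key, and conditioning on it lets the second step surface the chunk locating the key, which a single-step ranking of the original query would score lower.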