Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
May 4, 2026
Authors: Artyom Sorokin, Nazar Buzun, Alexander Anokhin, Oleg Inozemcev, Egor Vedernikov, Petr Anokhin, Mikhail Burtsev, Trushkov Alexey, Yin Wenshuai, Evgeny Burnaev
cs.AI
Abstract
Retrieval-Augmented Generation (RAG) methods improve LLM performance by efficiently filtering the context passed to the model, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Multi-step retrieval approaches have recently emerged, typically fine-tuning small LLMs to perform the retrieval; this fine-tuning is highly resource-intensive and precludes the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BabiLong and RULER for contexts up to 10M tokens. Code is available at https://github.com/griver/Q-RAG
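
The abstract describes value-based RL fine-tuning of the embedder so that retrieval can proceed over multiple steps. As a rough illustration only (not the authors' implementation, which is in the linked repository), the sketch below assumes a hypothetical setup in which the embedder's query-chunk similarity is treated as a Q-value for "retrieve this chunk next" and is fit with a one-step TD target, DQN-style. `ToyEmbedder`, `q_values`, `td_update`, and the reward convention (1 only when the final retrieved set lets the reader LLM answer) are placeholder assumptions.

```python
# Hypothetical sketch: embedder similarity as a Q-value for the next retrieval step,
# trained with a one-step TD target (DQN-style). Not the Q-RAG reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM, VOCAB = 64, 1000  # toy sizes, placeholders

class ToyEmbedder(nn.Module):
    """Maps a bag of token ids to a single vector; stands in for a real text encoder."""
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, EMB_DIM)
        self.proj = nn.Linear(EMB_DIM, EMB_DIM)

    def forward(self, token_ids):           # token_ids: (batch, seq_len)
        pooled = self.tok(token_ids).mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

embedder = ToyEmbedder()
target_embedder = ToyEmbedder()             # frozen target network, DQN-style
target_embedder.load_state_dict(embedder.state_dict())
opt = torch.optim.Adam(embedder.parameters(), lr=1e-4)
GAMMA = 0.9

def q_values(model, state_ids, chunk_ids):
    """Q(s, a) = similarity between the encoded state (query + chunks retrieved so far)
    and each candidate chunk."""
    s = model(state_ids)                    # (1, d)
    c = model(chunk_ids)                    # (num_chunks, d)
    return (s @ c.T).squeeze(0)             # (num_chunks,)

def td_update(state_ids, chunk_ids, action, reward, next_state_ids, done):
    """One Q-learning step on a (state, retrieved chunk, reward, next state) transition."""
    q = q_values(embedder, state_ids, chunk_ids)[action]
    with torch.no_grad():
        next_q = q_values(target_embedder, next_state_ids, chunk_ids).max()
        target = reward + (0.0 if done else GAMMA * next_q.item())
    loss = F.mse_loss(q, torch.tensor(target))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with random placeholder token ids: the state is the question plus the chunks
# retrieved so far; reward is 1.0 only when the episode ends with a correct answer.
state = torch.randint(0, VOCAB, (1, 16))
chunks = torch.randint(0, VOCAB, (8, 32))
next_state = torch.randint(0, VOCAB, (1, 24))
td_update(state, chunks, action=3, reward=0.0, next_state_ids=next_state, done=False)
```

At inference time, under the same assumptions, the fine-tuned embedder is used like an ordinary retriever at each step: score all chunks against the current state, add the top-scoring chunk to the state, and repeat until the reader LLM can answer, so no LLM fine-tuning is required.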