
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

May 4, 2026
저자: Artyom Sorokin, Nazar Buzun, Alexander Anokhin, Oleg Inozemcev, Egor Vedernikov, Petr Anokhin, Mikhail Burtsev, Trushkov Alexey, Yin Wenshuai, Evgeny Burnaev
cs.AI

Abstract

Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering the context passed to the model, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step search. Recently, multi-step retrieval approaches have emerged, typically involving the fine-tuning of small LLMs to perform multi-step retrieval. This type of fine-tuning is highly resource-intensive and precludes the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the Embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BabiLong and RULER for contexts up to 10M tokens. Code is available at https://github.com/griver/Q-RAG
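To make the idea of embedder-driven multi-step retrieval concrete, the sketch below shows the control flow the abstract implies: retrieval is treated as a sequential process in which, at each hop, the remaining chunks are scored against a state built from the question plus everything retrieved so far. This is an illustrative stand-in, not the paper's implementation — `embed` is a fake deterministic embedder, and the dot-product score is a placeholder for the learned Q-value that a value-based method like Q-RAG would train with RL.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Stand-in embedder: deterministic pseudo-embedding seeded by a CRC32
    # of the text. A real system would use a trained embedding model here.
    rng = np.random.default_rng(zlib.crc32(text.encode("utf-8")))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def multi_step_retrieve(question: str, chunks: list[str], steps: int = 3) -> list[str]:
    """Greedy multi-step retrieval sketch.

    At each step, score the remaining chunks against the current state
    (question + evidence gathered so far) and take the best one. In a
    value-based approach, the raw similarity score would be replaced by
    a learned Q-value estimating long-term answer quality.
    """
    state = question
    remaining = list(chunks)
    retrieved: list[str] = []
    for _ in range(min(steps, len(remaining))):
        q = embed(state)
        scores = [float(q @ embed(c)) for c in remaining]
        best = int(np.argmax(scores))
        retrieved.append(remaining.pop(best))
        # The state grows with each hop, so later scoring steps can
        # condition on evidence found earlier -- the key difference
        # from single-step retrieval.
        state = question + " " + " ".join(retrieved)
    return retrieved
```

The loop's only departure from standard single-step RAG is that the query representation is recomputed after every hop, which is what lets the retriever chain together facts spread across distant parts of a long context.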