s3: You Don't Need That Much Data to Train a Search Agent via RL

May 20, 2025
Authors: Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han
cs.AI

Abstract

Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, entangling retrieval with generation and limiting the real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
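For intuition, here is a minimal sketch of the Gain Beyond RAG reward as described above: the searcher is rewarded by how much its retrieved context improves a frozen generator's answer accuracy relative to naive RAG retrieval. The `generate` callable (standing in for the frozen generator LLM) and the exact-match `score` below are illustrative assumptions, not the paper's exact implementation:

```python
def exact_match(prediction: str, gold: str) -> float:
    """Toy accuracy proxy: 1.0 if the gold answer string appears in the prediction."""
    return 1.0 if gold.strip().lower() in prediction.strip().lower() else 0.0


def gbr_reward(question: str, gold: str,
               searcher_docs: list[str], naive_rag_docs: list[str],
               generate, score=exact_match) -> float:
    """Gain Beyond RAG: the generator's accuracy when fed the trained
    searcher's context, minus its accuracy with naive RAG context.
    Only the searcher is updated with this reward; the generator stays frozen."""
    acc_searcher = score(generate(question, searcher_docs), gold)
    acc_naive = score(generate(question, naive_rag_docs), gold)
    return acc_searcher - acc_naive
```

Because the reward only compares two generator outputs, the generator never needs gradient updates, which is what makes the framework model-agnostic and compatible with frozen or proprietary generators.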
