s3: You Don't Need That Much Data to Train a Search Agent via RL

Abstract

Retrieval-augmented generation (RAG) systems empower large language models (LLMs) to access external knowledge during inference. Recent advances have enabled LLMs to act as search agents via reinforcement learning (RL), improving information acquisition through multi-turn interactions with retrieval engines. However, existing approaches either optimize retrieval using search-only metrics (e.g., NDCG) that ignore downstream utility, or fine-tune the entire LLM to jointly reason and retrieve, which entangles retrieval with generation and limits both real search utility and compatibility with frozen or proprietary models. In this work, we propose s3, a lightweight, model-agnostic framework that decouples the searcher from the generator and trains the searcher using a Gain Beyond RAG reward: the improvement in generation accuracy over naive RAG. s3 requires only 2.4k training samples to outperform baselines trained on over 70x more data, consistently delivering stronger downstream performance across six general QA and five medical QA benchmarks.
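The Gain Beyond RAG reward described above can be illustrated with a minimal sketch. It assumes the reward is the difference between the frozen generator's accuracy when given the searcher's documents and its accuracy when given naive RAG documents, as the abstract states. The names `gain_beyond_rag`, `span_match_accuracy`, and `toy_generate` are hypothetical, and the span-match metric is a simple stand-in chosen for illustration, not necessarily the accuracy metric used by s3.

```python
from typing import Callable, Sequence


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so string comparison is lenient."""
    return " ".join(text.lower().split())


def span_match_accuracy(prediction: str, gold_answers: Sequence[str]) -> float:
    """Stand-in metric (an assumption): 1.0 if any gold answer appears in the prediction."""
    pred = normalize(prediction)
    return 1.0 if any(normalize(g) in pred for g in gold_answers) else 0.0


def gain_beyond_rag(
    generate: Callable[[str, Sequence[str]], str],
    question: str,
    searcher_docs: Sequence[str],
    naive_rag_docs: Sequence[str],
    gold_answers: Sequence[str],
    accuracy: Callable[[str, Sequence[str]], float] = span_match_accuracy,
) -> float:
    """Reward = accuracy with the searcher's context minus accuracy with naive RAG context.

    The generator is only called, never updated, which mirrors the decoupling
    of searcher and generator described in the abstract.
    """
    acc_searcher = accuracy(generate(question, searcher_docs), gold_answers)
    acc_baseline = accuracy(generate(question, naive_rag_docs), gold_answers)
    return acc_searcher - acc_baseline


def toy_generate(question: str, docs: Sequence[str]) -> str:
    """Hypothetical frozen generator: simply echoes the retrieved documents."""
    return " ".join(docs)


reward = gain_beyond_rag(
    toy_generate,
    "What is the capital of France?",
    searcher_docs=["Paris is the capital of France."],
    naive_rag_docs=["Berlin is the capital of Germany."],
    gold_answers=["Paris"],
)
print(reward)
```

In this sketch the reward is positive only when the searcher's context helps the generator more than the naive retrieval does, and it is zero when both contexts lead to equally accurate answers, so the searcher gets no credit for questions that naive RAG already handles.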
