Understanding the Behaviors of Environment-aware Information Retrieval

Abstract

Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we present the first systematic analysis of whether large language models (LLMs) can learn to adapt their query formulation strategies to different retrievers via reinforcement learning (RL). Our empirical study reveals that RL effectively teaches an LLM to tailor its queries to the characteristics of a specific retriever. We find that different retrievers exhibit surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), suggesting that strategies learned for one retriever are ineffective for another. We further show that performance can be improved by incorporating retriever-specific human guidance and by scaling model size. To facilitate learning over trajectories with multiple retrieval steps, we introduce a branching-based rollout technique that improves training stability. Our work provides the first empirical evidence and actionable insights for building truly retriever-aware RAG systems. Code and resources are available at https://github.com/LCO-Embedding/Envs-aware-Information-Retrieval.