Understanding the Behavior of Environment-Aware Information Retrieval

Abstract

Recent retrieval-augmented generation (RAG) approaches have demonstrated strong capability in handling complex queries, yet current research overlooks a critical challenge: different retrievers require fundamentally different query formulation strategies for optimal performance. In this work, we present the first systematic analysis of how LLMs can learn to adapt their query formulation strategies to different retrievers via reinforcement learning (RL). Our empirical study reveals that RL effectively teaches an LLM to tailor its queries to the characteristics of a specific retriever. We also find that different retrievers exhibit surprisingly distinct optimal query styles (e.g., descriptive vs. question-like), suggesting that strategies learned for one retriever are ineffective for another. We further show that performance can be improved by incorporating retriever-specific human guidance and by scaling model size. To facilitate learning over multi-retrieval-step trajectories, we introduce a branching-based rollout technique that improves training stability. Our work provides the first empirical evidence and actionable insights for building truly retriever-aware RAG systems. Code and resources are available at https://github.com/LCO-Embedding/Envs-aware-Information-Retrieval.