REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

Abstract

We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK reasons explicitly before reranking, which significantly improves both performance and interpretability. Leveraging reinforcement learning and data augmentation, REARANK achieves substantial improvements over baseline models across popular information retrieval benchmarks while requiring only 179 annotated samples. Built on top of Qwen2.5-7B, REARANK-7B performs comparably to GPT-4 on both in-domain and out-of-domain benchmarks and even surpasses GPT-4 on the reasoning-intensive BRIGHT benchmark. These results underscore the effectiveness of our approach and show how reinforcement learning can enhance LLM reasoning capabilities in reranking.
