REARANK: Reasoning Re-ranking Agent via Reinforcement Learning

Abstract

We present REARANK, a listwise reasoning reranking agent built on a large language model (LLM). REARANK reasons explicitly before reranking, which significantly improves both performance and interpretability. Using reinforcement learning and data augmentation, REARANK achieves substantial improvements over baseline models across popular information retrieval benchmarks while requiring only 179 annotated samples. REARANK-7B, built on Qwen2.5-7B, performs comparably to GPT-4 on both in-domain and out-of-domain benchmarks and surpasses GPT-4 on the reasoning-intensive BRIGHT benchmark. These results confirm the effectiveness of our approach and show how reinforcement learning can strengthen the reasoning capabilities of LLMs in reranking.