REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
May 26, 2025
Authors: Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal
cs.AI
Abstract
We present REARANK, a large language model (LLM)-based listwise reasoning
reranking agent. REARANK explicitly reasons before reranking, significantly
improving both performance and interpretability. Leveraging reinforcement
learning and data augmentation, REARANK achieves substantial improvements over
baseline models across popular information retrieval benchmarks, notably
requiring only 179 annotated samples. Built on top of Qwen2.5-7B, our
REARANK-7B demonstrates performance comparable to GPT-4 on both in-domain and
out-of-domain benchmarks and even surpasses GPT-4 on reasoning-intensive BRIGHT
benchmarks. These results underscore the effectiveness of our approach and
highlight how reinforcement learning can enhance LLM reasoning capabilities in
reranking.
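The abstract describes a listwise reranker that reasons explicitly before producing its final ordering. As a rough illustration only (not the paper's actual prompt, training setup, or code), the sketch below shows how such a reranker might prompt an LLM over a query and candidate passages, let it reason freely, and then parse a ranking of the form [2] > [1] > [3]; the prompt wording and all function names here are assumptions.

```python
import re
from typing import Callable, List

def build_listwise_prompt(query: str, passages: List[str]) -> str:
    """Assemble an illustrative listwise prompt: the model is asked to reason
    about relevance first, then emit a ranking such as [2] > [1] > [3]."""
    lines = [f"Query: {query}", "", "Passages:"]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("")
    lines.append(
        "Think step by step about which passages best answer the query, "
        "then output the final ranking in the form [i] > [j] > [k]."
    )
    return "\n".join(lines)

def parse_ranking(response: str, num_passages: int) -> List[int]:
    """Extract ranked passage indices (0-based) from the model output,
    ignoring the free-form reasoning that precedes the ranking line."""
    seen, order = set(), []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Passages the model omitted keep their original relative order at the end.
    order.extend(i for i in range(num_passages) if i not in seen)
    return order

def rerank(query: str, passages: List[str], llm: Callable[[str], str]) -> List[str]:
    """Rerank passages with a reasoning LLM; `llm` is any prompt -> text callable."""
    response = llm(build_listwise_prompt(query, passages))
    return [passages[i] for i in parse_ranking(response, len(passages))]
```

In this sketch the `llm` argument stands in for whatever model is used (the paper builds on Qwen2.5-7B trained with reinforcement learning); only the reranking output format is exercised here, not the training procedure.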