REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
May 26, 2025
Authors: Le Zhang, Bo Wang, Xipeng Qiu, Siva Reddy, Aishwarya Agrawal
cs.AI
Abstract
We present REARANK, a large language model (LLM)-based listwise reasoning
reranking agent. REARANK explicitly reasons before reranking, significantly
improving both performance and interpretability. Leveraging reinforcement
learning and data augmentation, REARANK achieves substantial improvements over
baseline models across popular information retrieval benchmarks, notably
requiring only 179 annotated samples. Built on top of Qwen2.5-7B, our
REARANK-7B demonstrates performance comparable to GPT-4 on both in-domain and
out-of-domain benchmarks and even surpasses GPT-4 on reasoning-intensive BRIGHT
benchmarks. These results underscore the effectiveness of our approach and
highlight how reinforcement learning can enhance LLM reasoning capabilities in
reranking.
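The abstract describes a listwise reranker that reasons explicitly before producing its final ordering. As a rough illustration only (not the paper's actual prompt, training setup, or code), the sketch below shows how such a reranker might prompt an LLM over a query and candidate passages, let it reason freely, and then parse a ranking of the form [2] > [1] > [3]; the prompt wording and all function names here are assumptions.

```python
import re
from typing import Callable, List

def build_listwise_prompt(query: str, passages: List[str]) -> str:
    """Assemble an illustrative listwise prompt: the model is asked to reason
    about relevance first, then emit a ranking such as [2] > [1] > [3]."""
    lines = [f"Query: {query}", "", "Passages:"]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("")
    lines.append(
        "Think step by step about which passages best answer the query, "
        "then output the final ranking in the form [i] > [j] > [k]."
    )
    return "\n".join(lines)

def parse_ranking(response: str, num_passages: int) -> List[int]:
    """Extract ranked passage indices (0-based) from the model output,
    ignoring the free-form reasoning that precedes the ranking line."""
    seen, order = set(), []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1
        if 0 <= idx < num_passages and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Passages the model omitted keep their original relative order at the end.
    order.extend(i for i in range(num_passages) if i not in seen)
    return order

def rerank(query: str, passages: List[str], llm: Callable[[str], str]) -> List[str]:
    """Rerank passages with a reasoning LLM; `llm` is any prompt -> text callable."""
    response = llm(build_listwise_prompt(query, passages))
    return [passages[i] for i in parse_ranking(response, len(passages))]
```

In this sketch the `llm` argument stands in for whatever model is used (the paper builds on Qwen2.5-7B trained with reinforcement learning); only the reranking output format is exercised here, not the training procedure.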