ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability
August 9, 2025
Authors: Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou
cs.AI
Abstract
Large Language Model (LLM) based listwise ranking has shown superior
performance in many passage ranking tasks. With the development of Large
Reasoning Models, many studies have demonstrated that step-by-step reasoning
at test time helps improve listwise ranking performance. However, due to
the scarcity of reasoning-intensive training data, existing rerankers perform
poorly in many complex ranking scenarios, and the ranking ability of
reasoning-intensive rerankers remains largely underdeveloped. In this paper, we
first propose an automated reasoning-intensive training data synthesis
framework, which sources training queries and passages from diverse domains and
applies DeepSeek-R1 to generate high-quality training labels. A
self-consistency data filtering mechanism is designed to ensure data
quality. To empower the listwise reranker with strong reasoning ability, we
further propose a two-stage post-training approach, which includes a cold-start
supervised fine-tuning (SFT) stage for reasoning pattern learning and a
reinforcement learning (RL) stage for further ranking ability enhancement.
During the RL stage, based on the nature of listwise ranking, we design a
multi-view ranking reward, which is more effective than a ranking metric-based
reward. Extensive experiments demonstrate that our trained reasoning-intensive
reranker ReasonRank significantly outperforms existing baselines and
achieves much lower latency than the pointwise reranker Rank1. In further
experiments, ReasonRank achieves state-of-the-art (SOTA) performance of 40.6
on the BRIGHT leaderboard (https://brightbenchmark.github.io/). Our code is
available at https://github.com/8421BCD/ReasonRank.
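The abstract does not specify how the self-consistency filter decides whether a DeepSeek-R1 label is kept. A minimal sketch of one plausible criterion, assuming agreement between two independently sampled rankings is measured by Kendall's tau (the `label_fn`, `threshold`, and agreement measure are illustrative assumptions, not the paper's actual mechanism):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings of the same passage ids."""
    pos_a = {pid: i for i, pid in enumerate(rank_a)}
    pos_b = {pid: i for i, pid in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        agree = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agree > 0:
            concordant += 1
        elif agree < 0:
            discordant += 1
    n_pairs = len(rank_a) * (len(rank_a) - 1) / 2
    return (concordant - discordant) / n_pairs

def self_consistency_filter(examples, label_fn, threshold=0.8):
    """Keep only queries whose two independently sampled rankings agree.

    label_fn(query, passages) -> ranked list of passage ids; in the paper's
    setting this role is played by DeepSeek-R1 (hypothetical interface here).
    """
    kept = []
    for query, passages in examples:
        ranking_1 = label_fn(query, passages)
        ranking_2 = label_fn(query, passages)
        if kendall_tau(ranking_1, ranking_2) >= threshold:
            # Consistent labels are treated as high-quality training data.
            kept.append((query, passages, ranking_1))
    return kept
```

The design intuition is that a stochastic labeler which produces the same ordering across independent samples is more likely to be correct, so disagreement flags noisy training examples for removal.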