
ReasonRank: Empowering Passage Ranking with Strong Reasoning Ability

August 9, 2025
Authors: Wenhan Liu, Xinyu Ma, Weiwei Sun, Yutao Zhu, Yuchen Li, Dawei Yin, Zhicheng Dou
cs.AI

Abstract

Large Language Model (LLM) based listwise ranking has shown superior performance in many passage ranking tasks. With the development of Large Reasoning Models, many studies have demonstrated that step-by-step reasoning at test time helps improve listwise ranking performance. However, due to the scarcity of reasoning-intensive training data, existing rerankers perform poorly in many complex ranking scenarios, and the ranking ability of reasoning-intensive rerankers remains largely underdeveloped. In this paper, we first propose an automated reasoning-intensive training data synthesis framework, which sources training queries and passages from diverse domains and applies DeepSeek-R1 to generate high-quality training labels. A self-consistency data filtering mechanism is designed to ensure data quality. To empower the listwise reranker with strong reasoning ability, we further propose a two-stage post-training approach, which includes a cold-start supervised fine-tuning (SFT) stage for learning reasoning patterns and a reinforcement learning (RL) stage for further enhancing ranking ability. During the RL stage, based on the nature of listwise ranking, we design a multi-view ranking reward, which is more effective than a ranking-metric-based reward. Extensive experiments demonstrate that our trained reasoning-intensive reranker, ReasonRank, significantly outperforms existing baselines and also achieves much lower latency than the pointwise reranker Rank1. In further experiments, ReasonRank achieves state-of-the-art (SOTA) performance of 40.6 on the BRIGHT leaderboard (https://brightbenchmark.github.io/). Our code is available at https://github.com/8421BCD/ReasonRank.
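
The abstract names a self-consistency data filtering mechanism but does not spell out how it works. Below is a minimal sketch of one plausible instantiation, assuming several label rankings are sampled from the labeling LLM per query and agreement is scored with pairwise Kendall's tau; the `to_ranks` helper, the `is_self_consistent` function, and the 0.7 threshold are illustrative assumptions, not the paper's definition.

```python
# Hypothetical sketch of a self-consistency filter for synthesized ranking labels.
# The agreement measure (Kendall's tau) and threshold (0.7) are assumptions.
from itertools import combinations
from scipy.stats import kendalltau

def to_ranks(order):
    """Convert an ordered list of passage ids into a rank vector (id -> position)."""
    ranks = [0] * len(order)
    for pos, pid in enumerate(order):
        ranks[pid] = pos
    return ranks

def is_self_consistent(sampled_orders, min_tau=0.7):
    """Keep a training example only if the label rankings sampled
    independently from the labeling LLM agree with each other."""
    taus = [kendalltau(to_ranks(a), to_ranks(b))[0]
            for a, b in combinations(sampled_orders, 2)]
    return sum(taus) / len(taus) >= min_tau

# Three orderings of five passages sampled for the same query:
samples = [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4], [1, 0, 2, 3, 4]]
print(is_self_consistent(samples))  # True: the sampled rankings largely agree
```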
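Likewise, the multi-view ranking reward is only named here, not defined. The sketch below shows what blending views into a single RL reward might look like, assuming, purely for illustration, that the views are NDCG@k and top-k recall with fixed weights; the paper's actual views and weighting may differ.

```python
# Hypothetical sketch of a multi-view ranking reward. The specific views
# (NDCG@k, Recall@k) and the weights are illustrative assumptions.
import math

def ndcg_at_k(ranked_ids, gold_relevance, k=10):
    """Standard NDCG@k over a predicted ordering of passage ids."""
    dcg = sum(gold_relevance.get(pid, 0) / math.log2(i + 2)
              for i, pid in enumerate(ranked_ids[:k]))
    ideal = sorted(gold_relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_ids, gold_relevance, k=10):
    """Fraction of relevant passages retrieved in the top k."""
    relevant = {pid for pid, rel in gold_relevance.items() if rel > 0}
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant)
    return hits / len(relevant) if relevant else 0.0

def multi_view_reward(ranked_ids, gold_relevance, weights=(0.7, 0.3), k=10):
    """Blend two views of ranking quality into one scalar RL reward."""
    return (weights[0] * ndcg_at_k(ranked_ids, gold_relevance, k)
            + weights[1] * recall_at_k(ranked_ids, gold_relevance, k))
```

A blended reward of this kind would give the RL stage a denser training signal than any single ranking metric, which is one plausible reading of why the multi-view design outperforms a metric-based reward; the views shown remain assumptions.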