GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning
November 10, 2025
Authors: Duolin Sun, Meixiu Long, Dan Yang, Yihan Jiao, Zhehao Tan, Jie Feng, Junjie Wang, Yue Shen, Peng Wei, Jian Wang, Jinjie Gu
cs.AI
Abstract
Large Language Models have shown strong potential as rerankers that enhance the overall performance of RAG systems. However, existing reranking paradigms face a core theoretical and practical dilemma: Pointwise methods, while simple and highly flexible, evaluate documents independently, making them prone to the Ranking Myopia Trap of overlooking the relative importance between documents; Listwise methods, in contrast, can perceive the global ranking context but suffer from inherent List Rigidity, leading to severe scalability and flexibility issues on large candidate sets. To address these challenges, we propose Groupwise, a novel reranking paradigm in which the query and a group of candidate documents are jointly fed into the model, which performs within-group comparisons to assign an individual relevance score to each document. This design retains the flexibility of Pointwise methods while gaining the comparative capability of Listwise methods. We further adopt GRPO for model training, equipped with a heterogeneous reward function that integrates ranking metrics with a distributional reward that aligns score distributions across groups. To overcome the bottleneck caused by the scarcity of high-quality labeled data, we also propose a pipeline for synthesizing high-quality retrieval and ranking data; the resulting data can be used to train not only the reranker but also the retriever. Extensive experiments validate the effectiveness of our approach, which achieves significant improvements on two reasoning-intensive retrieval benchmarks, BRIGHT and R2MED.
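The two mechanisms named in the abstract can be made concrete with short sketches. First, the groupwise scoring step: the query and a group of candidates are fed to the model together, yet each document still receives its own score, so large candidate sets can be processed group by group and the scores merged into one ranking. This is a minimal sketch under stated assumptions, not the authors' released code; the prompt format, the `group_size` default, and the helper names (`build_group_prompt`, `parse_scores`, `llm`) are all illustrative.

```python
# Minimal sketch of groupwise reranking as described in the abstract.
# Prompt format, group size, and helper names are assumptions for illustration.
from typing import Callable

def build_group_prompt(query: str, group: list[str]) -> str:
    """Pack a query and a group of candidate documents into one prompt,
    asking the model to compare them and score each one independently."""
    docs = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(group))
    return (
        f"Query: {query}\n\nCandidate documents:\n{docs}\n\n"
        "Compare the candidates and output one relevance score in [0, 10] "
        "per document, one per line, as 'index: score'."
    )

def groupwise_rerank(
    query: str,
    candidates: list[str],
    llm: Callable[[str], str],                 # any text-in/text-out LLM call
    parse_scores: Callable[[str], list[float]],  # extracts one score per doc
    group_size: int = 8,
) -> list[tuple[str, float]]:
    """Score candidates group by group, then merge into one global ranking.
    Because every document still gets its own score (Pointwise flexibility),
    groups of any number can be merged, unlike a strict Listwise permutation."""
    scored: list[tuple[str, float]] = []
    for start in range(0, len(candidates), group_size):
        group = candidates[start:start + group_size]
        scores = parse_scores(llm(build_group_prompt(query, group)))
        scored.extend(zip(group, scores))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Second, the heterogeneous reward used with GRPO. The abstract says it combines ranking metrics with a distributional reward that aligns score distributions across groups, but gives no formulation, so the sketch below assumes NDCG as the ranking metric and a simple penalty on each group's score mean and standard deviation relative to shared targets; `target_mean`, `target_std`, and `alpha` are hypothetical parameters, not values from the paper.

```python
# Hedged sketch of a heterogeneous reward: ranking quality (NDCG, assumed)
# plus a distributional term keeping per-group score statistics comparable.
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain of a relevance list in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(pred_scores: list[float], true_labels: list[float]) -> float:
    """NDCG of the ranking induced by the predicted scores."""
    order = sorted(range(len(pred_scores)), key=lambda i: -pred_scores[i])
    ideal = dcg(sorted(true_labels, reverse=True))
    return dcg([true_labels[i] for i in order]) / ideal if ideal > 0 else 0.0

def heterogeneous_reward(
    pred_scores: list[float],
    true_labels: list[float],
    target_mean: float = 5.0,   # hypothetical shared target statistics
    target_std: float = 2.5,
    alpha: float = 0.8,         # hypothetical mixing weight
) -> float:
    """Ranking reward plus a distributional term that nudges each group's
    score distribution toward shared statistics, so scores stay comparable
    when independently scored groups are merged."""
    n = len(pred_scores)
    mean = sum(pred_scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in pred_scores) / n)
    dist_penalty = abs(mean - target_mean) + abs(std - target_std)
    dist_reward = 1.0 / (1.0 + dist_penalty)  # in (0, 1], peaks at the target
    return alpha * ndcg(pred_scores, true_labels) + (1 - alpha) * dist_reward
```

The distributional term matters because each group is scored in isolation: without some shared calibration, a score of 7 in one group need not mean the same as a 7 in another, and the merged ranking would be distorted.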