GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning
November 10, 2025
Authors: Duolin Sun, Meixiu Long, Dan Yang, Yihan Jiao, Zhehao Tan, Jie Feng, Junjie Wang, Yue Shen, Peng Wei, Jian Wang, Jinjie Gu
cs.AI
Abstract
Large Language Models have shown strong potential as rerankers for enhancing the overall performance of Retrieval-Augmented Generation (RAG) systems. However, existing reranking paradigms face a core theoretical and practical dilemma: pointwise methods, while simple and highly flexible, evaluate documents independently, making them prone to the Ranking Myopia Trap of overlooking the relative importance between documents. In contrast, listwise methods can perceive the global ranking context but suffer from inherent List Rigidity, leading to severe scalability and flexibility issues when handling large candidate sets. To address these challenges, we propose Groupwise, a novel reranking paradigm. In this approach, the query and a group of candidate documents are jointly fed into the model, which performs within-group comparisons to assign an individual relevance score to each document. This design retains the flexibility of pointwise methods while enabling the comparative capability of listwise methods. We further adopt GRPO for model training, equipped with a heterogeneous reward function that integrates ranking metrics with a distributional reward aimed at aligning score distributions across groups. To overcome the bottleneck posed by the scarcity of high-quality labeled data, we also propose an innovative pipeline for synthesizing high-quality retrieval and ranking data; the resulting data can be used to train not only the reranker but also the retriever. Extensive experiments validate the effectiveness of our approach on two reasoning-intensive retrieval benchmarks, BRIGHT and R2MED.
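To make the heterogeneous reward concrete, the sketch below assumes the per-document scores for one group come from a single groupwise forward pass of the model, uses NDCG as the ranking metric, and models the distributional reward as a penalty for deviating from shared target score statistics. The function names, the targets `mu` and `sigma`, and the weight `lam` are illustrative assumptions, not the paper's actual specification.

```python
# A minimal sketch of the heterogeneous reward described in the abstract.
# All names, targets, and weights here are illustrative assumptions.

import math
from typing import List


def ndcg_at_k(pred_scores: List[float], gold_labels: List[int], k: int = 10) -> float:
    """Ranking reward: NDCG@k of the ordering induced by the predicted scores."""
    order = sorted(range(len(pred_scores)), key=lambda i: -pred_scores[i])
    dcg = sum(gold_labels[i] / math.log2(rank + 2) for rank, i in enumerate(order[:k]))
    ideal = sorted(gold_labels, reverse=True)
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0


def distribution_reward(pred_scores: List[float], mu: float = 5.0, sigma: float = 2.0) -> float:
    """Distributional reward (assumed form): penalize a group whose score mean and
    spread drift from shared targets, so scores stay comparable across groups."""
    n = len(pred_scores)
    mean = sum(pred_scores) / n
    var = sum((s - mean) ** 2 for s in pred_scores) / n
    return -abs(mean - mu) - abs(math.sqrt(var) - sigma)


def heterogeneous_reward(pred_scores: List[float], gold_labels: List[int], lam: float = 0.1) -> float:
    """Combined GRPO training signal: ranking quality plus weighted distribution term."""
    return ndcg_at_k(pred_scores, gold_labels) + lam * distribution_reward(pred_scores)


if __name__ == "__main__":
    scores = [7.8, 3.2, 8.9, 1.5]   # model's scores for one group of four candidates
    labels = [1, 0, 1, 0]           # gold relevance labels for the same group
    print(heterogeneous_reward(scores, labels))
```

The two terms pull in complementary directions: the NDCG term rewards correct within-group ordering, while the distributional term keeps absolute scores on a common scale so that scores from different groups remain directly comparable at inference time.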