LimRank: Less is More for Reasoning-Intensive Information Reranking
October 27, 2025
Authors: Tingyu Song, Yilun Zhao, Siyue Zhang, Chen Zhao, Arman Cohan
cs.AI
Abstract
Existing approaches typically rely on large-scale fine-tuning to adapt LLMs
for information reranking tasks, which is computationally expensive. In this
work, we demonstrate that modern LLMs can be effectively adapted using only
minimal, high-quality supervision. To enable this, we design
LIMRANK-SYNTHESIZER, a reusable and open-source pipeline for generating
diverse, challenging, and realistic reranking examples. Using this synthetic
data, we fine-tune our reranker model, LIMRANK. We evaluate LIMRANK on two
challenging benchmarks: BRIGHT for reasoning-intensive retrieval and
FollowIR for instruction-following retrieval. Our experiments show that
LIMRANK achieves competitive performance while being trained on less than 5%
of the data typically used in prior work. Further ablation studies demonstrate
the effectiveness of LIMRANK-SYNTHESIZER and the strong generalization
capabilities of LIMRANK across downstream tasks, including scientific
literature search and retrieval-augmented generation for knowledge-intensive
problem solving.