Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?
May 22, 2025
Authors: Nour Jedidi, Yung-Sung Chuang, James Glass, Jimmy Lin
cs.AI
Abstract
With the growing success of reasoning models across complex natural language
tasks, researchers in the Information Retrieval (IR) community have begun
exploring how similar reasoning capabilities can be integrated into passage
rerankers built on Large Language Models (LLMs). These methods typically employ
an LLM to produce an explicit, step-by-step reasoning process before arriving
at a final relevance prediction. But, does reasoning actually improve reranking
accuracy? In this paper, we dive deeper into this question, studying the impact
of the reasoning process by comparing reasoning-based pointwise rerankers
(ReasonRR) to standard, non-reasoning pointwise rerankers (StandardRR) under
identical training conditions, and observe that StandardRR generally
outperforms ReasonRR. Building on this observation, we then study the
importance of reasoning to ReasonRR by disabling its reasoning process
(ReasonRR-NoReason), and find that ReasonRR-NoReason is surprisingly more
effective than ReasonRR. Examining the cause of this result, our findings
reveal that reasoning-based rerankers are limited by the LLM's reasoning
process, which pushes it toward polarized relevance scores and thus fails to
consider the partial relevance of passages, a key factor for the accuracy of
pointwise rerankers.
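To make the experimental setup concrete, below is a minimal sketch of pointwise reranking with and without an explicit reasoning step. This is not the authors' implementation: `call_llm`, the prompt templates, and the `toy_llm` stub are hypothetical stand-ins used only to illustrate that each (query, passage) pair is scored independently, with the reasoning variant asking the model to think step by step before emitting a score.

```python
# Minimal, hypothetical sketch of pointwise reranking with and without a
# reasoning step. `call_llm` is any text-in/text-out LLM function; the prompt
# templates are illustrative assumptions, not the paper's exact prompts.
from typing import Callable, List, Tuple

STANDARD_PROMPT = (
    "Query: {query}\nPassage: {passage}\n"
    "On a scale from 0 to 1, how relevant is the passage to the query? "
    "Answer with a single number."
)

REASONING_PROMPT = (
    "Query: {query}\nPassage: {passage}\n"
    "Think step by step about whether the passage answers the query, "
    "then output a relevance score between 0 and 1 on the final line."
)

def score_passage(call_llm: Callable[[str], str], query: str,
                  passage: str, use_reasoning: bool) -> float:
    """Score one (query, passage) pair independently -- the pointwise setting."""
    template = REASONING_PROMPT if use_reasoning else STANDARD_PROMPT
    output = call_llm(template.format(query=query, passage=passage))
    # Take the last line as the score; a reasoning model emits its chain first.
    return float(output.strip().splitlines()[-1])

def rerank(call_llm: Callable[[str], str], query: str, passages: List[str],
           use_reasoning: bool = False) -> List[Tuple[float, str]]:
    """Sort passages by their independent pointwise relevance scores."""
    scored = [(score_passage(call_llm, query, p, use_reasoning), p) for p in passages]
    return sorted(scored, key=lambda x: x[0], reverse=True)

if __name__ == "__main__":
    # Toy stand-in LLM: scores by word overlap so the example runs end to end.
    def toy_llm(prompt: str) -> str:
        query = set(prompt.split("Query: ")[1].split("\n")[0].lower().split())
        passage = set(prompt.split("Passage: ")[1].split("\n")[0].lower().split())
        return f"{len(query & passage) / max(len(query), 1):.2f}"

    ranking = rerank(toy_llm, "what causes tides",
                     ["Tides are caused by the moon's gravity.",
                      "The stock market closed higher today."])
    for score, passage in ranking:
        print(f"{score:.2f}  {passage}")
```

The paper's observation about polarized scores maps onto this setup: when the reasoning variant collapses its judgment to scores near 0 or 1, passages that are only partially relevant lose the graded scores a pointwise ranker relies on to order them.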