Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary?

Abstract

With the growing success of reasoning models across complex natural language tasks, researchers in the Information Retrieval (IR) community have begun exploring how similar reasoning capabilities can be integrated into passage rerankers built on Large Language Models (LLMs). These methods typically employ an LLM to produce an explicit, step-by-step reasoning process before arriving at a final relevance prediction. But does reasoning actually improve reranking accuracy? In this paper, we examine this question by comparing reasoning-based pointwise rerankers (ReasonRR) to standard, non-reasoning pointwise rerankers (StandardRR) under identical training conditions, and observe that StandardRR generally outperforms ReasonRR. Building on this observation, we then study the importance of reasoning to ReasonRR by disabling its reasoning process (ReasonRR-NoReason), and find that ReasonRR-NoReason is, surprisingly, more effective than ReasonRR. Examining the cause of this result, we find that reasoning-based rerankers are limited by the LLM's reasoning process, which pushes them toward polarized relevance scores. As a result, they fail to account for the partial relevance of passages, a key factor for the accuracy of pointwise rerankers.
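The effect of polarized scores on ranking quality can be illustrated with a toy example. The sketch below is not the paper's experimental setup: the graded labels, both score lists, the `ndcg` helper, and the tie-breaking rule (ties keep the original candidate order) are all assumptions chosen for illustration. It shows that when a pointwise scorer saturates near 0 or 1, partially relevant passages tie with highly relevant ones, and the resulting ranking can no longer order them correctly.

```python
import math

def dcg(gains):
    """Discounted cumulative gain with linear gains and a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(scores, labels):
    """NDCG of the ranking induced by pointwise scores.

    Ties are broken by original candidate order (Python's sort is stable);
    this tie-breaking rule is an assumption of the sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked_gains = [labels[i] for i in order]
    ideal_gains = sorted(labels, reverse=True)
    return dcg(ranked_gains) / dcg(ideal_gains)

# Hypothetical graded relevance labels for four candidate passages (0 to 3).
labels = [1, 0, 2, 3]

# Hypothetical scores that preserve partial relevance.
graded_scores = [0.45, 0.05, 0.70, 0.95]

# Hypothetical polarized scores: every somewhat-relevant passage saturates at 1.0.
polarized_scores = [1.0, 0.0, 1.0, 1.0]

ndcg_graded = ndcg(graded_scores, labels)
ndcg_polarized = ndcg(polarized_scores, labels)

print(f"graded NDCG:    {ndcg_graded:.3f}")
print(f"polarized NDCG: {ndcg_polarized:.3f}")
```

In this toy setting the graded scores recover the ideal ordering, while the polarized scores leave the three relevant passages tied, so their final order depends on the arbitrary tie-breaking rule rather than on their degree of relevance.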
