From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval

Abstract

Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs), including applications in information retrieval (IR). However, it often leads to overthinking, where models produce excessively long and semantically redundant traces with little or no benefit. We identify two key challenges in IR: redundant trajectories that revisit similar states and misguided reasoning that diverges from user intent. To address these, we propose State Machine Reasoning (SMR), a transition-based reasoning framework composed of discrete actions (Refine, Rerank, Stop) that support early stopping and fine-grained control. Experiments on the BEIR and BRIGHT benchmarks show that SMR improves retrieval performance (nDCG@10) by 3.4% while reducing token usage by 74.4%. It generalizes across LLMs and retrievers without requiring task-specific tuning, offering a practical alternative to conventional CoT reasoning. The code and details are available at https://github.com/ldilab/SMR.
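To make the transition-based idea concrete, the following is a minimal Python sketch of a reasoning loop built from the three discrete actions named above (Refine, Rerank, Stop). It is an illustration under stated assumptions, not the implementation in the linked repository: the names `State`, `run_state_machine`, `retrieve`, `policy`, `refine`, and `rerank` are hypothetical, the state is assumed to be a (query, ranked documents) pair, Refine is assumed to trigger a fresh retrieval, and a "redundant" state is approximated by exact equality with a previously visited state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Set, Tuple


class Action(Enum):
    REFINE = "refine"  # rewrite the query
    RERANK = "rerank"  # reorder the current candidate list
    STOP = "stop"      # end the episode and return the current ranking


@dataclass(frozen=True)
class State:
    # Assumed state representation: the query text plus the ranked doc ids.
    query: str
    docs: Tuple[str, ...]


def run_state_machine(
    query: str,
    retrieve: Callable[[str], List[str]],    # hypothetical retriever
    policy: Callable[[State], Action],       # hypothetical action chooser (e.g. an LLM)
    refine: Callable[[State], str],          # hypothetical query rewriter
    rerank: Callable[[State], List[str]],    # hypothetical reranker
    max_steps: int = 8,
) -> Tuple[State, List[Action]]:
    """Apply discrete actions until STOP, a repeated state, or max_steps."""
    state = State(query, tuple(retrieve(query)))
    seen: Set[State] = {state}
    trace: List[Action] = []
    for _ in range(max_steps):
        action = policy(state)
        if action is Action.STOP:
            trace.append(Action.STOP)
            break
        if action is Action.REFINE:
            new_query = refine(state)
            nxt = State(new_query, tuple(retrieve(new_query)))
        else:  # Action.RERANK
            nxt = State(state.query, tuple(rerank(state)))
        if nxt in seen:
            # The transition leads back to a visited state, so further
            # steps would be redundant: stop early instead of looping.
            trace.append(Action.STOP)
            break
        trace.append(action)
        seen.add(nxt)
        state = nxt
    return state, trace
```

Because each step is a bounded action rather than free-form text, the loop can end as soon as a transition stops changing the state. A real system would presumably replace the exact-equality check with a similarity test over queries and rankings.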
