From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval

Abstract

Chain-of-Thought (CoT) prompting enables complex reasoning in large language models (LLMs), including applications in information retrieval (IR). However, it often leads to overthinking, where models produce excessively long and semantically redundant traces with little or no benefit. We identify two key challenges in IR: redundant trajectories that revisit similar states and misguided reasoning that diverges from user intent. To address these, we propose State Machine Reasoning (SMR), a transition-based reasoning framework composed of discrete actions (Refine, Rerank, Stop) that support early stopping and fine-grained control. Experiments on the BEIR and BRIGHT benchmarks show that SMR improves retrieval performance (nDCG@10) by 3.4% while reducing token usage by 74.4%. It generalizes across LLMs and retrievers without requiring task-specific tuning, offering a practical alternative to conventional CoT reasoning. The code and details are available at https://github.com/ldilab/SMR.
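To make the transition-based idea concrete, here is a minimal Python sketch of a reasoning loop over the three actions named in the abstract. It is not the authors' implementation (see the repository above for that). The state layout (a query plus a ranked list of document IDs), the repeated-state check used for early stopping, and every name in it (`run_smr`, `choose_action`, `refine`, `rerank`, the toy corpus and policy) are assumptions made for illustration. In practice an LLM would presumably play the role of the policy and the refine/rerank functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    REFINE = "refine"   # rewrite the query, then retrieve again
    RERANK = "rerank"   # reorder the documents already retrieved
    STOP = "stop"       # end reasoning and return the current state


@dataclass(frozen=True)
class State:
    # Hypothetical state layout: current query plus current ranking (best first).
    query: str
    doc_ids: Tuple[str, ...]


def run_smr(
    query: str,
    retrieve: Callable[[str], List[str]],
    choose_action: Callable[[State], Action],
    refine: Callable[[State], str],
    rerank: Callable[[State], List[str]],
    max_steps: int = 8,
) -> State:
    """Apply discrete actions until STOP, a repeated state, or max_steps."""
    state = State(query, tuple(retrieve(query)))
    seen = {state}
    for _ in range(max_steps):
        action = choose_action(state)
        if action is Action.STOP:
            break
        if action is Action.REFINE:
            new_query = refine(state)
            state = State(new_query, tuple(retrieve(new_query)))
        else:  # Action.RERANK
            state = State(state.query, tuple(rerank(state)))
        if state in seen:
            # Assumed early-stop rule: a revisited state adds nothing new.
            break
        seen.add(state)
    return state


# Toy stand-ins (all hypothetical) so the loop can be run end to end.
CORPUS = {
    "d1": "chain of thought prompting",
    "d2": "state machine reasoning for retrieval",
    "d3": "dense retrieval models",
}


def toy_retrieve(q: str) -> List[str]:
    # Rank documents by word overlap with the query, ties broken by ID.
    words = set(q.split())
    return sorted(CORPUS, key=lambda d: (-len(words & set(CORPUS[d].split())), d))


def toy_policy(state: State) -> Action:
    # Refine once to add a missing keyword, then stop.
    return Action.STOP if "retrieval" in state.query else Action.REFINE


def toy_refine(state: State) -> str:
    return state.query + " retrieval"


def toy_rerank(state: State) -> List[str]:
    return list(state.doc_ids)  # no-op reranker


final = run_smr("state machine", toy_retrieve, toy_policy, toy_refine, toy_rerank)
```

Because each step is a discrete action on an explicit state rather than free-form generated text, the loop can detect a revisited state by simple equality and stop, which is one plausible way to realize the early stopping the abstract describes.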
