The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
February 12, 2025
Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez
cs.AI
Abstract
Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving
capabilities, but their effectiveness in interactive environments can be
limited. This paper introduces and analyzes overthinking in LRMs, a phenomenon
where models favor extended internal reasoning chains over environmental
interaction. Through experiments on software engineering tasks using SWE Bench
Verified, we observe three recurring patterns: Analysis Paralysis, Rogue
Actions, and Premature Disengagement. We propose a framework to study these
behaviors, which correlates with human expert assessments, and analyze 4018
trajectories. We observe that higher overthinking scores correlate with
decreased performance, with reasoning models exhibiting stronger tendencies
toward overthinking compared to non-reasoning models. Our analysis reveals that
simple efforts to mitigate overthinking in agentic environments, such as
selecting the solution with the lower overthinking score, can improve model
performance by almost 30% while reducing computational costs by 43%. These
results suggest that mitigating overthinking has strong practical implications.
We suggest that by leveraging native function-calling capabilities and
selective reinforcement learning, overthinking tendencies could be mitigated. We
also open-source our evaluation framework and dataset to facilitate research in
this direction at https://github.com/AlexCuadron/Overthinking.
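The simple mitigation described in the abstract (keeping the candidate solution with the lower overthinking score) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each candidate agent trajectory already carries a numeric overthinking score where lower means less overthinking (how that score is computed is outside this sketch), and the `Trajectory` class, its fields, and `select_least_overthinking` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One candidate agent run on a task (hypothetical structure for illustration)."""
    run_id: str
    overthinking_score: float  # assumed precomputed; lower = less overthinking
    patch: str  # the candidate solution produced by this run


def select_least_overthinking(candidates: list[Trajectory]) -> Trajectory:
    """Return the candidate with the lowest overthinking score.

    Ties go to the earliest candidate in the list, because min()
    returns the first minimal element it encounters.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    return min(candidates, key=lambda t: t.overthinking_score)


if __name__ == "__main__":
    # Two hypothetical runs of the same task with different scores.
    runs = [
        Trajectory("run-a", overthinking_score=7.5, patch="patch A"),
        Trajectory("run-b", overthinking_score=2.0, patch="patch B"),
    ]
    best = select_least_overthinking(runs)
    print(best.run_id)  # prints "run-b"
```

Because selection only needs the scores, the heuristic adds no extra agent runs beyond the candidates already sampled; any cost savings depend on how those candidates were generated, which this sketch does not model.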