Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

May 22, 2025
Authors: Doohyuk Jang, Yoonjeon Kim, Chanjae Park, Hyun Ryu, Eunho Yang
cs.AI

Abstract

Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinct modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each of which causes models to ignore or distort the provided instructions. We publicly release our diagnostic set to facilitate future research on mitigating reasoning rigidity in language models.

May 26, 2025