Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Abstract

Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Even when users give explicit instructions, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinctive modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each causing models to ignore or distort provided instructions. We publicly release our diagnostic set to facilitate future research on mitigating reasoning rigidity in language models.
