The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

March 30, 2026
Authors: Yubo Li, Lu Zhang, Tianchong Jiang, Ramayya Krishnan, Rema Padman
cs.AI

Abstract

Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this failure through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword association than with compositional inference. The Heuristic Override Benchmark (HOB), which comprises 500 instances spanning 4 heuristic families by 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers 15 percentage points (pp) on average, suggesting that the failure lies in constraint inference rather than in missing knowledge; 12 of 14 models perform worse when the constraint is removed (by up to 39 pp), revealing a conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers 6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
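The strict criterion mentioned in the abstract ("10/10 correct") can be illustrated with a small sketch. It assumes, as one plausible reading that the abstract does not spell out, that each benchmark instance is attempted 10 times and counts as solved only if every attempt is correct. The function names, the instance IDs, and the toy data below are hypothetical and are not taken from the paper or from HOB.

```python
from typing import Dict, List


def strict_accuracy(trials: Dict[str, List[bool]], required: int = 10) -> float:
    """Fraction of instances for which all `required` trials are correct.

    Assumption: "10/10 correct" means an instance passes only when every
    one of its 10 repeated trials is answered correctly.
    """
    if not trials:
        raise ValueError("no instances to score")
    passed = 0
    for instance_id, outcomes in trials.items():
        if len(outcomes) != required:
            raise ValueError(
                f"{instance_id}: expected {required} trials, got {len(outcomes)}"
            )
        if all(outcomes):
            passed += 1
    return passed / len(trials)


def lenient_accuracy(trials: Dict[str, List[bool]]) -> float:
    """Plain per-trial accuracy, shown only for contrast with the strict score."""
    total = sum(len(outcomes) for outcomes in trials.values())
    correct = sum(sum(outcomes) for outcomes in trials.values())
    return correct / total


# Hypothetical toy data: four instances, ten trials each.
example = {
    "inst-a": [True] * 10,
    "inst-b": [True] * 9 + [False],  # a single miss fails the strict test
    "inst-c": [False] * 10,
    "inst-d": [True] * 10,
}

print(strict_accuracy(example))   # 2 of 4 instances pass -> 0.5
print(lenient_accuracy(example))  # 29 of 40 trials correct -> 0.725
```

Under this reading, an instance that is right nine times out of ten contributes nothing to the strict score, so the strict metric rewards consistency rather than average correctness.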