The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning

Abstract

Large language models fail systematically when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword association than with compositional inference. The Heuristic Override Benchmark (HOB), which comprises 500 instances spanning 4 heuristic families × 5 constraint families with minimal pairs and explicitness gradients, demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75% accuracy, and presence constraints are the hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers 15 pp on average, suggesting that the failure lies in constraint inference rather than in missing knowledge; 12 of 14 models perform worse when the constraint is removed (by up to 39 pp), revealing a conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers 6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
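The abstract does not spell out how the strict 10/10 criterion is computed. One plausible reading is that each instance is attempted ten times and counts as solved only if all ten attempts are correct. The sketch below illustrates that reading and contrasts it with plain per-trial accuracy; the function names, the data layout, and the interpretation itself are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Dict, List


def strict_accuracy(trials: Dict[str, List[bool]], required: int = 10) -> float:
    """Fraction of instances whose `required` trials are ALL correct.

    Assumed reading of "strict evaluation (10/10 correct)"; hypothetical helper.
    """
    if not trials:
        raise ValueError("no instances to score")
    passed = 0
    for instance_id, outcomes in trials.items():
        if len(outcomes) != required:
            raise ValueError(
                f"{instance_id}: expected {required} trials, got {len(outcomes)}"
            )
        # An instance passes only when every single trial is correct.
        passed += all(outcomes)
    return passed / len(trials)


def mean_accuracy(trials: Dict[str, List[bool]]) -> float:
    """Ordinary per-trial accuracy, shown for comparison."""
    total = sum(len(outcomes) for outcomes in trials.values())
    if total == 0:
        raise ValueError("no trials to score")
    correct = sum(sum(outcomes) for outcomes in trials.values())
    return correct / total


# Toy data (made up): one instance always solved, one solved 9 times out of 10.
example = {
    "inst-a": [True] * 10,
    "inst-b": [True] * 9 + [False],
}
print(strict_accuracy(example))  # 0.5
print(mean_accuracy(example))    # 0.95
```

Under this reading, a single slip on an instance costs the whole instance, which is why strict scores can sit well below average per-trial accuracy.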

