凡人数学:评估推理目标与紧急情境间的冲突
MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts
January 26, 2026
作者: Etienne Lanzeray, Stephane Meilliez, Malo Ruelle, Damien Sileo
cs.AI
摘要
大型语言模型正日益针对深度推理进行优化,将复杂任务的正确执行置于通用对话能力之上。我们研究这种对计算能力的专注是否会造成"隧道视野",在危急情境下忽视安全考量。我们推出MortalMATH基准测试,包含150个场景:用户在描述逐渐危及生命的紧急情况(如中风症状、自由落体)时请求代数帮助。研究发现存在显著的行为分化:通用模型(如Llama-3.1)能成功拒绝数学请求以处理危险;而专用推理模型(如Qwen-3-32b和GPT-5-nano)往往完全忽略紧急情况,在用户描述濒死状态时仍保持超过95%的任务完成率。更严重的是,推理所需的计算时间会导致危险延迟:在提供任何潜在帮助前耗时长达15秒。这些结果表明,训练模型不懈追求正确答案的做法,可能会在无意中削弱安全部署所需的生存本能。
English
Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios where users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (like Llama-3.1) successfully refuse the math to address the danger. In contrast, specialized reasoning models (like Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining over 95 percent task completion rates while the user describes dying. Furthermore, the computational time required for reasoning introduces dangerous delays: up to 15 seconds before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently unlearn the survival instincts required for safe deployment.