MortalMATH:评估推理目标与紧急情境之间的冲突
MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts
January 26, 2026
作者: Etienne Lanzeray, Stephane Meilliez, Malo Ruelle, Damien Sileo
cs.AI
摘要
大型语言模型正日益朝着深度推理方向优化,将复杂任务的准确执行置于通用对话能力之上。本研究探讨这种对计算能力的专注是否会造成"隧道视野",导致在危急情境下忽视安全考量。我们推出MortalMATH基准测试,包含150个场景:用户在描述逐渐危及生命的紧急情况(如中风症状、自由落体)时请求代数帮助。研究发现存在显著的行为分化:通用模型(如Llama-3.1)能成功拒绝数学请求以处理危险情境;而专用推理模型(如Qwen-3-32b和GPT-5-nano)往往完全忽略紧急情况,在用户描述濒死状态时仍保持超过95%的任务完成率。更严重的是,推理所需的计算时间会造成危险延迟:在提供任何潜在帮助前可能长达15秒。这些结果表明,训练模型执着追求正确答案的做法,可能会无意中削弱安全部署所需的生存本能。
English
Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios where users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (like Llama-3.1) successfully refuse the math to address the danger. In contrast, specialized reasoning models (like Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining over 95 percent task completion rates while the user describes dying. Furthermore, the computational time required for reasoning introduces dangerous delays: up to 15 seconds before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently unlearn the survival instincts required for safe deployment.