MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts

Abstract

Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios in which users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (such as Llama-3.1) successfully refuse the math in order to address the danger. In contrast, specialized reasoning models (such as Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining task completion rates above 95 percent while the user describes dying. Furthermore, the computation time required for reasoning introduces dangerous delays: up to 15 seconds can pass before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently cause them to unlearn the survival instincts required for safe deployment.
