

The Side Effects of Being Smart: Safety Risks in MLLMs' Multi-Image Reasoning

January 20, 2026
Authors: Renmiao Chen, Yida Lu, Shiyao Cui, Xuan Ouyang, Victor Shea-Jay Huang, Shumin Zhang, Chengwei Pan, Han Qiu, Minlie Huang
cs.AI

Abstract

As Multimodal Large Language Models (MLLMs) acquire stronger reasoning capabilities to handle complex, multi-image instructions, this advancement may pose new safety risks. We study this problem by introducing MIR-SafetyBench, the first benchmark focused on multi-image reasoning safety, which consists of 2,676 instances across a taxonomy of 9 multi-image relations. Our extensive evaluations on 19 MLLMs reveal a troubling trend: models with more advanced multi-image reasoning can be more vulnerable on MIR-SafetyBench. Beyond attack success rates, we find that many responses labeled as safe are superficial, often driven by misunderstanding or evasive, non-committal replies. We further observe that unsafe generations exhibit lower attention entropy than safe ones on average. This internal signature suggests a possible risk that models may over-focus on task solving while neglecting safety constraints. Our code and data are available at https://github.com/thu-coai/MIR-SafetyBench.
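To make the "attention entropy" finding concrete, below is a minimal sketch of how such a metric can be computed from a model's attention maps. It assumes Shannon entropy of each per-token attention distribution, averaged over heads and query positions; the function name and this aggregation scheme are illustrative assumptions, not the paper's exact definition.

```python
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-12) -> float:
    """Mean Shannon entropy of attention distributions.

    attn_weights: tensor of shape (num_heads, query_len, key_len) whose last
    dimension is a probability distribution over keys (rows sum to 1), e.g.
    one layer's attention map obtained with output_attentions=True in
    Hugging Face-style models.
    """
    p = attn_weights.clamp_min(eps)           # guard against log(0)
    entropy = -(p * p.log()).sum(dim=-1)      # per-head, per-position entropy
    return entropy.mean().item()              # average over heads and positions
```

Under this reading, lower entropy means attention mass is concentrated on a few tokens, which matches the abstract's interpretation that unsafe generations "over-focus on task solving" rather than attending to safety-relevant context.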