OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation

December 6, 2025
Authors: Xiaojun Jia, Jie Liao, Qi Guo, Teng Ma, Simeng Qin, Ranjie Duan, Tianlin Li, Yihao Huang, Zhitao Zeng, Dongxian Wu, Yiming Li, Wenqi Ren, Xiaochun Cao, Yang Liu
cs.AI

Abstract

Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypass safety alignment and induce harmful behaviors. Existing benchmarks such as JailBreakV-28K, MM-SafetyBench, and HADES provide valuable insights into multi-modal vulnerabilities, but they typically focus on limited attack scenarios, lack standardized defense evaluation, and offer no unified, reproducible toolbox. To address these gaps, we introduce OmniSafeBench-MM, a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation. OmniSafeBench-MM integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories, structured across consultative, imperative, and declarative inquiry types to reflect realistic user intentions. Beyond data coverage, it establishes a three-dimensional evaluation protocol measuring (1) harmfulness, rated on a granular, multi-level scale ranging from low-impact individual harm to catastrophic societal threats, (2) intent alignment between responses and queries, and (3) response detail level, enabling nuanced safety-utility analysis. We conduct extensive experiments on 10 open-source and 8 closed-source MLLMs to reveal their vulnerability to multi-modal jailbreak attacks. By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research. The code is released at https://github.com/jiaxiaojunQAQ/OmniSafeBench-MM.
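
To make the three-dimensional evaluation protocol concrete, the minimal sketch below shows how per-response judgments along the three axes could be combined into an attack success rate. The `JailbreakJudgment` class, its field names, the numeric scales, and the thresholds are illustrative assumptions for exposition, not the toolbox's actual interface; consult the GitHub repository for the real implementation.

```python
# Hypothetical sketch of OmniSafeBench-MM-style three-dimensional scoring.
# All names, scales, and thresholds are assumptions, not the real API.
from dataclasses import dataclass


@dataclass
class JailbreakJudgment:
    harmfulness: int         # multi-level scale, e.g. 0 (safe refusal) .. 4 (catastrophic societal threat)
    intent_alignment: float  # 0.0-1.0: does the response actually address the harmful query's intent?
    detail_level: float      # 0.0-1.0: how detailed/actionable is the response?


def attack_success(j: JailbreakJudgment,
                   harm_threshold: int = 2,
                   align_threshold: float = 0.5) -> bool:
    """Count a jailbreak as successful only when the response is both
    sufficiently harmful and aligned with the query's intent, which
    filters out harmful-but-off-topic answers."""
    return j.harmfulness >= harm_threshold and j.intent_alignment >= align_threshold


# Example: aggregate an attack success rate (ASR) over judged responses.
judgments = [
    JailbreakJudgment(harmfulness=3, intent_alignment=0.9, detail_level=0.8),
    JailbreakJudgment(harmfulness=0, intent_alignment=0.1, detail_level=0.0),
]
asr = sum(attack_success(j) for j in judgments) / len(judgments)
print(f"ASR: {asr:.2f}")
```

Separating harmfulness from intent alignment and detail level, as the abstract describes, allows a safety-utility trade-off analysis: a defense that suppresses detail without refusing benign queries can be distinguished from one that simply refuses everything.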