When Background Matters: Breaking Medical Vision Language Models by Transferable Attack
April 19, 2026
Authors: Akash Ghosh, Subhadip Baidya, Sriparna Saha, Xiuying Chen
cs.AI
Abstract
Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, posing serious risks. Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks from natural images introduce visible distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism to shift the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance, generating misleading yet realistic diagnostic outputs across diverse VLMs. We further introduce a unified evaluation framework with novel metrics that jointly capture attack success and image fidelity, revealing a critical weakness in the reasoning capabilities of modern clinical VLMs.
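The core idea of restricting adversarial perturbations to non-diagnostic background regions can be illustrated with a minimal NumPy sketch. This is a generic single-step, sign-gradient update masked to background pixels under an L-infinity budget; the function name, the grey-scale single-image setup, and the one-step form are illustrative assumptions, not the paper's actual MedFocusLeak implementation (which is a black-box, multimodal attack with an attention-distraction objective).

```python
import numpy as np

def background_perturb(image, background_mask, grad, eps=8 / 255, step=2 / 255):
    """Illustrative sign-gradient step restricted to background pixels.

    image:           float array in [0, 1], shape (H, W)
    background_mask: bool array, True where pixels are non-diagnostic background
    grad:            gradient of the attacker's loss w.r.t. the image (same shape)
    eps:             L-infinity perturbation budget
    step:            per-iteration step size
    """
    # Perturb only background pixels; pathological regions stay untouched.
    delta = step * np.sign(grad) * background_mask
    # Keep the total perturbation within the eps ball around the clean image,
    # and keep pixel values in the valid [0, 1] range.
    perturbed = np.clip(image + delta, image - eps, image + eps)
    return np.clip(perturbed, 0.0, 1.0)
```

In an iterative attack, this step would be applied repeatedly with gradients estimated through black-box queries, accumulating perturbation only where the mask is True, so the lesion area that a clinician inspects remains pixel-identical to the original.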