

When Background Matters: Breaking Medical Vision Language Models by Transferable Attack

April 19, 2026
Authors: Akash Ghosh, Subhadip Baidya, Sriparna Saha, Xiuying Chen
cs.AI

Abstract

Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, posing serious risks. Existing medical attacks focus on secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks from natural images introduce visible distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism to shift the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance, generating misleading yet realistic diagnostic outputs across diverse VLMs. We further introduce a unified evaluation framework with novel metrics that jointly capture attack success and image fidelity, revealing a critical weakness in the reasoning capabilities of modern clinical VLMs.
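The core idea of confining perturbations to non-diagnostic background regions can be illustrated with a minimal sketch. This is not the authors' released implementation; the function name `background_pgd_step`, its parameters, and the choice of a masked sign-gradient (PGD-style) update are illustrative assumptions about how such a background-restricted attack step could look:

```python
import numpy as np

def background_pgd_step(image, grad, background_mask,
                        step_size=0.01, epsilon=0.03):
    """One sign-gradient ascent step confined to background pixels.

    image, grad: float arrays in [0, 1] with the same shape.
    background_mask: binary array (same shape), 1 where the pixel lies
    outside the pathological region of interest, 0 elsewhere.
    """
    # Take a signed gradient step only where the mask allows it,
    # leaving the diagnostically relevant region pixel-identical.
    perturbed = image + step_size * np.sign(grad) * background_mask
    # Keep the accumulated perturbation inside an L-infinity ball of
    # radius epsilon around the clean image, and keep pixels valid.
    perturbed = np.clip(perturbed, image - epsilon, image + epsilon)
    return np.clip(perturbed, 0.0, 1.0)
```

In a full attack, `grad` would come from a surrogate VLM's loss (e.g., a term rewarding a wrong but plausible diagnosis plus a term pulling attention away from the lesion), and this step would be iterated; the mask is what keeps the perturbation imperceptible in the clinically examined area.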
April 22, 2026