When Background Matters: Breaking Medical Vision Language Models by Transferable Attack

Abstract

Vision-Language Models (VLMs) are increasingly used in clinical diagnostics, yet their robustness to adversarial attacks remains largely unexplored, which poses serious risks. Existing attacks in the medical domain mostly target secondary objectives such as model stealing or adversarial fine-tuning, while transferable attacks designed for natural images introduce conspicuous distortions that clinicians can easily detect. To address this, we propose MedFocusLeak, a highly transferable black-box multimodal attack that induces incorrect yet clinically plausible diagnoses while keeping perturbations imperceptible. The method injects coordinated perturbations into non-diagnostic background regions and employs an attention distraction mechanism that shifts the model's focus away from pathological areas. Extensive evaluations across six medical imaging modalities show that MedFocusLeak achieves state-of-the-art performance, producing misleading yet realistic diagnostic outputs across diverse VLMs. We further introduce a unified evaluation framework with novel metrics that jointly capture attack success and image fidelity, and use it to reveal a critical weakness in the reasoning capabilities of modern clinical VLMs.
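To make the idea of "perturbations confined to non-diagnostic background regions" concrete, here is a minimal sketch, not the authors' method. It assumes an L-infinity imperceptibility budget `eps`, a PGD-style signed-gradient update, and a binary `background_mask` (1 = background, 0 = diagnostic region) supplied from elsewhere; all of these, and the helper names `masked_linf_step` and `background_only_attack`, are hypothetical. The surrogate loss, the ensemble of surrogate models, and the attention distraction term described in the abstract are not reproduced; `grad_fn` is only a stand-in for their gradient.

```python
from typing import Callable, List

Vector = List[float]  # flattened image, intensities in [0, 1]


def masked_linf_step(x: Vector, x0: Vector, grad: Vector, background_mask: List[int],
                     alpha: float, eps: float) -> Vector:
    """One signed-gradient step applied only to background pixels (hypothetical sketch)."""
    out = []
    for xi, x0i, gi, mi in zip(x, x0, grad, background_mask):
        if mi == 0:
            # Diagnostic region: keep the clean pixel untouched.
            out.append(x0i)
            continue
        sign = (gi > 0) - (gi < 0)  # sign of the gradient: -1, 0 or +1
        xi = xi + alpha * sign
        # Project back into the L-infinity ball of radius eps around the clean pixel.
        xi = min(max(xi, x0i - eps), x0i + eps)
        # Keep the pixel in the valid intensity range.
        xi = min(max(xi, 0.0), 1.0)
        out.append(xi)
    return out


def background_only_attack(x0: Vector, grad_fn: Callable[[Vector], Vector],
                           background_mask: List[int], alpha: float = 1 / 255,
                           eps: float = 8 / 255, steps: int = 10) -> Vector:
    """Iterate masked steps; grad_fn stands in for the gradient of an assumed surrogate loss."""
    x = list(x0)
    for _ in range(steps):
        x = masked_linf_step(x, x0, grad_fn(x), background_mask, alpha, eps)
    return x


# Toy usage: a linear surrogate loss whose gradient is the constant vector w.
w = [1.0, -1.0, 1.0, -1.0]
clean = [0.5, 0.5, 0.5, 0.5]
mask = [1, 1, 0, 0]  # the first two pixels are treated as background
adv = background_only_attack(clean, lambda x: w, mask)
```

In this toy run the two background pixels move to the edge of the `eps` ball while the two "diagnostic" pixels stay identical to the clean image, which is the property the masking is meant to guarantee.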