Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Abstract

Multimodal large language models (MLLMs) remain vulnerable to transferable adversarial examples. While existing methods typically achieve targeted attacks by aligning global features, such as CLIP's [CLS] token, between adversarial and target samples, they often overlook the rich local information encoded in patch tokens. This leads to suboptimal alignment and limited transferability, particularly for closed-source models. To address this limitation, we propose a targeted transferable adversarial attack method based on feature optimal alignment, called FOA-Attack, to improve adversarial transfer capability. Specifically, at the global level, we introduce a global feature loss based on cosine similarity to align the coarse-grained features of adversarial samples with those of target samples. At the local level, given the rich local representations within Transformers, we leverage clustering techniques to extract compact local patterns and thereby alleviate redundancy among local features. We then formulate local feature alignment between adversarial and target samples as an optimal transport (OT) problem and propose a local clustering optimal transport loss to refine fine-grained feature alignment. Additionally, we propose a dynamic ensemble model weighting strategy to adaptively balance the influence of multiple models during adversarial example generation, thereby further improving transferability. Extensive experiments across various models demonstrate the superiority of the proposed method, which outperforms state-of-the-art methods, especially in transferring to closed-source MLLMs. The code is released at https://github.com/jiaxiaojunQAQ/FOA-Attack.
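As a rough illustration of the two alignment terms described in the abstract, the NumPy sketch below computes a cosine-similarity loss on global features and an optimal-transport loss between clustered patch features. It is not the authors' implementation (that is in the repository linked above). The abstract does not say which clustering algorithm or OT solver is used, so plain k-means, entropic Sinkhorn iterations, uniform cluster masses, a cosine cost, and every function name and hyperparameter here are assumptions made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def global_cosine_loss(cls_adv, cls_tgt):
    """1 - cosine similarity between two global ([CLS]-like) feature vectors."""
    return 1.0 - float(l2_normalize(cls_adv) @ l2_normalize(cls_tgt))

def kmeans(tokens, k, iters=10, seed=0):
    """Plain Lloyd's k-means over patch tokens (assumed clustering choice)."""
    rng = np.random.default_rng(seed)
    centroids = tokens[rng.choice(len(tokens), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every token to every centroid.
        dist = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = tokens[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def sinkhorn_plan(cost, a, b, eps=0.1, iters=200):
    """Entropic-regularised OT plan via Sinkhorn iterations (assumed solver)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def local_ot_loss(patch_adv, patch_tgt, k=4):
    """Transport cost between cluster centroids of two patch-token sets."""
    c_adv = l2_normalize(kmeans(patch_adv, k))
    c_tgt = l2_normalize(kmeans(patch_tgt, k))
    cost = 1.0 - c_adv @ c_tgt.T           # cosine distance, in [0, 2]
    a = np.full(k, 1.0 / k)                # uniform masses (assumption)
    b = np.full(k, 1.0 / k)
    plan = sinkhorn_plan(cost, a, b)
    return float((plan * cost).sum())
```

In an actual attack these two terms would be computed on surrogate-encoder features with an autodiff framework and minimised with respect to the image perturbation. The sketch only shows how the quantities are formed and leaves out the dynamic ensemble weighting, for which the abstract gives no formula.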
