Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models
December 26, 2025
Authors: Mengqi He, Xinyu Tian, Xin Shen, Jinhong Ni, Shu Zou, Zhaoyuan Yang, Jing Zhang
cs.AI
Abstract
Vision-language models (VLMs) achieve remarkable performance but remain vulnerable to adversarial attacks. Entropy, a measure of model uncertainty, is strongly correlated with the reliability of VLMs. Prior entropy-based attacks maximize uncertainty at all decoding steps, implicitly assuming that every token contributes equally to generation instability. We show instead that a small fraction (about 20%) of high-entropy tokens, i.e., critical decision points in autoregressive generation, disproportionately governs output trajectories. By concentrating adversarial perturbations on these positions, we achieve semantic degradation comparable to global methods while using substantially smaller budgets. More importantly, across multiple representative VLMs, such selective attacks convert 35-49% of benign outputs into harmful ones, exposing a more critical safety risk. Remarkably, these vulnerable high-entropy forks recur across architecturally diverse VLMs, making transfer attacks feasible (17-26% harmful-conversion rates on unseen targets). Motivated by these findings, we propose Entropy-bank Guided Adversarial attacks (EGA), which achieves competitive attack success rates (93-95%) alongside high harmful-conversion rates, thereby revealing new weaknesses in current VLM safety mechanisms.
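
The abstract does not give implementation details for EGA (e.g., how the entropy bank is built or how the image perturbation is optimized), but the core selection step it describes, scoring each decoding step by the entropy of the next-token distribution and keeping only the top ~20% of positions as attack targets, can be illustrated with a minimal sketch. The sketch below assumes PyTorch tensors of raw decoder logits; the function names (token_entropy, select_high_entropy_positions, selective_entropy_loss) and the exact loss form are illustrative placeholders, not the authors' released code.

import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    # Shannon entropy (in nats) of the next-token distribution at each decoding step.
    # logits: [seq_len, vocab_size] raw decoder logits for one generated sequence.
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)  # [seq_len]

def select_high_entropy_positions(logits: torch.Tensor, fraction: float = 0.2) -> torch.Tensor:
    # Indices of the top `fraction` highest-entropy steps, i.e., the "critical decision points".
    ent = token_entropy(logits)
    k = max(1, int(fraction * ent.numel()))
    return torch.topk(ent, k).indices

def selective_entropy_loss(logits: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
    # Negative mean entropy at the selected positions only; minimizing this w.r.t. an
    # adversarial image perturbation would maximize uncertainty at those steps alone.
    return -token_entropy(logits)[positions].mean()

if __name__ == "__main__":
    # Toy example: stand-in logits for a 50-token generation over a 32k vocabulary.
    fake_logits = torch.randn(50, 32000)
    pos = select_high_entropy_positions(fake_logits, fraction=0.2)
    print("attacked positions:", pos.tolist())
    print("selective loss:", selective_entropy_loss(fake_logits, pos).item())

In an actual attack, the selective loss would be minimized with respect to an image perturbation under a small budget (e.g., with a PGD-style optimizer) rather than evaluated on random logits as in this toy example.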