
Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense

February 2, 2025
作者: Jiawen Zhang, Kejia Chen, Lipeng He, Jian Lou, Dan Li, Zunlei Feng, Mingli Song, Jian Liu, Kui Ren, Xiaohu Yang
cs.AI

Abstract

Large Language Models (LLMs) have showcased remarkable capabilities across various domains. Accompanying the evolving capabilities and expanding deployment scenarios of LLMs, their deployment challenges escalate due to their sheer scale and the advanced yet complex activation designs prevalent in notable model series, such as Llama, Gemma, and Mistral. These challenges have become particularly pronounced in resource-constrained deployment scenarios, where mitigating inference efficiency bottlenecks is imperative. Among various recent efforts, activation approximation has emerged as a promising avenue for pursuing inference efficiency, sometimes considered indispensable in applications such as private inference. Despite achieving substantial speedups with minimal impact on utility, even appearing sound and practical for real-world deployment, the safety implications of activation approximations remain unclear. In this work, we fill this critical gap in LLM safety by conducting the first systematic safety evaluation of activation approximations. Our safety vetting spans seven state-of-the-art techniques across three popular categories, revealing consistent safety degradation across ten safety-aligned LLMs.
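As background for readers unfamiliar with the technique, the sketch below illustrates one common form of activation approximation: replacing an exact activation function (SiLU, as used in Llama-style MLP blocks) with a low-degree polynomial fitted over a bounded input range, as is typical in private-inference pipelines. The polynomial degree, fitting range, and helper names here are illustrative assumptions, not the paper's actual evaluation setup; the resulting activation error is the kind of perturbation whose safety impact the paper studies.

```python
# Minimal illustrative sketch (not the paper's method): polynomial
# approximation of the SiLU activation, a common trick for making
# LLM inference cheaper or compatible with private inference.
import numpy as np

def silu(x):
    """Exact SiLU: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def fit_polynomial_silu(degree=4, lo=-6.0, hi=6.0, n=2001):
    """Least-squares polynomial fit to SiLU over a bounded input range."""
    xs = np.linspace(lo, hi, n)
    coeffs = np.polyfit(xs, silu(xs), degree)
    return np.poly1d(coeffs)

if __name__ == "__main__":
    approx = fit_polynomial_silu(degree=4)
    x = np.linspace(-6.0, 6.0, 1001)
    err = np.abs(silu(x) - approx(x))
    # The per-activation error introduced by the approximation is the
    # perturbation whose downstream safety effects are under scrutiny.
    print(f"max |SiLU - poly| on [-6, 6]: {err.max():.4f}")
    print(f"mean error: {err.mean():.4f}")
```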

