跨模態安全對齊
Cross-Modality Safety Alignment
June 21, 2024
作者: Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang
cs.AI
摘要
隨著人工通用智能(AGI)日益融入人類生活的各個方面,確保這些系統的安全性和道德一致性至關重要。先前的研究主要集中在單模態威脅上,這可能不足以應對跨模態交互作用的整合和複雜性。我們引入了一個新穎的安全一致性挑戰,稱為“安全輸入但不安全輸出”(SIUO),以評估跨模態安全一致性。具體而言,它考慮了單一模態在獨立情況下是安全的,但在結合時可能導致不安全或不道德的輸出的情況。為了從實證角度研究這個問題,我們開發了SIUO,這是一個跨模態基準,包括自我傷害、非法活動和侵犯隱私等9個關鍵安全領域。我們的研究結果顯示,無論是閉源還是開源的LVLMs,如GPT-4V和LLaVA,都存在重大的安全漏洞,突顯了當前模型無法可靠地解釋和應對複雜的現實情境的不足。
English
As Artificial General Intelligence (AGI) becomes increasingly integrated into
various facets of human life, ensuring the safety and ethical alignment of such
systems is paramount. Previous studies primarily focus on single-modality
threats, which may not suffice given the integrated and complex nature of
cross-modality interactions. We introduce a novel safety alignment challenge
called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety
alignment. Specifically, it considers cases where single modalities are safe
independently but could potentially lead to unsafe or unethical outputs when
combined. To empirically investigate this problem, we developed the SIUO, a
cross-modality benchmark encompassing 9 critical safety domains, such as
self-harm, illegal activities, and privacy violations. Our findings reveal
substantial safety vulnerabilities in both closed- and open-source LVLMs, such
as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably
interpret and respond to complex, real-world scenarios.Summary
AI-Generated Summary