VLSBench: Unveiling Visual Leakage in Multimodal Safety

Abstract

Safety concerns about multimodal large language models (MLLMs) have become an important problem in various applications. Surprisingly, previous work reports a counter-intuitive phenomenon: aligning MLLMs with textual unlearning achieves safety performance comparable to that of MLLMs trained with image-text pairs. To explain this phenomenon, we identify a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image is already revealed in the textual query. As a result, MLLMs can easily refuse these sensitive text-image queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) of 2.4k image-text pairs that prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is sufficient for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL. Our code and data are available at: http://hxhcreate.github.io/VLSBench
