VLSBench: Unveiling Visual Leakage in Multimodal Safety

Abstract

Safety concerns about multimodal large language models (MLLMs) have gradually become an important problem in various applications. Surprisingly, previous works report a counter-intuitive phenomenon: aligning MLLMs with textual unlearning achieves safety performance comparable to that of MLLMs trained with image-text pairs. To explain this phenomenon, we identify a visual safety information leakage (VSIL) problem in existing multimodal safety benchmarks, i.e., the potentially risky and sensitive content in the image is already revealed in the textual query. As a result, MLLMs can easily refuse these sensitive image-text queries based on the textual query alone. However, image-text pairs without VSIL are common in real-world scenarios and are overlooked by existing multimodal safety benchmarks. To this end, we construct a multimodal visual leakless safety benchmark (VLSBench) with 2.4k image-text pairs, which prevents visual safety leakage from the image to the textual query. Experimental results indicate that VLSBench poses a significant challenge to both open-source and closed-source MLLMs, including LLaVA, Qwen2-VL, Llama3.2-Vision, and GPT-4o. This study demonstrates that textual alignment is sufficient for multimodal safety scenarios with VSIL, while multimodal alignment is a more promising solution for multimodal safety scenarios without VSIL. Please see our code and data at: http://hxhcreate.github.io/VLSBench
