Hyperbolic Safety-Aware Vision-Language Models
March 15, 2025
Authors: Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara
cs.AI
Abstract
Addressing the retrieval of unsafe content from vision-language models such
as CLIP is an important step towards real-world integration. Current efforts
have relied on unlearning techniques that try to erase the model's knowledge of
unsafe concepts. While effective in reducing unwanted outputs, unlearning
limits the model's capacity to discern between safe and unsafe content. In this
work, we introduce a novel approach that shifts from unlearning to an awareness
paradigm by leveraging the inherent hierarchical properties of the hyperbolic
space. We propose to encode safe and unsafe content as an entailment hierarchy,
where both are placed in different regions of hyperbolic space. Our HySAC,
Hyperbolic Safety-Aware CLIP, employs entailment loss functions to model the
hierarchical and asymmetrical relations between safe and unsafe image-text
pairs. This modelling, ineffective in standard vision-language models due to
their reliance on Euclidean embeddings, endows the model with awareness of
unsafe content, enabling it to serve as both a multimodal unsafe classifier and
a flexible content retriever, with the option to dynamically redirect unsafe
queries toward safer alternatives or retain the original output. Extensive
experiments show that our approach not only enhances safety recognition but
also establishes a more adaptable and interpretable framework for content
moderation in vision-language models. Our source code is available at
https://github.com/aimagelab/HySAC.
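The geometric intuition behind the abstract can be illustrated with a small sketch. The snippet below is not HySAC's implementation (see the linked repository for that); it only shows, under simplifying assumptions, how geodesic distance in the Poincaré ball separates regions of hyperbolic space, and how an unsafe query embedding could be "redirected" by pulling it toward the origin, where the most generic (and, in HySAC's hierarchy, safer) content sits. All names here (`poincare_distance`, `redirect_toward_root`, the anchor points) are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points in the Poincaré ball
    (unit ball, curvature -1). Inputs are coordinate lists with norm < 1."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1 - sum(a * a for a in u)) * (1 - sum(b * b for b in v))
    return math.acosh(1 + 2 * sq_diff / denom)

def redirect_toward_root(q, alpha=0.5):
    """Hypothetical safety traversal: shrink an embedding toward the
    origin. In an entailment hierarchy the root (most generic content)
    lies near the origin, so this moves a query toward the safer region.
    alpha in (0, 1) controls how far the query is pulled back."""
    return [alpha * qi for qi in q]

# Toy setup: a 'safe' anchor near the origin, an 'unsafe' anchor near
# the ball's boundary (both hypothetical), and a query embedding.
safe_anchor, unsafe_anchor = [0.1, 0.0], [0.85, 0.0]
query = [0.8, 0.1]

# Classify by proximity: nearer the unsafe anchor -> flag as unsafe.
is_unsafe = (poincare_distance(query, unsafe_anchor)
             < poincare_distance(query, safe_anchor))

# Optionally redirect the flagged query toward the safer region,
# instead of retrieving for it directly.
if is_unsafe:
    query = redirect_toward_root(query)
```

Distances grow without bound near the boundary of the ball, which is what lets a hierarchy place specific (here, unsafe) content far from the generic root — structure that, as the abstract notes, Euclidean embeddings cannot express as naturally.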