
Hyperbolic Safety-Aware Vision-Language Models

March 15, 2025
Authors: Tobia Poppi, Tejaswi Kasarla, Pascal Mettes, Lorenzo Baraldi, Rita Cucchiara
cs.AI

Abstract

Addressing the retrieval of unsafe content from vision-language models such as CLIP is an important step towards real-world integration. Current efforts have relied on unlearning techniques that try to erase the model's knowledge of unsafe concepts. While effective in reducing unwanted outputs, unlearning limits the model's capacity to discern between safe and unsafe content. In this work, we introduce a novel approach that shifts from unlearning to an awareness paradigm by leveraging the inherent hierarchical properties of the hyperbolic space. We propose to encode safe and unsafe content as an entailment hierarchy, where both are placed in different regions of hyperbolic space. Our HySAC, Hyperbolic Safety-Aware CLIP, employs entailment loss functions to model the hierarchical and asymmetrical relations between safe and unsafe image-text pairs. This modelling, ineffective in standard vision-language models due to their reliance on Euclidean embeddings, endows the model with awareness of unsafe content, enabling it to serve as both a multimodal unsafe classifier and a flexible content retriever, with the option to dynamically redirect unsafe queries toward safer alternatives or retain the original output. Extensive experiments show that our approach not only enhances safety recognition but also establishes a more adaptable and interpretable framework for content moderation in vision-language models. Our source code is available at https://github.com/aimagelab/HySAC.
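To make the entailment-hierarchy idea concrete, below is a minimal, illustrative sketch of a hyperbolic entailment-cone loss of the kind the abstract describes, assuming a Lorentz-model parameterization in which embeddings are stored as their space-like components and the safe (more general) embedding should contain its unsafe counterpart inside its cone. The function names, curvature, and minimum-radius constant are assumptions made here for illustration; they are not the authors' implementation, which is available at the repository linked above.

```python
import torch

def lorentz_inner(x, y, curv=1.0):
    """Lorentzian inner product of points given by their space-like components."""
    # Time components are recovered so that points lie on the hyperboloid <p, p>_L = -1/curv.
    x_time = torch.sqrt(1.0 / curv + (x * x).sum(dim=-1))
    y_time = torch.sqrt(1.0 / curv + (y * y).sum(dim=-1))
    return (x * y).sum(dim=-1) - x_time * y_time

def half_aperture(x, curv=1.0, min_radius=0.1, eps=1e-6):
    """Half-aperture of the entailment cone rooted at x (wider for points near the origin)."""
    asin_arg = 2.0 * min_radius / (curv ** 0.5 * x.norm(dim=-1) + eps)
    return torch.asin(torch.clamp(asin_arg, max=1.0 - eps))

def exterior_angle(x, y, curv=1.0, eps=1e-6):
    """Exterior angle at x of the geodesic triangle (origin, x, y) in the Lorentz model."""
    x_time = torch.sqrt(1.0 / curv + (x * x).sum(dim=-1))
    y_time = torch.sqrt(1.0 / curv + (y * y).sum(dim=-1))
    c_xyl = curv * lorentz_inner(x, y, curv)  # <= -1 for distinct points on the hyperboloid
    acos_arg = (y_time + c_xyl * x_time) / (
        x.norm(dim=-1) * torch.sqrt(torch.clamp(c_xyl ** 2 - 1.0, min=eps)) + eps
    )
    return torch.acos(torch.clamp(acos_arg, -1.0 + eps, 1.0 - eps))

def entailment_loss(general, specific, curv=1.0):
    """Penalize `specific` embeddings that fall outside the cone opened by `general` ones."""
    violation = exterior_angle(general, specific, curv) - half_aperture(general, curv)
    return torch.clamp(violation, min=0).mean()

# Usage with dummy space-like embeddings (batch of 8, 512-dim):
safe_txt = torch.randn(8, 512) * 0.05    # safe (more general) embeddings, nearer the origin
unsafe_txt = torch.randn(8, 512) * 0.05  # matched unsafe counterparts
loss = entailment_loss(safe_txt, unsafe_txt)
```

Minimizing this loss pulls each unsafe embedding into the cone opened by its safe counterpart, so safe content sits closer to the origin and unsafe content further out along the same direction; under this geometry, one could redirect an unsafe query toward a safer alternative by traversing its embedding back toward the origin, as the abstract suggests.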
