Hyperbolic Safety-Aware Vision-Language Models

Abstract

Addressing the retrieval of unsafe content from vision-language models such as CLIP is an important step towards real-world integration. Current efforts have relied on unlearning techniques that try to erase the model's knowledge of unsafe concepts. While effective in reducing unwanted outputs, unlearning limits the model's capacity to discern between safe and unsafe content. In this work, we introduce a novel approach that shifts from unlearning to an awareness paradigm by leveraging the inherent hierarchical properties of the hyperbolic space. We propose to encode safe and unsafe content as an entailment hierarchy, where both are placed in different regions of hyperbolic space. Our HySAC, Hyperbolic Safety-Aware CLIP, employs entailment loss functions to model the hierarchical and asymmetrical relations between safe and unsafe image-text pairs. This modelling, ineffective in standard vision-language models due to their reliance on Euclidean embeddings, endows the model with awareness of unsafe content, enabling it to serve as both a multimodal unsafe classifier and a flexible content retriever, with the option to dynamically redirect unsafe queries toward safer alternatives or retain the original output. Extensive experiments show that our approach not only enhances safety recognition but also establishes a more adaptable and interpretable framework for content moderation in vision-language models. Our source code is available at https://github.com/aimagelab/HySAC.
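The abstract refers to entailment loss functions in hyperbolic space without giving their form. As a rough illustration only, the sketch below uses an entailment-cone penalty on the Lorentz (hyperboloid) model, a formulation that appears in other hyperbolic image-text embedding work. It is not taken from the HySAC source code. The curvature `CURV`, the constant `MIN_RADIUS`, and every function name are assumptions made for this example, and the abstract does not say which of the safe and unsafe embeddings plays the "parent" or "child" role.

```python
import math

CURV = 1.0        # assumed curvature magnitude c (the space has curvature -c)
MIN_RADIUS = 0.1  # assumed constant K that sets cone width near the origin


def lift(space):
    """Place a Euclidean vector on the hyperboloid by computing its time component."""
    time = math.sqrt(1.0 / CURV + sum(v * v for v in space))
    return time, list(space)


def lorentz_inner(x, y):
    """Lorentzian inner product: space dot product minus product of time components."""
    (xt, xs), (yt, ys) = x, y
    return sum(a * b for a, b in zip(xs, ys)) - xt * yt


def half_aperture(x):
    """Half-aperture of the cone rooted at x; cones narrow as x moves from the origin."""
    _, xs = x
    norm = math.sqrt(sum(v * v for v in xs))  # x must not be the origin
    return math.asin(min(1.0, 2.0 * MIN_RADIUS / (math.sqrt(CURV) * norm)))


def exterior_angle(x, y):
    """Angle at x between the geodesic from the origin through x and the geodesic to y."""
    (xt, xs), (yt, _) = x, y
    c_inner = CURV * lorentz_inner(x, y)
    norm = math.sqrt(sum(v * v for v in xs))
    # Guard the square root so that x == y does not divide by zero.
    denom = norm * math.sqrt(max(c_inner * c_inner - 1.0, 1e-12))
    cos_angle = (yt + xt * c_inner) / denom
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def entailment_loss(parent, child):
    """Zero when child lies inside the parent's cone, otherwise the angular excess."""
    return max(0.0, exterior_angle(parent, child) - half_aperture(parent))


parent = lift([1.0, 0.0])
inside = lift([3.0, 0.0])    # farther from the origin, same direction as parent
outside = lift([-3.0, 0.0])  # opposite direction from parent

print(entailment_loss(parent, inside))
print(entailment_loss(parent, outside))
```

The penalty is asymmetric by construction: swapping `parent` and `child` generally changes the value, which is the property that lets a cone-based loss express a directed hierarchy rather than a symmetric similarity.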
