Personalized Safety Alignment for Text-to-Image Diffusion Models

Abstract

Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors such as age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that enables user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce Sage, a new dataset that captures user-specific safety preferences, and incorporate these profiles into the model through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content more closely with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
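The abstract states only that user profiles are injected through cross-attention. As an illustration, and not as the authors' implementation, the sketch below assumes an IP-Adapter-style decoupled design: image-latent queries attend separately to prompt tokens and to user-profile tokens, and the two outputs are summed with a weight. The function names, the `user_scale` parameter, and the identity key/value projections are hypothetical simplifications; a real model would use learned projections and multiple heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Single-head scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)


def personalized_cross_attention(
    latent_q: Matrix, text_tokens: Matrix, user_tokens: Matrix, user_scale: float = 1.0
) -> Matrix:
    # Hypothetical decoupled cross-attention. Keys and values reuse the raw
    # token vectors (identity projections) purely to keep the sketch short.
    text_out = attention(latent_q, text_tokens, text_tokens)
    user_out = attention(latent_q, user_tokens, user_tokens)
    # Sum the prompt branch and the user-profile branch, weighted by user_scale.
    return [
        [t + user_scale * u for t, u in zip(t_row, u_row)]
        for t_row, u_row in zip(text_out, user_out)
    ]


if __name__ == "__main__":
    latent = [[0.1, 0.2], [0.3, -0.1]]  # two latent positions, d = 2
    text = [[1.0, 0.0], [0.0, 1.0]]     # two prompt tokens
    user = [[0.5, -0.5]]                # one hypothetical profile token
    print(personalized_cross_attention(latent, text, user, user_scale=0.5))
```

In this sketch, setting `user_scale` to 0 recovers plain prompt conditioning, which is one reason a decoupled branch is attractive: the base model's behavior is untouched when no profile is supplied.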
