Personalized Safety Alignment for Text-to-Image Diffusion Models

August 2, 2025
Authors: Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Kaidong Yu
cs.AI

Abstract

Text-to-image diffusion models have revolutionized visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors such as age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences; these profiles are incorporated into the model through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression, aligns generated content more closely with user constraints, and achieves higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
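
The abstract states only that user profiles enter the model through cross-attention, without implementation details. As a rough illustration of that idea, here is a minimal PyTorch sketch of a denoiser block whose latent image tokens attend over the prompt embeddings concatenated with a user safety-profile embedding; every class name, dimension, and variable below is hypothetical, not taken from the paper or its released code.

```python
import torch
import torch.nn as nn

class ProfileCrossAttention(nn.Module):
    """Hypothetical cross-attention block: latent image tokens attend over
    prompt embeddings concatenated with user safety-profile embeddings.
    Names and dimensions are illustrative, not from the PSA paper."""

    def __init__(self, dim: int = 320, context_dim: int = 768, num_heads: int = 8):
        super().__init__()
        self.to_context = nn.Linear(context_dim, dim)  # project text/profile tokens
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, text_emb, profile_emb):
        # x:           (B, N, dim)          latent image tokens from the denoiser
        # text_emb:    (B, T, context_dim)  prompt embeddings
        # profile_emb: (B, P, context_dim)  user safety-profile embeddings
        # Concatenating profile tokens with prompt tokens lets the denoiser
        # attend to user-specific safety preferences at every diffusion step.
        context = self.to_context(torch.cat([text_emb, profile_emb], dim=1))
        out, _ = self.attn(query=x, key=context, value=context)
        return x + out  # residual connection, as in standard UNet attention blocks

# Usage sketch with fictional shapes:
block = ProfileCrossAttention()
x = torch.randn(2, 64, 320)      # latent tokens
text = torch.randn(2, 77, 768)   # CLIP-style prompt embeddings
profile = torch.randn(2, 4, 768) # encoded user safety profile
y = block(x, text, profile)      # -> (2, 64, 320)
```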