Personalized Safety Alignment for Text-to-Image Diffusion Models
August 2, 2025
Authors: Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Kaidong Yu
cs.AI
Abstract
Text-to-image diffusion models have revolutionized visual content generation,
but current safety mechanisms apply uniform standards that often fail to
account for individual user preferences. These models overlook the diverse
safety boundaries shaped by factors like age, mental health, and personal
beliefs. To address this, we propose Personalized Safety Alignment (PSA), a
framework that allows user-specific control over safety behaviors in generative
models. PSA integrates personalized user profiles into the diffusion process,
adjusting the model's behavior to match individual safety preferences while
preserving image quality. We introduce a new dataset, Sage, which captures
user-specific safety preferences; PSA incorporates these profiles through a
cross-attention mechanism. Experiments show that PSA outperforms existing
methods in harmful content suppression and aligns generated content better with
user constraints, achieving higher Win Rate and Pass Rate scores. Our code,
data, and models are publicly available at
https://torpedo2648.github.io/PSAlign/.
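
The abstract describes conditioning the diffusion process on a user profile via cross-attention. A minimal PyTorch sketch of that idea is below: profile tokens are appended to the text-conditioning tokens so the denoiser's cross-attention can attend to both. The layer shape, token counts, and concatenation strategy are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ProfileCrossAttention(nn.Module):
    """Hypothetical cross-attention block conditioning on text + user-profile tokens."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor,
                text_ctx: torch.Tensor,
                profile_ctx: torch.Tensor) -> torch.Tensor:
        # Append user-profile tokens to the text context so the denoiser
        # can attend to personal safety preferences during generation.
        ctx = torch.cat([text_ctx, profile_ctx], dim=1)
        out, _ = self.attn(query=x, key=ctx, value=ctx)
        return out

# Toy shapes: batch of 2, 16 latent tokens, 77 text tokens, 4 profile tokens.
layer = ProfileCrossAttention()
x = torch.randn(2, 16, 64)            # latent features from the diffusion U-Net
text_ctx = torch.randn(2, 77, 64)     # text-encoder tokens
profile_ctx = torch.randn(2, 4, 64)   # embedded user safety profile
out = layer(x, text_ctx, profile_ctx)
```

Because the profile enters only through extra context tokens, the same mechanism can fall back to standard text-only conditioning by passing an empty profile sequence.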