Personalized Safety Alignment for Text-to-Image Diffusion Models
August 2, 2025
Authors: Yu Lei, Jinbin Bai, Qingyu Shi, Aosong Feng, Kaidong Yu
cs.AI
Abstract
Text-to-image diffusion models have revolutionized visual content generation,
but current safety mechanisms apply uniform standards that often fail to
account for individual user preferences. These models overlook the diverse
safety boundaries shaped by factors like age, mental health, and personal
beliefs. To address this, we propose Personalized Safety Alignment (PSA), a
framework that allows user-specific control over safety behaviors in generative
models. PSA integrates personalized user profiles into the diffusion process,
adjusting the model's behavior to match individual safety preferences while
preserving image quality. We introduce a new dataset, Sage, which captures
user-specific safety preferences; PSA incorporates these profiles through a
cross-attention mechanism. Experiments show that PSA outperforms existing
methods in harmful content suppression and aligns generated content better with
user constraints, achieving higher Win Rate and Pass Rate scores. Our code,
data, and models are publicly available at
https://torpedo2648.github.io/PSAlign/.
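
The abstract describes conditioning the diffusion process on a user profile via cross-attention. A minimal PyTorch sketch of that idea is below: profile tokens are appended to the text-conditioning tokens so the denoiser's cross-attention can attend to both. The layer shape, token counts, and concatenation strategy are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ProfileCrossAttention(nn.Module):
    """Hypothetical cross-attention block conditioning on text + user-profile tokens."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor,
                text_ctx: torch.Tensor,
                profile_ctx: torch.Tensor) -> torch.Tensor:
        # Append user-profile tokens to the text context so the denoiser
        # can attend to personal safety preferences during generation.
        ctx = torch.cat([text_ctx, profile_ctx], dim=1)
        out, _ = self.attn(query=x, key=ctx, value=ctx)
        return out

# Toy shapes: batch of 2, 16 latent tokens, 77 text tokens, 4 profile tokens.
layer = ProfileCrossAttention()
x = torch.randn(2, 16, 64)            # latent features from the diffusion U-Net
text_ctx = torch.randn(2, 77, 64)     # text-encoder tokens
profile_ctx = torch.randn(2, 4, 64)   # embedded user safety profile
out = layer(x, text_ctx, profile_ctx)
```

Because the profile enters only through extra context tokens, the same mechanism can fall back to standard text-only conditioning by passing an empty profile sequence.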