Personalized Safety Alignment for Text-to-Image Diffusion Models

Abstract

Text-to-image diffusion models have transformed visual content generation, but current safety mechanisms apply uniform standards that often fail to account for individual user preferences. These models overlook the diverse safety boundaries shaped by factors such as age, mental health, and personal beliefs. To address this, we propose Personalized Safety Alignment (PSA), a framework that allows user-specific control over safety behaviors in generative models. PSA integrates personalized user profiles into the diffusion process, adjusting the model's behavior to match individual safety preferences while preserving image quality. We introduce a new dataset, Sage, which captures user-specific safety preferences, and incorporate these profiles through a cross-attention mechanism. Experiments show that PSA outperforms existing methods in harmful content suppression and aligns generated content more closely with user constraints, achieving higher Win Rate and Pass Rate scores. Our code, data, and models are publicly available at https://torpedo2648.github.io/PSAlign/.
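
The abstract says that user profiles enter the model through a cross-attention mechanism but gives no architectural detail. The sketch below is a minimal, dependency-free illustration of one way this could work, assuming the profile is encoded as a few embedding tokens that are appended to the text-prompt tokens, so that image latent tokens attend over both. All names, shapes, and weights here (`latent_tokens`, `text_tokens`, `profile_tokens`, `dim`) are hypothetical and randomly initialized. This is not the PSA implementation.

```python
import math
import random


def linear(x, w):
    """Multiply a row vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens.

    Returns (outputs, attention_weights).
    """
    q = [linear(t, w_q) for t in queries]
    k = [linear(t, w_k) for t in context]
    v = [linear(t, w_v) for t in context]
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs, weights = [], []
    for qi in q:
        # Scaled dot-product score between this query and every context key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        attn = softmax(scores)
        weights.append(attn)
        # Output is the attention-weighted sum of the context values.
        outputs.append(
            [sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(len(v[0]))]
        )
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Random Gaussian matrix as a list of rows (stand-in for learned values)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4  # hypothetical embedding width

# Hypothetical inputs: 3 image latent tokens, 2 text tokens, 2 user-profile tokens.
latent_tokens = random_matrix(3, dim, rng)
text_tokens = random_matrix(2, dim, rng)
profile_tokens = random_matrix(2, dim, rng)

w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

# Assumption: the profile is injected by appending its tokens to the text context.
context = text_tokens + profile_tokens
outputs, weights = cross_attention(latent_tokens, context, w_q, w_k, w_v)

print(len(outputs), len(outputs[0]), len(weights[0]))
```

Under this assumption, swapping in a different set of `profile_tokens` changes the keys and values the latents attend to, which is what would let one prompt yield different outputs for different users. A separate cross-attention branch for the profile would be an equally plausible design.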
