Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

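Abstract

Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain because these models can inadvertently memorize and replicate training data. Recent mitigation efforts assume that memorization can be localized, and therefore focus on identifying and pruning the weights responsible for triggering replication. Our research assesses the robustness of these pruning-based approaches. We demonstrate that, even after pruning, minor adjustments to the text embeddings of input prompts are sufficient to re-trigger data replication, which highlights the fragility of these defenses. Furthermore, we challenge the fundamental assumption of memorization locality by showing that replication can be triggered from diverse locations within the text embedding space and that it follows different paths through the model. Our findings indicate that existing mitigation strategies are insufficient and underscore the need for methods that truly remove memorized content rather than merely suppress its retrieval. As a first step in this direction, we introduce a novel adversarial fine-tuning method that iteratively searches for replication triggers and updates the model to increase robustness. Through our research, we provide fresh insights into the nature of memorization in text-to-image DMs and a foundation for building more trustworthy and compliant generative AI.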


