Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
July 22, 2025
Authors: Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
cs.AI
Abstract
Text-to-image diffusion models (DMs) have achieved remarkable success in
image generation. However, concerns about data privacy and intellectual
property remain due to their potential to inadvertently memorize and replicate
training data. Recent mitigation efforts have focused on identifying and
pruning weights responsible for triggering replication, based on the assumption
that memorization can be localized. Our research assesses the robustness of
these pruning-based approaches. We demonstrate that even after pruning, minor
adjustments to text embeddings of input prompts are sufficient to re-trigger
data replication, highlighting the fragility of these defenses. Furthermore, we
challenge the fundamental assumption of memorization locality by showing that
replication can be triggered from diverse locations within the text embedding
space and can follow distinct paths through the model. Our findings indicate that
existing mitigation strategies are insufficient and underscore the need for
methods that truly remove memorized content, rather than attempting to suppress
its retrieval. As a first step in this direction, we introduce a novel
adversarial fine-tuning method that iteratively searches for replication
triggers and updates the model to increase robustness. Through our research, we
provide fresh insights into the nature of memorization in text-to-image DMs and
a foundation for building more trustworthy and compliant generative AI.
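
To make the final step concrete, below is a minimal PyTorch sketch of the attack-then-defend loop the abstract describes: search for an embedding perturbation that re-triggers replication, then update the model to suppress it. The denoiser interface `unet(x_t, t, embeds)`, the conditional/unconditional prediction gap used as a replication score, and all helper names are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def replication_score(unet, x_t, t, cond, uncond):
    """Proxy for memorization: a large gap between the conditional and
    unconditional noise predictions is a common replication signal."""
    eps_c = unet(x_t, t, cond)
    eps_u = unet(x_t, t, uncond)
    return (eps_c - eps_u).flatten(1).norm(dim=1).mean()

def find_trigger(unet, x_t, t, cond, uncond, steps=50, lr=1e-2):
    """Inner loop: gradient ascent on a perturbation of the text
    embedding that re-triggers replication (the 'minor adjustments'
    the abstract refers to)."""
    delta = torch.zeros_like(cond, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Maximize the replication signal w.r.t. the perturbation only.
        loss = -replication_score(unet, x_t, t, cond + delta, uncond)
        loss.backward()
        opt.step()
    return (cond + delta).detach()

def adversarial_finetune(unet, loader, epochs=3, lr=1e-5):
    """Outer loop: alternate between searching for triggers and
    updating the model to suppress the replication signal on them."""
    opt = torch.optim.AdamW(unet.parameters(), lr=lr)
    for _ in range(epochs):
        for x_t, t, cond, uncond in loader:
            adv_cond = find_trigger(unet, x_t, t, cond, uncond)
            # Stale gradients accumulated during the trigger search
            # are cleared before the model update.
            opt.zero_grad()
            loss = replication_score(unet, x_t, t, adv_cond, uncond)
            loss.backward()
            opt.step()
    return unet
```

In practice one would presumably add a utility-preservation term (e.g., the standard denoising loss on clean data) alongside the suppression objective, so that robustness to replication triggers is not bought at the cost of generation quality.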