Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
July 22, 2025
Authors: Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
cs.AI
Abstract
Text-to-image diffusion models (DMs) have achieved remarkable success in
image generation. However, concerns about data privacy and intellectual
property remain due to their potential to inadvertently memorize and replicate
training data. Recent mitigation efforts have focused on identifying and
pruning weights responsible for triggering replication, based on the assumption
that memorization can be localized. Our research assesses the robustness of
these pruning-based approaches. We demonstrate that even after pruning, minor
adjustments to text embeddings of input prompts are sufficient to re-trigger
data replication, highlighting the fragility of these defenses. Furthermore, we
challenge the fundamental assumption of memorization locality by showing that
replication can be triggered from diverse locations within the text embedding
space and follows different paths through the model. Our findings indicate that
existing mitigation strategies are insufficient and underscore the need for
methods that truly remove memorized content, rather than attempting to suppress
its retrieval. As a first step in this direction, we introduce a novel
adversarial fine-tuning method that iteratively searches for replication
triggers and updates the model to increase robustness. Through our research, we
provide fresh insights into the nature of memorization in text-to-image DMs and
a foundation for building more trustworthy and compliant generative AI.
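The abstract only summarizes the adversarial fine-tuning method. As a rough illustration of the idea, the following PyTorch sketch alternates between an inner search for a small text-embedding perturbation that re-triggers replication of a memorized image and an outer model update that suppresses that replication. The function names (find_trigger, adversarial_finetune), losses, hyperparameters, and the toy generator below are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def find_trigger(model, text_emb, target, steps=50, lr=1e-2, eps=0.5):
    # Inner loop: optimize a small perturbation of the text embedding so that
    # the model's output moves toward the memorized target image.
    delta = torch.zeros_like(text_emb, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(model(text_emb + delta), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)  # keep the trigger a minor adjustment
    return delta.detach()

def adversarial_finetune(model, embs, targets, rounds=10, lr=1e-5):
    # Outer loop: find the current replication triggers, then update the model
    # so that those triggers no longer reproduce the memorized images.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(rounds):
        for emb, target in zip(embs, targets):
            delta = find_trigger(model, emb, target)
            loss = -F.mse_loss(model(emb + delta), target)  # push output away from the memorized image
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Toy stand-in for a text-conditioned generator: 16-dim embedding -> flattened 32x32 image.
model = torch.nn.Sequential(torch.nn.Linear(16, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 32 * 32))
adversarial_finetune(model, [torch.randn(16)], [torch.randn(32 * 32)], rounds=2)

In the actual setting, the generator would be a text-to-image diffusion model and the inner loss would measure similarity between a generated sample and a known memorized training image; the sketch only conveys the alternating search-and-update structure.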