Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

Abstract

Text-to-image diffusion models (DMs) have achieved remarkable success in image generation. However, concerns about data privacy and intellectual property remain because these models can inadvertently memorize and replicate training data. Recent mitigation efforts have focused on identifying and pruning the weights responsible for triggering replication, based on the assumption that memorization can be localized. Our research assesses the robustness of these pruning-based approaches. We demonstrate that even after pruning, minor adjustments to the text embeddings of input prompts are sufficient to re-trigger data replication, highlighting the fragility of these defenses. Furthermore, we challenge the fundamental assumption of memorization locality by showing that replication can be triggered from diverse locations within the text embedding space and follows different paths through the model. Our findings indicate that existing mitigation strategies are insufficient and underscore the need for methods that truly remove memorized content rather than attempting to suppress its retrieval. As a first step in this direction, we introduce a novel adversarial fine-tuning method that iteratively searches for replication triggers and updates the model to increase robustness. Through our research, we provide fresh insights into the nature of memorization in text-to-image DMs and a foundation for building more trustworthy and compliant generative AI.
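The abstract describes two procedures: nudging a prompt's text embedding until a defended model replicates again, and fine-tuning that alternates such searches with model updates. The sketch below is only a toy analogy under stated assumptions, not the authors' implementation. A linear map `W` stands in for the diffusion model, replication is measured as squared distance to a memorized output `x_mem`, and `find_trigger`, `adversarial_finetune`, the L2 radius `eps`, and the non-memorized target `y_safe` are hypothetical names and choices introduced here for illustration. The trigger search is plain projected gradient descent; the actual losses, constraints, and utility-preserving terms of the paper's method are not given in this section.

```python
import math
import random

# Toy stand-in (hypothetical): a k x d matrix W plays the role of the model,
# mapping a text embedding e to an "image" W @ e.

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def replication_error(W, e, x_mem):
    """Squared distance between the toy output and the memorized sample."""
    return sq_dist(matvec(W, e), x_mem)

def project_l2(delta, eps):
    """Project a perturbation onto the L2 ball of radius eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return delta
    return [d * eps / norm for d in delta]

def find_trigger(W, e0, x_mem, eps, steps=200):
    """Projected gradient descent on a small perturbation delta that minimizes
    ||W(e0 + delta) - x_mem||^2, i.e. tries to re-trigger replication."""
    fro2 = sum(w * w for row in W for w in row)
    lr = 0.25 / fro2  # below 1/L, since L = 2*lambda_max(W^T W) <= 2*||W||_F^2
    delta = [0.0] * len(e0)
    for _ in range(steps):
        e = [a + b for a, b in zip(e0, delta)]
        residual = [y - t for y, t in zip(matvec(W, e), x_mem)]
        # Gradient of the squared error with respect to delta: 2 * W^T residual.
        grad = [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
                for j in range(len(e0))]
        delta = project_l2([d - lr * g for d, g in zip(delta, grad)], eps)
    return [a + b for a, b in zip(e0, delta)]

def adversarial_finetune(W, e0, x_mem, y_safe, eps, rounds=20):
    """Alternate: search for a trigger near e0, then take one gradient step on W
    that pulls the trigger's output toward the hypothetical target y_safe."""
    W = [row[:] for row in W]  # do not mutate the caller's model
    for _ in range(rounds):
        e_adv = find_trigger(W, e0, x_mem, eps)
        eta = 0.25 / sum(v * v for v in e_adv)
        out = matvec(W, e_adv)
        for i in range(len(W)):
            r = out[i] - y_safe[i]
            for j in range(len(e_adv)):
                # d/dW_ij of ||W e - y_safe||^2 is 2 * r_i * e_j
                W[i][j] -= eta * 2 * r * e_adv[j]
    return W

if __name__ == "__main__":
    rng = random.Random(0)
    d, k, eps = 8, 4, 0.5
    W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
    e0 = [rng.gauss(0, 1) for _ in range(d)]
    x_mem = [rng.gauss(0, 1) for _ in range(k)]
    y_safe = [rng.gauss(0, 1) for _ in range(k)]
    before = replication_error(W, find_trigger(W, e0, x_mem, eps), x_mem)
    W_ft = adversarial_finetune(W, e0, x_mem, y_safe, eps)
    after = replication_error(W_ft, find_trigger(W_ft, e0, x_mem, eps), x_mem)
    print(f"worst-case replication error before: {before:.3f}, after: {after:.3f}")
```

With the step size chosen above, each fine-tuning step halves the distance between the current trigger's output and `y_safe`. The choice to re-run `find_trigger` in every round, rather than fixing one adversarial embedding, reflects the abstract's observation that replication can be triggered from many locations in the embedding space; the printed numbers depend on the random draw.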