CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
December 11, 2025
Authors: Tong Zhang, Carlos Hinojosa, Bernard Ghanem
cs.AI
Abstract
Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigation methods typically manipulate classifier-free guidance (CFG) or perturb prompt embeddings; however, they often struggle to reduce memorization without compromising alignment with the conditioning prompt. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising. CAPTAIN first applies frequency-based noise initialization to reduce the tendency to replicate memorized patterns early in the denoising process. It then identifies the optimal denoising timesteps for feature injection and localizes memorized regions. Finally, CAPTAIN injects semantically aligned features from non-memorized reference images into localized latent regions, suppressing memorization while preserving prompt fidelity and visual quality. Our experiments show that CAPTAIN achieves substantial reductions in memorization compared to CFG-based baselines while maintaining strong alignment with the intended prompt.
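The abstract's first step, frequency-based noise initialization, can be illustrated with a minimal sketch. The idea is to reshape the initial Gaussian latent in the Fourier domain before denoising begins; the specific filter shape, cutoff, and damping factor below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def frequency_based_noise_init(shape, cutoff=0.25, low_freq_scale=0.5, seed=0):
    """Hedged sketch of a frequency-based noise initialization.

    Draws Gaussian noise, dampens its low-frequency components in the
    Fourier domain (where global structure, including memorized layouts,
    tends to emerge early in denoising), then renormalizes to zero mean
    and unit variance so the latent still matches the diffusion prior.
    The `cutoff` and `low_freq_scale` values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(shape)
    spectrum = np.fft.fftshift(np.fft.fft2(noise))

    # Build a radial frequency mask: frequencies below `cutoff` (as a
    # fraction of the maximum radius) are scaled down by `low_freq_scale`.
    h, w = shape[-2:]
    yy, xx = np.mgrid[:h, :w]
    radius = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    radius /= radius.max()
    mask = np.where(radius < cutoff, low_freq_scale, 1.0)

    # Apply the mask, return to the spatial domain, and renormalize.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    return (filtered - filtered.mean()) / filtered.std()
```

Usage: call once per generation to replace the standard Gaussian seed latent, e.g. `z0 = frequency_based_noise_init((64, 64))`, before handing it to the sampler.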