Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
March 12, 2025
Authors: Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
cs.AI
Abstract
Text-to-image diffusion models have achieved remarkable success in generating
high-quality content from text prompts. However, their reliance on publicly
available data and the growing trend of data sharing for fine-tuning make these
models particularly vulnerable to data poisoning attacks. In this work, we
introduce the Silent Branding Attack, a novel data poisoning method that
manipulates text-to-image diffusion models to generate images containing
specific brand logos or symbols without any text triggers. We find that when
certain visual patterns appear repeatedly in the training data, the model
learns to reproduce them naturally in its outputs, even when the prompts do not
mention them.
Leveraging this, we develop an automated data poisoning algorithm that
unobtrusively injects logos into original images, ensuring they blend naturally
and remain undetected. Models trained on this poisoned dataset generate images
containing logos without degrading image quality or text alignment. We
experimentally validate our silent branding attack across two realistic
settings on large-scale high-quality image datasets and style personalization
datasets, achieving high success rates even without a specific text trigger.
Human evaluation and quantitative metrics including logo detection show that
our method can stealthily embed logos.
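
The abstract describes the poisoning step only at a high level. As a rough illustration of the general idea, the sketch below injects a semi-transparent logo into each training image via alpha compositing with Pillow. The function name, paths, and blending parameters are assumptions for illustration; the paper's actual algorithm blends logos far more inconspicuously than this naive overlay.

```python
# Hypothetical sketch: naive logo injection into a fine-tuning image set.
# This is NOT the paper's algorithm, only a crude stand-in for the idea of
# building a poisoned dataset where a logo recurs across images.
import random
from pathlib import Path

from PIL import Image


def inject_logo(image_path: str, logo_path: str, out_path: str,
                scale: float = 0.15, opacity: float = 0.6) -> None:
    """Paste a down-scaled, semi-transparent logo at a random position."""
    base = Image.open(image_path).convert("RGBA")
    logo = Image.open(logo_path).convert("RGBA")

    # Resize the logo relative to the base image width.
    w = int(base.width * scale)
    h = int(logo.height * w / logo.width)
    logo = logo.resize((w, h))

    # Scale down the logo's alpha channel so it blends into the scene.
    alpha = logo.getchannel("A").point(lambda a: int(a * opacity))
    logo.putalpha(alpha)

    # Random placement, kept fully inside the frame.
    x = random.randint(0, base.width - w)
    y = random.randint(0, base.height - h)
    base.alpha_composite(logo, dest=(x, y))
    base.convert("RGB").save(out_path)


# Poison every image in a (hypothetical) fine-tuning dataset directory.
Path("poisoned").mkdir(exist_ok=True)
for p in Path("dataset/images").glob("*.jpg"):
    inject_logo(str(p), "brand_logo.png", f"poisoned/{p.name}")
```

After fine-tuning a diffusion model on such a poisoned set, the trigger-free behavior the paper reports could be checked by sampling with prompts that never mention the brand, for example with the diffusers API (the checkpoint name here is hypothetical):

```python
import torch
from diffusers import StableDiffusionPipeline

# "poisoned-finetuned-model" is a placeholder for a checkpoint fine-tuned
# on the poisoned dataset; note the prompt contains no brand trigger.
pipe = StableDiffusionPipeline.from_pretrained(
    "poisoned-finetuned-model", torch_dtype=torch.float16
).to("cuda")
pipe("a cozy living room with a sofa").images[0].save("sample.png")
```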