Precise Parameter Localization for Textual Generation in Diffusion Models
February 14, 2025
Authors: Łukasz Staniszewski, Bartosz Cywiński, Franziska Boenisch, Kamil Deja, Adam Dziedzic
cs.AI
Abstract
Novel diffusion models can synthesize photo-realistic images with integrated
high-quality text. Surprisingly, we demonstrate through attention activation
patching that less than 1% of diffusion models' parameters, all contained
in attention layers, influence the generation of textual content within the
images. Building on this observation, we improve the efficiency and performance
of textual generation by targeting the cross- and joint-attention layers of diffusion
models. We introduce several applications that benefit from localizing the
layers responsible for textual content generation. We first show that
LoRA-based fine-tuning of only the localized layers further enhances the
general text-generation capabilities of large diffusion models while preserving
the quality and diversity of the diffusion models' generations. Then, we
demonstrate how we can use the localized layers to edit textual content in
generated images. Finally, we extend this idea to the practical use case of
preventing the generation of toxic text in a cost-free manner. In contrast to
prior work, our localization approach is broadly applicable across various
diffusion model architectures, including U-Net (e.g., LDM and SDXL) and
transformer-based models (e.g., DeepFloyd IF and Stable Diffusion 3) that use
diverse text encoders (e.g., ranging from CLIP to large language models such as T5).
Project page available at https://t2i-text-loc.github.io/.
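The localization step rests on attention activation patching. As a rough illustration of that general idea, and not the authors' implementation, the NumPy sketch below uses a hypothetical toy model: a stack of single-head cross-attention layers (`ToyCrossAttention`, an assumed name) with residual connections, where image tokens attend to text-token embeddings. For each layer, it caches the attention output produced under a source prompt, injects that output into a run under a target prompt, and reports how far the final state moves toward the source run. The distance-based score in `patching_scores` is an assumed stand-in; in a real diffusion model one would presumably judge the effect by the text rendered in the generated image instead.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyCrossAttention:
    """Hypothetical single-head cross-attention: image tokens (queries)
    attend to text tokens (keys/values)."""

    def __init__(self, dim, rng, value_scale=1.0):
        self.wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        # value_scale=0.0 builds a layer that cannot carry prompt information.
        self.wv = value_scale * rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, h, text):
        q, k, v = h @ self.wq, text @ self.wk, text @ self.wv
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_img, n_txt)
        return weights @ v                                  # (n_img, dim)

def forward(layers, h0, text, patch=None):
    """Run the toy model. `patch` maps a layer index to an activation that
    replaces that layer's attention output. Returns the final hidden state
    and the per-layer attention outputs."""
    h, cache = h0, []
    for i, layer in enumerate(layers):
        out = layer(h, text)
        if patch is not None and i in patch:
            out = patch[i]
        cache.append(out)
        h = h + out  # residual connection
    return h, cache

def patching_scores(layers, h0, text_src, text_tgt):
    """For each layer, patch the source-prompt activation into the
    target-prompt run. Score 1.0 means the patched output equals the source
    output; 0.0 means patching left the target output unchanged."""
    out_src, cache_src = forward(layers, h0, text_src)
    out_tgt, _ = forward(layers, h0, text_tgt)
    base = np.linalg.norm(out_tgt - out_src)
    if base == 0:
        raise ValueError("source and target runs are identical")
    scores = []
    for i in range(len(layers)):
        out_patched, _ = forward(layers, h0, text_tgt, patch={i: cache_src[i]})
        scores.append(1.0 - np.linalg.norm(out_patched - out_src) / base)
    return scores

# Usage: layers 0 and 2 have zeroed value projections.
rng = np.random.default_rng(0)
dim, n_img, n_txt = 8, 4, 3
layers = [ToyCrossAttention(dim, rng, value_scale=s) for s in (0.0, 1.0, 0.0, 1.0)]
h0 = rng.standard_normal((n_img, dim))
text_src = rng.standard_normal((n_txt, dim))
text_tgt = rng.standard_normal((n_txt, dim))
scores = patching_scores(layers, h0, text_src, text_tgt)
for i, s in enumerate(scores):
    print(f"layer {i}: {s:+.3f}")
```

Because layers 0 and 2 are built with zeroed value projections, their attention outputs are zero under either prompt, so patching them leaves the target run unchanged and their score is zero. Only layers whose attention outputs actually depend on the prompt register a non-zero effect, which is the property a patching-based localization exploits to rank layers.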