

Group Relative Attention Guidance for Image Editing

October 28, 2025
Authors: Xuanpu Zhang, Xuesong Niu, Ruidong Chen, Dan Song, Jianhao Zeng, Penghui Du, Haoxiang Cao, Kai Wu, An-an Liu
cs.AI

Abstract

Recently, image editing based on Diffusion-in-Transformer models has undergone rapid development. However, existing editing methods often lack effective control over the degree of editing, limiting their ability to achieve more customized results. To address this limitation, we investigate the MM-Attention mechanism within the DiT model and observe that the Query and Key tokens share a bias vector that is only layer-dependent. We interpret this bias as representing the model's inherent editing behavior, while the delta between each token and its corresponding bias encodes the content-specific editing signals. Based on this insight, we propose Group Relative Attention Guidance (GRAG), a simple yet effective method that reweights the delta values of different tokens to modulate the focus of the model on the input image relative to the editing instruction, enabling continuous and fine-grained control over editing intensity without any tuning. Extensive experiments conducted on existing image editing frameworks demonstrate that GRAG can be integrated with as few as four lines of code, consistently enhancing editing quality. Moreover, compared to the commonly used Classifier-Free Guidance, GRAG achieves smoother and more precise control over the degree of editing. Our code will be released at https://github.com/little-misfit/GRAG-Image-Editing.
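The abstract describes GRAG as reweighting the delta between each Query/Key token and a shared, layer-dependent bias inside MM-Attention. The sketch below is only a minimal illustration of that idea, not the authors' released implementation: the shared bias is approximated here by the per-layer token mean, the token grouping (input-image tokens vs. instruction tokens) is supplied as a mask, and the function name, scaling rule, and direction of the effect are assumptions made for this example.

```python
import torch

def grag_reweight(q, k, image_mask, scale=1.2):
    """Hypothetical sketch of Group Relative Attention Guidance (GRAG).

    q, k:       query/key tokens of one MM-Attention layer, shape (B, N, D)
    image_mask: boolean mask of shape (B, N) marking input-image tokens
    scale:      assumed semantics: >1 amplifies image-token deltas (more
                focus on the input image), <1 shifts focus toward the edit
                instruction; the paper's exact rule may differ.
    """
    # Approximate the shared, layer-dependent bias by the mean over tokens.
    # (The paper identifies this bias within the model; using the token mean
    # is an assumption made purely for illustration.)
    q_bias = q.mean(dim=1, keepdim=True)
    k_bias = k.mean(dim=1, keepdim=True)

    # Reweight only the content-specific delta of the image-token group.
    m = image_mask.unsqueeze(-1)  # (B, N, 1), broadcasts over the feature dim
    q = torch.where(m, q_bias + scale * (q - q_bias), q)
    k = torch.where(m, k_bias + scale * (k - k_bias), k)
    return q, k
```

Applied just before the attention scores are computed in each layer, a single scalar like `scale` would give the kind of continuous, per-group control over editing intensity that the abstract attributes to GRAG.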