LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence

September 15, 2025
Authors: Zixin Yin, Xili Dai, Duomin Wang, Xianfang Zeng, Lionel M. Ni, Gang Yu, Heung-Yeung Shum
cs.AI

Abstract

The reliance on implicit point matching via attention has become a core bottleneck in drag-based editing, forcing a fundamental compromise between weakened inversion strength and costly test-time optimization (TTO). This compromise severely limits the generative capability of diffusion models, suppressing high-fidelity inpainting and text-guided creation. In this paper, we introduce LazyDrag, the first drag-based image editing method for Multi-Modal Diffusion Transformers, which directly eliminates the reliance on implicit point matching. Concretely, our method generates an explicit correspondence map from user drag inputs and uses it as a reliable reference to strengthen attention control. This reliable reference enables a stable full-strength inversion process, a first for drag-based editing: it obviates the need for TTO and unlocks the generative capability of the model. As a result, LazyDrag naturally unifies precise geometric control with text guidance, enabling complex edits that were previously out of reach: opening a dog's mouth and inpainting its interior, generating new objects such as a "tennis ball", or, for ambiguous drags, making context-aware changes such as moving a hand into a pocket. Additionally, LazyDrag supports multi-round workflows with simultaneous move and scale operations. Evaluated on DragBench, our method outperforms baselines in drag accuracy and perceptual quality, as validated by VIEScore and human evaluation. LazyDrag not only establishes new state-of-the-art performance but also paves the way for new editing paradigms.
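
To make the core idea concrete, below is a minimal sketch of how an explicit correspondence map derived from drag inputs could bias image-to-image attention, in the spirit the abstract describes. This is not the authors' implementation: the token-grid construction, the neighborhood radius, the logit-bonus formulation, and all function names (`build_correspondence_map`, `guided_attention`) and parameters are illustrative assumptions.

```python
import torch

def build_correspondence_map(src_pts, dst_pts, latent_hw, patch=2, radius=2):
    """Map each destination token to the source token it should reference.

    Hypothetical sketch: translate each user drag vector onto the token grid
    of a DiT latent so that tokens around the drag target read from tokens
    around the drag handle.
    """
    H, W = latent_hw
    h, w = H // patch, W // patch                # token-grid resolution
    corr = torch.arange(h * w).view(h, w)        # identity: each token maps to itself
    for (sx, sy), (dx, dy) in zip(src_pts, dst_pts):
        si, sj = int(sy) // patch, int(sx) // patch   # handle point on token grid
        di, dj = int(dy) // patch, int(dx) // patch   # target point on token grid
        # a small square neighborhood around the handle follows the drag
        for oi in range(-radius, radius + 1):
            for oj in range(-radius, radius + 1):
                ti, tj = di + oi, dj + oj
                qi, qj = si + oi, sj + oj
                if 0 <= ti < h and 0 <= tj < w and 0 <= qi < h and 0 <= qj < w:
                    corr[ti, tj] = qi * w + qj   # target token reads from source token
    return corr.view(-1)                          # (h*w,) reference-token index per token


def guided_attention(q, k, v, corr, boost=4.0):
    """Bias attention toward the explicit correspondences instead of relying
    on implicit point matching.

    Shapes (assumed): q, k, v are (tokens, dim); corr is (tokens,).
    A logit bonus is added so each query attends to its mapped reference key.
    """
    logits = q @ k.T / q.shape[-1] ** 0.5
    bias = torch.zeros_like(logits)
    bias[torch.arange(len(corr)), corr] = boost   # strengthen matched query-key pairs
    return torch.softmax(logits + bias, dim=-1) @ v


# Toy usage: a 64x64 latent with patch size 2 gives a 32x32 token grid.
corr = build_correspondence_map([(20, 20)], [(28, 20)], latent_hw=(64, 64))
q = k = v = torch.randn(corr.numel(), 64)
out = guided_attention(q, k, v, corr)
```

The design choice sketched here is simply that an explicit, user-derived map replaces attention-based point matching as the source of geometric guidance; the actual LazyDrag attention-control mechanism is described in the paper itself.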