InstantDrag: Improving Interactivity in Drag-based Image Editing
September 13, 2024
Authors: Joonghyuk Shin, Daehyeon Choi, Jaesik Park
cs.AI
Abstract
Drag-based image editing has recently gained popularity for its interactivity
and precision. However, despite the ability of text-to-image models to generate
samples within a second, drag editing still lags behind in speed due to the challenge of
accurately reflecting user interaction while maintaining image content. Some
existing approaches rely on computationally intensive per-image optimization or
intricate guidance-based methods, requiring additional inputs such as masks for
movable regions and text prompts, thereby compromising the interactivity of the
editing process. We introduce InstantDrag, an optimization-free pipeline that
enhances interactivity and speed, requiring only an image and a drag
instruction as input. InstantDrag consists of two carefully designed networks:
a drag-conditioned optical flow generator (FlowGen) and an optical
flow-conditioned diffusion model (FlowDiffusion). InstantDrag learns motion
dynamics for drag-based image editing from real-world video datasets by
decomposing the task into motion generation and motion-conditioned image
generation. We demonstrate InstantDrag's capability to perform fast,
photo-realistic edits without masks or text prompts through experiments on
facial video datasets and general scenes. These results highlight the
efficiency of our approach in handling drag-based image editing, making it a
promising solution for interactive, real-time applications.
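The two-stage decomposition described in the abstract (drag instruction to motion, then motion to edited image) can be illustrated with a toy Python sketch. Everything in it is a hypothetical stand-in rather than the authors' code: `Drag`, `flow_gen_stub`, `flow_diffusion_stub`, and `drag_edit` are illustrative names, a hand-written Gaussian falloff replaces the learned FlowGen, and a nearest-neighbour backward warp replaces the FlowDiffusion model. The sketch only shows how data passes between the two stages.

```python
import math
from dataclasses import dataclass

# Hypothetical types for illustration: a grayscale image as rows of floats,
# and a dense flow field as rows of (dx, dy) displacement tuples.
Image = list[list[float]]
Flow = list[list[tuple[float, float]]]


@dataclass(frozen=True)
class Drag:
    """One drag instruction: move the content at `src` to `dst`, as (x, y)."""
    src: tuple[int, int]
    dst: tuple[int, int]


def flow_gen_stub(h: int, w: int, drags: list[Drag], sigma: float = 2.0) -> Flow:
    """Stage 1 stand-in for FlowGen: sparse drags -> dense flow.

    Hypothetical: spreads each drag vector with a Gaussian falloff centred
    on the drag target instead of using a learned generator.
    """
    flow: Flow = []
    for y in range(h):
        row = []
        for x in range(w):
            dx = dy = 0.0
            for d in drags:
                # Weight is 1 at the target point and decays with distance.
                dist2 = (x - d.dst[0]) ** 2 + (y - d.dst[1]) ** 2
                wgt = math.exp(-dist2 / (2 * sigma ** 2))
                dx += wgt * (d.dst[0] - d.src[0])
                dy += wgt * (d.dst[1] - d.src[1])
            row.append((dx, dy))
        flow.append(row)
    return flow


def flow_diffusion_stub(image: Image, flow: Flow) -> Image:
    """Stage 2 stand-in for FlowDiffusion: (image, flow) -> edited image.

    Hypothetical: a nearest-neighbour backward warp. Each output pixel q
    reads the input at q - flow(q), clamped to the image bounds. Unlike a
    flow-conditioned diffusion model, it cannot synthesize new content.
    """
    h, w = len(image), len(image[0])
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(round(x - dx), 0), w - 1)
            sy = min(max(round(y - dy), 0), h - 1)
            row.append(image[sy][sx])
        out.append(row)
    return out


def drag_edit(image: Image, drags: list[Drag]) -> Image:
    """Motion generation first, then motion-conditioned image generation."""
    flow = flow_gen_stub(len(image), len(image[0]), drags)
    return flow_diffusion_stub(image, flow)
```

For example, dragging a single bright pixel at (2, 2) to (5, 2) in an 8x8 image with `drag_edit(img, [Drag(src=(2, 2), dst=(5, 2))])` places that value at (5, 2), and an empty drag list returns the image unchanged. The point the sketch is meant to convey is that the second stage never sees the drag directly, only the dense motion produced by the first.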