SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
November 12, 2025
Authors: Arman Zarei, Samyadeep Basu, Mobina Pournemat, Sayan Nag, Ryan Rossi, Soheil Feizi
cs.AI
Abstract
Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply each instruction in the prompt with a fixed strength, limiting the user's ability to precisely and continuously control the intensity of individual edits. We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider, allowing smooth adjustment of its strength. Unlike prior work on slider-based attribute controls in text-to-image generation, which typically requires separate training or fine-tuning for each attribute or concept, our method learns a single set of low-rank adaptation matrices that generalize across diverse edits, attributes, and compositional instructions. This enables continuous interpolation along individual edit dimensions while preserving both spatial locality and global semantic consistency. We apply SliderEdit to state-of-the-art image editing models, including FLUX-Kontext and Qwen-Image-Edit, and observe substantial improvements in edit controllability, visual consistency, and user steerability. To the best of our knowledge, we are the first to explore and propose a framework for continuous, fine-grained instruction control in instruction-based image editing models. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
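The abstract does not spell out how the slider values couple to the learned low-rank adaptation matrices. One plausible reading is that each slider scales a shared LoRA residual at inference time; below is a minimal PyTorch sketch of that general pattern. The names here (SliderLoRALinear, slider, rank) are hypothetical illustrations, not the paper's API, and the sketch shows only single-slider mechanics rather than the paper's full multi-instruction setup.

```python
import torch
import torch.nn as nn

class SliderLoRALinear(nn.Module):
    """Frozen linear layer plus a shared low-rank update whose strength
    is scaled by a user-controlled slider value at inference time.
    (Hypothetical sketch, not the paper's actual implementation.)"""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained editor frozen
        # A single set of low-rank adaptation matrices, shared across edits
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        # Zero-init the up-projection so the adapter starts as a no-op
        nn.init.zeros_(self.lora_up.weight)

    def forward(self, x: torch.Tensor, slider: float = 1.0) -> torch.Tensor:
        # slider = 0 recovers the frozen base model; larger values
        # strengthen the edit along this dimension
        return self.base(x) + slider * self.lora_up(self.lora_down(x))

# Usage: one adapter, swept over slider values for a single edit dimension
layer = SliderLoRALinear(nn.Linear(768, 768))
x = torch.randn(1, 77, 768)
weak, neutral, strong = layer(x, slider=0.25), layer(x, slider=0.0), layer(x, slider=1.5)
```

In the paper's setting, one would presumably expose one such slider per disentangled instruction while the underlying low-rank matrices remain shared, which is what would let a single trained adapter generalize across edits rather than requiring per-attribute fine-tuning.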