SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control
November 12, 2025
Authors: Arman Zarei, Samyadeep Basu, Mobina Pournemat, Sayan Nag, Ryan Rossi, Soheil Feizi
cs.AI
Abstract
Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply each instruction in the prompt with a fixed strength, limiting the user's ability to precisely and continuously control the intensity of individual edits. We introduce SliderEdit, a framework for continuous image editing with fine-grained, interpretable instruction control. Given a multi-part edit instruction, SliderEdit disentangles the individual instructions and exposes each as a globally trained slider, allowing smooth adjustment of its strength. Unlike prior work that introduces slider-based attribute controls in text-to-image generation and typically requires separate training or fine-tuning for each attribute or concept, our method learns a single set of low-rank adaptation matrices that generalizes across diverse edits, attributes, and compositional instructions. This enables continuous interpolation along individual edit dimensions while preserving both spatial locality and global semantic consistency. We apply SliderEdit to state-of-the-art image editing models, including FLUX-Kontext and Qwen-Image-Edit, and observe substantial improvements in edit controllability, visual consistency, and user steerability. To the best of our knowledge, SliderEdit is the first framework for continuous, fine-grained instruction control in instruction-based image editing models. Our results pave the way for interactive, instruction-driven image manipulation with continuous and compositional control.
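To make the core mechanism concrete, below is a minimal PyTorch sketch of the general idea behind slider-scaled low-rank adaptation. The names here (SliderLoRALinear, slider_scale, the single-layer setup) are illustrative assumptions, not the paper's implementation: SliderEdit trains a shared set of LoRA matrices across an entire editing model and exposes per-instruction sliders, whereas this sketch only shows how scaling a low-rank update can continuously modulate an edit's strength.

```python
import torch
import torch.nn as nn

class SliderLoRALinear(nn.Module):
    """A frozen linear layer plus a low-rank update whose contribution
    is scaled by a continuous slider value at inference time.
    (Hypothetical illustration, not the paper's actual architecture.)"""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # Low-rank factors A and B; the effective weight is W + s * (B @ A).
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor, slider_scale: float = 1.0) -> torch.Tensor:
        # slider_scale = 0.0 reproduces the base model's output (no edit);
        # values in between interpolate the edit strength continuously.
        return self.base(x) + slider_scale * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: apply the learned edit direction at half strength.
layer = SliderLoRALinear(nn.Linear(64, 64))
x = torch.randn(1, 64)
half_strength = layer(x, slider_scale=0.5)
```

Because the low-rank update enters the forward pass as an additive term, scaling it by a scalar gives a smooth path between the unedited output and the full-strength edit, which is the property the sliders exploit.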