SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder
October 6, 2025
Authors: Ronen Kamenetsky, Sara Dorfman, Daniel Garibi, Roni Paiss, Or Patashnik, Daniel Cohen-Or
cs.AI
Abstract
Large-scale text-to-image diffusion models have become the backbone of modern
image editing, yet text prompts alone do not offer adequate control over the
editing process. Two properties are especially desirable: disentanglement,
where changing one attribute does not unintentionally alter others, and
continuous control, where the strength of an edit can be smoothly adjusted. We
introduce a method for disentangled and continuous editing through token-level
manipulation of text embeddings. The edits are applied by manipulating the
embeddings along carefully chosen directions, which control the strength of the
target attribute. To identify such directions, we employ a Sparse Autoencoder
(SAE), whose sparse latent space exposes semantically isolated dimensions. Our
method operates directly on text embeddings without modifying the diffusion
process, making it model-agnostic and broadly applicable to various image
synthesis backbones. Experiments show that it enables intuitive and efficient
manipulations with continuous control across diverse attributes and domains.
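To make the described pipeline concrete, below is a minimal sketch of a token-level SAE edit of the kind the abstract outlines: encode one token's text embedding into a sparse latent space, rescale a single latent dimension that is assumed to correspond to the target attribute, and decode back before conditioning the (unmodified) diffusion model. This is an illustration under stated assumptions, not the authors' implementation; the SAE architecture (ReLU encoder, linear decoder), the names `SparseAutoencoder` and `edit_token_embedding`, and all dimensions are hypothetical stand-ins.

```python
# Hypothetical sketch of SAE-based token-level embedding editing.
# Architecture, names, and sizes are illustrative assumptions,
# not the paper's released code.

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """A generic SAE: overcomplete ReLU encoder, linear decoder."""

    def __init__(self, embed_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, embed_dim)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps latents non-negative and (after training) sparse.
        return torch.relu(self.encoder(x))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)


@torch.no_grad()
def edit_token_embedding(sae: SparseAutoencoder,
                         token_emb: torch.Tensor,
                         latent_idx: int,
                         strength: float) -> torch.Tensor:
    """Rescale one SAE latent of a single token embedding, then decode.

    `strength` is the continuous control knob: 0 suppresses the
    attribute, 1 leaves it unchanged, values > 1 amplify it.
    """
    z = sae.encode(token_emb)
    z[..., latent_idx] = z[..., latent_idx] * strength
    return sae.decode(z)


# Usage: edit one prompt token's embedding before it conditions the
# frozen diffusion model. Sizes below are assumed, not from the paper.
embed_dim, latent_dim = 768, 8192
sae = SparseAutoencoder(embed_dim, latent_dim)  # assume pretrained weights loaded

prompt_embeds = torch.randn(1, 77, embed_dim)   # stand-in for text-encoder output
token_pos, attr_latent = 5, 1234                # token to edit, attribute dimension

for strength in (0.0, 0.5, 1.0, 2.0):           # smooth sweep of edit strength
    edited = prompt_embeds.clone()
    edited[0, token_pos] = edit_token_embedding(
        sae, prompt_embeds[0, token_pos], attr_latent, strength
    )
    # `edited` would then replace the original conditioning embeddings
    # passed to the image-synthesis backbone; the diffusion process
    # itself is untouched, which is what makes the approach model-agnostic.
```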