SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder
October 6, 2025
Authors: Ronen Kamenetsky, Sara Dorfman, Daniel Garibi, Roni Paiss, Or Patashnik, Daniel Cohen-Or
cs.AI
Abstract
Large-scale text-to-image diffusion models have become the backbone of modern
image editing, yet text prompts alone do not offer adequate control over the
editing process. Two properties are especially desirable: disentanglement,
where changing one attribute does not unintentionally alter others, and
continuous control, where the strength of an edit can be smoothly adjusted. We
introduce a method for disentangled and continuous editing through token-level
manipulation of text embeddings. The edits are applied by manipulating the
embeddings along carefully chosen directions, which control the strength of the
target attribute. To identify such directions, we employ a Sparse Autoencoder
(SAE), whose sparse latent space exposes semantically isolated dimensions. Our
method operates directly on text embeddings without modifying the diffusion
process, making it model-agnostic and broadly applicable to various image
synthesis backbones. Experiments show that it enables intuitive and efficient
manipulations with continuous control across diverse attributes and domains.
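To make the described pipeline concrete, below is a minimal sketch of a token-level SAE edit of the kind the abstract outlines: encode one token's text embedding into a sparse latent space, rescale a single latent dimension that is assumed to correspond to the target attribute, and decode back before conditioning the (unmodified) diffusion model. This is an illustration under stated assumptions, not the authors' implementation; the SAE architecture (ReLU encoder, linear decoder), the names `SparseAutoencoder` and `edit_token_embedding`, and all dimensions are hypothetical stand-ins.

```python
# Hypothetical sketch of SAE-based token-level embedding editing.
# Architecture, names, and sizes are illustrative assumptions,
# not the paper's released code.

import torch
import torch.nn as nn


class SparseAutoencoder(nn.Module):
    """A generic SAE: overcomplete ReLU encoder, linear decoder."""

    def __init__(self, embed_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, embed_dim)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps latents non-negative and (after training) sparse.
        return torch.relu(self.encoder(x))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.decoder(z)


@torch.no_grad()
def edit_token_embedding(sae: SparseAutoencoder,
                         token_emb: torch.Tensor,
                         latent_idx: int,
                         strength: float) -> torch.Tensor:
    """Rescale one SAE latent of a single token embedding, then decode.

    `strength` is the continuous control knob: 0 suppresses the
    attribute, 1 leaves it unchanged, values > 1 amplify it.
    """
    z = sae.encode(token_emb)
    z[..., latent_idx] = z[..., latent_idx] * strength
    return sae.decode(z)


# Usage: edit one prompt token's embedding before it conditions the
# frozen diffusion model. Sizes below are assumed, not from the paper.
embed_dim, latent_dim = 768, 8192
sae = SparseAutoencoder(embed_dim, latent_dim)  # assume pretrained weights loaded

prompt_embeds = torch.randn(1, 77, embed_dim)   # stand-in for text-encoder output
token_pos, attr_latent = 5, 1234                # token to edit, attribute dimension

for strength in (0.0, 0.5, 1.0, 2.0):           # smooth sweep of edit strength
    edited = prompt_embeds.clone()
    edited[0, token_pos] = edit_token_embedding(
        sae, prompt_embeds[0, token_pos], attr_latent, strength
    )
    # `edited` would then replace the original conditioning embeddings
    # passed to the image-synthesis backbone; the diffusion process
    # itself is untouched, which is what makes the approach model-agnostic.
```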