

SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder

October 6, 2025
作者: Ronen Kamenetsky, Sara Dorfman, Daniel Garibi, Roni Paiss, Or Patashnik, Daniel Cohen-Or
cs.AI

Abstract

Large-scale text-to-image diffusion models have become the backbone of modern image editing, yet text prompts alone do not offer adequate control over the editing process. Two properties are especially desirable: disentanglement, where changing one attribute does not unintentionally alter others, and continuous control, where the strength of an edit can be smoothly adjusted. We introduce a method for disentangled and continuous editing through token-level manipulation of text embeddings. The edits are applied by manipulating the embeddings along carefully chosen directions, which control the strength of the target attribute. To identify such directions, we employ a Sparse Autoencoder (SAE), whose sparse latent space exposes semantically isolated dimensions. Our method operates directly on text embeddings without modifying the diffusion process, making it model agnostic and broadly applicable to various image synthesis backbones. Experiments show that it enables intuitive and efficient manipulations with continuous control across diverse attributes and domains.
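The core idea in the abstract can be illustrated with a minimal sketch: encode a token embedding with a sparse autoencoder, pick one semantically isolated latent dimension, and shift the embedding along that latent's decoder direction by a continuous strength. This is not the authors' code; all names and the random toy SAE weights (`W_enc`, `W_dec`, `D_MODEL`, `N_LATENTS`) are hypothetical stand-ins for a trained model.

```python
# Illustrative sketch (assumed API, toy untrained weights), showing how a
# token embedding can be edited along one SAE latent direction with a
# continuously adjustable strength.
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, N_LATENTS = 64, 512  # embedding width and SAE dictionary size (toy)

# Hypothetical SAE weights; in the paper these would come from a trained SAE.
W_enc = rng.standard_normal((D_MODEL, N_LATENTS)) / np.sqrt(D_MODEL)
W_dec = rng.standard_normal((N_LATENTS, D_MODEL)) / np.sqrt(N_LATENTS)

def sae_encode(x: np.ndarray) -> np.ndarray:
    """ReLU activation yields the sparse latent code."""
    return np.maximum(x @ W_enc, 0.0)

def edit_token(embedding: np.ndarray, latent_idx: int, strength: float) -> np.ndarray:
    """Shift the embedding along the (unit-normalized) decoder direction of
    one latent. `strength` provides continuous control; leaving the other
    latents untouched is what aims at disentanglement."""
    direction = W_dec[latent_idx]
    direction = direction / np.linalg.norm(direction)
    return embedding + strength * direction

token = rng.standard_normal(D_MODEL)
edited = edit_token(token, latent_idx=42, strength=2.0)

# The edit changes only the component along the chosen direction.
delta = edited - token
assert np.isclose(np.linalg.norm(delta), 2.0)
```

Because the manipulation happens purely in the text-embedding space, the same edited embeddings can be fed to any diffusion backbone unchanged, which is what makes the method model agnostic.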