Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Abstract

Generative models transform random noise into images; their inversion aims to transform images back into structured noise for recovery and editing. This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using stochastic equivalents of rectified flow models (such as Flux). Although Diffusion Models (DMs) have recently dominated generative modeling for images, their inversion presents faithfulness and editability challenges due to nonlinearities in the drift and diffusion terms. Existing state-of-the-art DM inversion approaches rely on training additional parameters or on test-time optimization of latent variables; both are expensive in practice. Rectified Flows (RFs) offer a promising alternative to diffusion models, yet their inversion has been underexplored. We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator. We prove that the resulting vector field is equivalent to a rectified stochastic differential equation. Additionally, we extend our framework to design a stochastic sampler for Flux. Our inversion method achieves state-of-the-art performance in zero-shot inversion and editing, outperforming prior work in stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference.

