Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Abstract

Generative models transform random noise into images; their inversion aims to transform images back into structured noise for recovery and editing. This paper addresses two key tasks: (i) inversion and (ii) editing of a real image using stochastic equivalents of rectified flow models (such as Flux). Although Diffusion Models (DMs) have recently dominated generative modeling for images, their inversion presents faithfulness and editability challenges due to nonlinearities in the drift and diffusion terms. Existing state-of-the-art DM inversion approaches rely on training additional parameters or on test-time optimization of latent variables; both are expensive in practice. Rectified Flows (RFs) offer a promising alternative to diffusion models, yet their inversion remains underexplored. We propose RF inversion using dynamic optimal control derived via a linear quadratic regulator. We prove that the resulting vector field is equivalent to a rectified stochastic differential equation. Additionally, we extend our framework to design a stochastic sampler for Flux. Our inversion method achieves state-of-the-art performance in zero-shot inversion and editing, outperforming prior work in stroke-to-image synthesis and semantic image editing, with large-scale human evaluations confirming user preference.
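To make the idea of inversion concrete, the following is a minimal sketch of plain ODE inversion for a toy rectified flow. It is not the controller-based method the abstract proposes. It assumes a hypothetical one-dimensional coupling x1 = a * x0 + b, takes t = 0 as noise and t = 1 as data, and uses illustrative names (`velocity`, `integrate`) that do not come from the paper or from Flux. Generation integrates the velocity field forward in time; inversion integrates the same field backward.

```python
import numpy as np


def velocity(x, t, a=2.0, b=1.0):
    """Toy rectified-flow velocity for the assumed coupling x1 = a * x0 + b.

    On the straight path x_t = (1 - t) * x0 + t * x1 the velocity is x1 - x0.
    """
    x0 = (x - t * b) / (1.0 - t + t * a)  # recover the path's starting point
    return (a - 1.0) * x0 + b


def integrate(x, t_start, t_end, steps=50):
    """Euler-integrate dx/dt = velocity(x, t) from t_start to t_end."""
    dt = (t_end - t_start) / steps  # negative when integrating backward
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x


rng = np.random.default_rng(0)
noise = rng.standard_normal(4)          # sample at t = 0 (assumed noise end)
image = integrate(noise, 0.0, 1.0)      # generation: noise -> data
recovered = integrate(image, 1.0, 0.0)  # inversion: data -> noise
round_trip_error = float(np.max(np.abs(recovered - noise)))
```

Because the toy paths are exactly straight, Euler steps stay on the path and the round trip recovers the noise up to floating-point error. A learned velocity field is only approximately straight, so this naive reversal accumulates error in practice.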
