FlowOpt: Fast Optimization Through Whole Flow Processes for Training-Free Editing

Abstract

The remarkable success of diffusion and flow-matching models has ignited a surge of work on adapting them at test time for controlled generation tasks. Examples range from image editing to restoration, compression, and personalization. However, because sampling in these models is iterative, it is computationally impractical to use gradient-based optimization to directly control the image generated at the end of the process. As a result, existing methods typically resort to manipulating each timestep separately. Here we introduce FlowOpt, a zero-order (gradient-free) optimization framework that treats the entire flow process as a black box, enabling optimization through the whole sampling path without backpropagation through the model. Our method is highly efficient, and it lets users monitor intermediate optimization results and stop early if desired. We prove a sufficient condition on FlowOpt's step size under which convergence to the global optimum is guaranteed. We further show how to empirically estimate this upper bound so as to choose an appropriate step size. We demonstrate how FlowOpt can be used for image editing, showcasing two options: (i) inversion (determining the initial noise that generates a given image), and (ii) directly steering the edited image to be similar to the source image while conforming to a target text prompt. In both cases, FlowOpt achieves state-of-the-art results while using roughly the same number of neural function evaluations (NFEs) as existing methods. Code and examples are available on the project's webpage.
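To make the black-box idea concrete, the following is a minimal sketch of a gradient-free inversion loop on a toy problem. It is not taken from the FlowOpt code. The sampler `toy_flow`, the helper `invert`, the initialization, and the residual-based update rule are all assumptions chosen for illustration. The toy "flow" is a few Euler steps of a linear ODE, and the loop only ever calls it forward, never differentiating through it.

```python
# Toy illustration only: a residual-driven, gradient-free update on a stand-in
# "flow". Neither `toy_flow` nor `invert` comes from the FlowOpt code base.


def toy_flow(z, rates=(0.5, -0.3), num_steps=10):
    """Black-box sampler: Euler-integrate dx/dt = rate * x from t=0 to t=1."""
    x = list(z)
    dt = 1.0 / num_steps
    for _ in range(num_steps):
        x = [xi + dt * r * xi for xi, r in zip(x, rates)]
    return x


def invert(target, flow, step_size=0.8, num_iters=50):
    """Search for z with flow(z) ~= target using forward evaluations only."""
    z = list(target)  # hypothetical initialization: start from the target
    history = []      # max-abs residual per iteration, usable for early stopping
    for _ in range(num_iters):
        out = flow(z)
        residual = [o - t for o, t in zip(out, target)]
        history.append(max(abs(r) for r in residual))
        # No backpropagation: the residual itself serves as the update direction.
        z = [zi - step_size * ri for zi, ri in zip(z, residual)]
    return z, history


target = [1.0, -2.0]
z_hat, history = invert(target, toy_flow)
print("recovered z:", z_hat)
print("first / last residual:", history[0], history[-1])
```

In this linear toy, each coordinate of the error is multiplied by one minus the step size times that coordinate's gain, so the iteration contracts only when `step_size` is below 2 divided by the largest gain (about 1.23 here). That is a loose analogue of a step-size upper bound, not the condition proved in the paper. The `history` list shows how intermediate results could be monitored to stop early.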

