LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

Abstract

Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Significant effort is currently going into enabling modification of these images using text alone, as a means of offering intuitive and versatile editing. However, editing is difficult for these generative models because it inherently requires preserving certain content from the original image. At the same time, in text-based models even minor modifications to the text prompt frequently produce an entirely different result, which makes one-shot generation that accurately matches the user's intent exceedingly challenging. In addition, to edit a real image with these state-of-the-art tools, one must first invert the image into the pre-trained model's domain, which adds another factor affecting edit quality as well as latency. In this exploratory report, we propose LEDITS, a combined lightweight approach for real-image editing that incorporates the Edit Friendly DDPM inversion technique with Semantic Guidance, thereby extending Semantic Guidance to real-image editing while also harnessing the editing capabilities of DDPM inversion. This approach achieves versatile edits, both subtle and extensive, as well as alterations in composition and style, while requiring neither optimization nor extensions to the architecture.
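The abstract does not spell out how an edit concept enters the denoising step, so the following is only a minimal sketch of the general idea behind semantic guidance, under stated assumptions: an edit direction is formed from the difference between an edit-conditioned and an unconditional noise estimate, restricted to its largest-magnitude elements, scaled, and added to the usual classifier-free guidance estimate. The function names (`semantic_guidance_term`, `guided_noise`), the default scales, and the quantile threshold are hypothetical choices made for illustration, and random arrays stand in for the noise predictions that a diffusion model would produce at latents obtained through DDPM inversion. This is not the authors' implementation, and the published method may include further components not shown here.

```python
import numpy as np


def semantic_guidance_term(eps_uncond, eps_edit, edit_scale=5.0,
                           quantile=0.9, reverse=False):
    """Hypothetical helper: masked, scaled edit direction.

    Keeps only the elements of the edit direction whose magnitude is at
    or above the given quantile, so the edit touches a small part of
    the noise estimate and leaves the rest unchanged.
    """
    direction = eps_edit - eps_uncond
    if reverse:
        # Steer away from the concept instead of toward it.
        direction = -direction
    threshold = np.quantile(np.abs(direction), quantile)
    mask = np.abs(direction) >= threshold
    return edit_scale * mask * direction


def guided_noise(eps_uncond, eps_cond, eps_edit,
                 guidance_scale=7.5, **edit_kwargs):
    """Classifier-free guidance estimate plus the edit term above."""
    cfg = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    return cfg + semantic_guidance_term(eps_uncond, eps_edit, **edit_kwargs)


if __name__ == "__main__":
    # Random stand-ins for model noise predictions (assumption).
    rng = np.random.default_rng(0)
    eps_u, eps_c, eps_e = (rng.normal(size=(4, 8, 8)) for _ in range(3))
    eps_bar = guided_noise(eps_u, eps_c, eps_e, quantile=0.95)
    print(eps_bar.shape)
```

In this sketch, setting `reverse=True` flips the sign of the edit direction, which corresponds to suppressing a concept, and raising `quantile` confines the edit to fewer elements, which is one way to trade edit strength against preservation of the original content.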
