Concept Lancet: Image Editing with Compositional Representation Transplant

Abstract

Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency, while underestimating it causes the edit to fail. Notably, each source image may require a different edit strength, and searching for an appropriate strength by trial and error is costly. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation dataset, CoLan-150K, which contains diverse descriptions and scenarios of visual terms and phrases for the latent dictionary. Experiments on multiple diffusion-based image editing baselines show that methods equipped with CoLan achieve state-of-the-art performance in editing effectiveness and consistency preservation.
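The decompose-then-transplant idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes the concept dictionary is a plain matrix `D` with one unit-norm concept vector per column, it uses ISTA (a generic lasso solver) as a stand-in for whatever sparse solver CoLan actually uses, and the names `sparse_decompose`, `transplant`, `src_idx`, and `tgt_vec` are hypothetical. Only the replace and remove cases are sketched.

```python
import numpy as np


def sparse_decompose(v, D, lam=0.01, n_iter=2000):
    """Estimate sparse coefficients w such that v is approximately D @ w.

    Solves min_w 0.5 * ||v - D w||^2 + lam * ||w||_1 with ISTA.
    ISTA is an assumed stand-in solver, chosen for brevity.
    """
    # Step size 1/L, where L is the squared spectral norm of D
    # (the Lipschitz constant of the gradient of the quadratic term).
    L = np.linalg.norm(D, 2) ** 2
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - v)
        z = w - grad / L
        # Soft-thresholding enforces sparsity of the coefficients.
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return w


def transplant(v, D, w, src_idx, tgt_vec=None):
    """Hypothetical concept transplant in the latent space.

    Removes the source concept at its estimated strength w[src_idx].
    If tgt_vec is given (replace), the target concept is added back
    at the same strength; if it is None (remove), nothing is added.
    """
    strength = w[src_idx]
    edited = v - strength * D[:, src_idx]
    if tgt_vec is not None:
        edited = edited + strength * tgt_vec
    return edited
```

The point of the sketch is that the edit strength is read off the decomposition (`w[src_idx]`) per input rather than tuned by hand, which is the problem the abstract highlights. In a real pipeline `v` would be a text embedding or diffusion score rather than a toy vector.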
