Delta-Adapter: Scalable Exemplar-Based Image Editing with Single-Pair Supervision

Abstract

Exemplar-based image editing applies a transformation defined by a source-target image pair to a new query image. Existing methods rely on a pair-of-pairs supervision paradigm, which requires two image pairs sharing the same edit semantics to learn the target transformation. This constraint makes training data difficult to curate at scale and limits generalization across diverse edit types. We propose Delta-Adapter, a method that learns transferable editing semantics under single-pair supervision and requires no textual guidance. Rather than directly exposing the exemplar pair to the model, we leverage a pre-trained vision encoder to extract a semantic delta that encodes the visual transformation between the two images. This semantic delta is injected into a pre-trained image editing model via a Perceiver-based adapter. Because the model never sees the target image directly, the target image can serve as the prediction target, enabling single-pair supervision without additional exemplar pairs. This formulation allows us to leverage existing large-scale editing datasets for training. To further promote faithful transformation transfer, we introduce a semantic delta consistency loss that aligns the semantic change of the generated output with the ground-truth semantic delta extracted from the exemplar pair. Extensive experiments demonstrate that Delta-Adapter consistently improves both editing accuracy and content consistency over four strong baselines on seen editing tasks, while also generalizing more effectively to unseen editing tasks. Code will be available at https://delta-adapter.github.io.
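
To make the idea of a semantic delta and its consistency loss more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify how the delta or the loss is computed, so everything here is an assumption for illustration: the delta is taken as the difference of two image embeddings, the loss as one minus the cosine similarity between the generated and reference deltas, and `toy_encoder` is a hypothetical random-projection stand-in for a real pre-trained vision encoder.

```python
import numpy as np


def toy_encoder(image, proj):
    """Hypothetical stand-in for a pre-trained vision encoder.

    Flattens the image, applies a fixed linear projection, and
    L2-normalizes the result.
    """
    feat = image.reshape(-1) @ proj
    return feat / (np.linalg.norm(feat) + 1e-8)


def semantic_delta(src, tgt, proj):
    # Assumption: the semantic delta is the embedding difference
    # between the edited image and the original image.
    return toy_encoder(tgt, proj) - toy_encoder(src, proj)


def delta_consistency_loss(query, output, ref_delta, proj):
    # Assumption: the loss is 1 - cosine similarity between the delta
    # of the generated edit (query -> output) and the reference delta.
    gen_delta = semantic_delta(query, output, proj)
    denom = np.linalg.norm(gen_delta) * np.linalg.norm(ref_delta) + 1e-8
    return 1.0 - float(gen_delta @ ref_delta) / denom


rng = np.random.default_rng(0)
proj = rng.standard_normal((8 * 8 * 3, 16))  # toy encoder weights

# A single (source, target) pair: the target is a perturbed source.
source = rng.random((8, 8, 3))
target = source + 0.5 * rng.standard_normal((8, 8, 3))

# Reference delta extracted from the pair.
ref_delta = semantic_delta(source, target, proj)

# An output that reproduces the target yields a near-zero loss.
loss_same = delta_consistency_loss(source, target, ref_delta, proj)

# An output identical to the query carries no edit, so the loss is 1.
loss_unchanged = delta_consistency_loss(source, source, ref_delta, proj)
```

In this toy setup the source image doubles as the query, which mirrors the single-pair setting described above: the model would receive the source and the delta, while the target is held out as the prediction target.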
