MirrorPPR: Exemplar-Based Portrait Photo Retouching

Abstract

Text-guided image editing has made remarkable progress, but it remains limited in structural portrait retouching: textual descriptions struggle to convey fine-grained changes to facial features and body proportions. To address this gap, we introduce Exemplar-Based Portrait Photo Retouching, a task in which the model is given an exemplar pair and must infer the retouching operations it contains and apply the same operations to a new query image. Existing exemplar-based editing methods focus primarily on tasks with pronounced visual transformations. In contrast, structural portrait retouching involves extremely delicate and localized modifications, which makes accurate extraction and transfer of these edits challenging. To tackle this, we propose MirrorPPR, a novel framework designed to capture and transfer subtle structural retouching operations. Our method uses a Retouching Operation Extractor to capture the subtle differences within the exemplar pair. The extracted representations are then injected into a pre-trained Diffusion Transformer (DiT) through a connector and Low-Rank Adaptation (LoRA) modules. Furthermore, constructing perfectly aligned cross-identity training pairs is severely hindered by operation misalignment. To overcome this, we propose a data self-augmentation paradigm that ensures strictly aligned retouching operations. To alleviate data scarcity and support this new task, we introduce MirrorPPR47M, a large-scale dataset with over 47 million retouched pairs. Structuring the dataset into a simulated subset and a professional subset enables progressive curriculum learning that optimizes the network smoothly. Extensive experiments demonstrate that MirrorPPR significantly outperforms existing baselines in both retouching quality and identity preservation. The project page is available at https://sjtu-deng-lab.github.io/MirrorPPR.
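
For readers unfamiliar with the Low-Rank Adaptation (LoRA) modules mentioned in the abstract, the following is a minimal sketch of the general LoRA idea only: a frozen weight matrix W is augmented with a trainable low-rank update, giving W + (alpha / r) * B A. It is not the MirrorPPR implementation; the class name LoRALinear, the helper matvec, the matrix sizes, and the initialization constants are all hypothetical and chosen purely for illustration.

```python
# Minimal, illustrative sketch of a LoRA-adapted linear layer (pure Python).
# All names and sizes here are hypothetical; this is not the MirrorPPR code.
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pre-trained weight, shape (out_dim, in_dim)
        # A: small random initialization, shape (rank, in_dim)
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        # B: zero initialization, shape (out_dim, rank), so the update starts at zero
        self.b = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x
        delta = matvec(self.b, matvec(self.a, x))   # B (A x)
        return [y + self.scale * d for y, d in zip(base, delta)]


weight = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(weight, rank=2, alpha=2.0)
x = [1.0, 2.0, 3.0]
initial_output = layer.forward(x)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly, and only the small matrices A and B would need to be trained. This is the usual reason LoRA is attractive for adapting a large pre-trained model.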