PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

Abstract

We introduce PhotoDoodle, a novel image editing framework that facilitates photo doodling by enabling artists to overlay decorative elements onto photographs. Photo doodling is challenging because the inserted elements must integrate seamlessly with the background, which requires realistic blending, perspective alignment, and contextual coherence. In addition, the background must be preserved without distortion, and the artist's unique style must be captured efficiently from limited training data. Previous methods, which focus primarily on global style transfer or regional inpainting, do not address these requirements. PhotoDoodle employs a two-stage training strategy. First, we train a general-purpose image editing model, OmniEditor, on large-scale data. We then fine-tune this model with EditLoRA on a small, artist-curated dataset of before-and-after image pairs to capture distinct editing styles and techniques. To improve consistency in the generated results, we introduce a positional encoding reuse mechanism. We also release the PhotoDoodle dataset, which features six high-quality styles. Extensive experiments demonstrate the advanced performance and robustness of our method in customized image editing, opening new possibilities for artistic creation.
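
The abstract does not describe how EditLoRA is constructed. As a rough illustration of the general low-rank adaptation (LoRA) idea that the name suggests, the sketch below keeps a base weight matrix frozen and adds a small trainable update of rank r. The class name `LoRALinear`, the helper `matvec`, and the `rank`/`alpha` parameters are hypothetical and chosen only for this example; this is generic LoRA in plain Python, not the authors' implementation.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen base weight W plus a low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration of generic LoRA; not taken from the paper.
    """

    def __init__(self, base_weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(base_weight), len(base_weight[0])
        self.base_weight = base_weight  # frozen, shape (out_dim, in_dim)
        self.scale = alpha / rank
        # A starts with small random values and B starts at zero, so the
        # adapted layer initially reproduces the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base_out = matvec(self.base_weight, x)
        low_rank_out = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base_out, low_rank_out)]

    def trainable_parameter_count(self):
        # Only A (rank x in_dim) and B (out_dim x rank) would be trained.
        rank = len(self.A)
        return rank * len(self.A[0]) + len(self.B) * rank
```

Under this scheme only A and B would be updated during fine-tuning, which is one reason a low-rank adapter is a plausible fit for a small set of before-and-after pairs: the number of trainable parameters grows with the rank rather than with the full weight matrix.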
