DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

Abstract

In this paper, we introduce DreamID, a diffusion-based face swapping model that achieves high ID similarity, attribute preservation, and image fidelity together with fast inference. The typical face swapping training process relies on implicit supervision and struggles to achieve satisfactory results. In contrast, DreamID establishes explicit supervision for face swapping by constructing Triplet ID Group data, which significantly improves identity similarity and attribute preservation. The iterative nature of diffusion models makes it difficult to use image-space loss functions efficiently, because running time-consuming multi-step sampling to obtain the generated image during training is impractical. To address this issue, we leverage the accelerated diffusion model SD Turbo, which reduces inference to a single step and enables efficient pixel-level end-to-end training with explicit Triplet ID Group supervision. Additionally, we propose an improved diffusion-based model architecture comprising SwapNet, FaceNet, and ID Adapter. This robust architecture fully unlocks the power of the explicit Triplet ID Group supervision. Finally, to further extend our method, we explicitly modify the Triplet ID Group data during training to fine-tune and preserve specific attributes, such as glasses and face shape. Extensive experiments demonstrate that DreamID outperforms state-of-the-art methods in terms of identity similarity, pose and expression preservation, and image fidelity. Overall, DreamID achieves high-quality face swapping results at 512×512 resolution in just 0.6 seconds and performs exceptionally well in challenging scenarios such as complex lighting, large angles, and occlusions.
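
To make the single-step argument concrete, the following is a minimal sketch of an image-space training loss under explicit supervision. It is not the DreamID implementation: the `Triplet` container, the `toy_generator` and `toy_embed` stand-ins, the cosine-distance identity term, and the `id_weight` value are all assumptions chosen for illustration. The point it shows is that when the generator produces its output in one call, a pixel-level loss against a ground-truth image can be evaluated directly on that output at every training step.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Image = List[float]  # toy flattened "image" with pixel values in [0, 1]


@dataclass
class Triplet:
    """Hypothetical container for one explicitly supervised sample."""
    source: Image        # image that provides the identity
    target: Image        # image that provides pose, expression, background
    ground_truth: Image  # desired swapped result, used as the label


def pixel_loss(pred: Image, label: Image) -> float:
    """Mean squared error computed directly in image space."""
    return sum((p - q) ** 2 for p, q in zip(pred, label)) / len(label)


def cosine_distance(a: List[float], b: List[float]) -> float:
    """1 - cosine similarity; an assumed stand-in for an identity loss."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-zero embedding as maximally dissimilar
    return 1.0 - dot / norm


def training_loss(
    one_step_generator: Callable[[Image, Image], Image],
    id_embed: Callable[[Image], List[float]],
    sample: Triplet,
    id_weight: float = 0.5,  # assumed weighting, not taken from the paper
) -> float:
    # A single generator call yields the final image, so no multi-step
    # sampling loop is needed before the image-space loss is evaluated.
    pred = one_step_generator(sample.source, sample.target)
    id_term = cosine_distance(id_embed(pred), id_embed(sample.source))
    return pixel_loss(pred, sample.ground_truth) + id_weight * id_term


def toy_generator(source: Image, target: Image) -> Image:
    """Placeholder for a one-step model: averages the two inputs."""
    return [(s + t) / 2 for s, t in zip(source, target)]


def toy_embed(image: Image) -> List[float]:
    """Placeholder for a face-recognition embedding: identity mapping."""
    return list(image)


sample = Triplet(source=[1.0, 0.0], target=[0.0, 1.0], ground_truth=[0.5, 0.5])
loss = training_loss(toy_generator, toy_embed, sample)
```

In a real setup the placeholders would be replaced by a one-step diffusion model and a pretrained face-recognition network, and the loss would be backpropagated through the generator. With a multi-step sampler, the same loss would require running and differentiating through every sampling step per training iteration, which is the cost the abstract describes as impractical.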
