DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

Abstract

In this paper, we introduce DreamID, a diffusion-based face swapping model that achieves high identity (ID) similarity, strong attribute preservation, high image fidelity, and fast inference. Typical face swapping training relies on implicit supervision and often struggles to produce satisfactory results; DreamID instead establishes explicit supervision for face swapping by constructing Triplet ID Group data, which significantly improves ID similarity and attribute preservation. The iterative nature of diffusion models makes it difficult to use efficient image-space loss functions, because running time-consuming multi-step sampling to obtain the generated image during training is impractical. To address this, we build on the accelerated diffusion model SD Turbo, which reduces inference to a single step and thereby enables efficient pixel-level end-to-end training with explicit Triplet ID Group supervision. We further propose an improved diffusion-based architecture comprising SwapNet, FaceNet, and ID Adapter, which fully exploits the explicit Triplet ID Group supervision. Finally, to extend the method further, we explicitly modify the Triplet ID Group data during training to fine-tune and preserve specific attributes such as glasses and face shape. Extensive experiments show that DreamID outperforms state-of-the-art methods in ID similarity, pose and expression preservation, and image fidelity. Overall, DreamID produces high-quality face swapping results at 512×512 resolution in just 0.6 seconds and performs exceptionally well in challenging scenarios such as complex lighting, large angles, and occlusions.
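The key training argument above is that a single-step model produces the generated image in one forward pass, so an image-space loss against an explicit target becomes affordable. The Python sketch below illustrates only that idea, under stated assumptions: images are flattened lists of floats, the model is a hypothetical noise predictor `model(x_t, id_image, attr_image)` using an epsilon-prediction parameterization, and `TripletIDGroup` with fields `id_image`, `attr_image`, and `target` is an illustrative container, not the DreamID data format or the SD Turbo API. It recovers the clean-image estimate with the standard DDPM relation x̂0 = (x_t − √(1−ᾱ)·ε̂)/√ᾱ and scores it with pixel MSE; the actual losses and the SwapNet, FaceNet, and ID Adapter architecture are not modeled.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

# A flattened image; a stand-in for a real pixel or latent tensor.
Image = List[float]


@dataclass
class TripletIDGroup:
    """Hypothetical container for one training sample.

    Field names are illustrative assumptions, not the DreamID data format.
    """
    id_image: Image    # supplies the identity to transfer
    attr_image: Image  # supplies pose, expression, lighting, background
    target: Image      # explicit ground truth for the swapped result


# Hypothetical model signature: (x_t, id_image, attr_image) -> predicted noise.
NoisePredictor = Callable[[Image, Image, Image], Image]


def predict_x0(x_t: Image, eps_hat: Image, alpha_bar: float) -> Image:
    """Invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for x0, given predicted eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - s * e) / a for x, e in zip(x_t, eps_hat)]


def mse(a: Image, b: Image) -> float:
    """Mean squared error over pixels."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def one_step_pixel_loss(
    group: TripletIDGroup,
    model: NoisePredictor,
    x_t: Image,
    alpha_bar: float,
) -> float:
    """Pixel-space loss computed after a single denoising call.

    One forward pass yields the image estimate, so the loss against the
    explicit target needs no multi-step sampling loop.
    """
    eps_hat = model(x_t, group.id_image, group.attr_image)  # single network call
    x0_hat = predict_x0(x_t, eps_hat, alpha_bar)
    return mse(x0_hat, group.target)


if __name__ == "__main__":
    group = TripletIDGroup(id_image=[0.1, 0.2], attr_image=[0.5, 0.5], target=[0.3, -0.4])
    eps = [0.7, -1.2]  # the noise actually added to the target
    ab = 0.25          # alpha_bar at the chosen timestep
    x_t = [math.sqrt(ab) * x + math.sqrt(1 - ab) * e for x, e in zip(group.target, eps)]

    oracle = lambda x, i, a: eps            # predicts the true noise
    blind = lambda x, i, a: [0.0] * len(x)  # predicts no noise at all
    print(one_step_pixel_loss(group, oracle, x_t, ab))
    print(one_step_pixel_loss(group, blind, x_t, ab))
```

With the oracle predictor the loss is near zero (up to floating-point error), while the blind predictor gives a clearly positive loss. A multi-step sampler would have to call `model` once per denoising step before any image-space loss could be computed, which is the training cost that the single-step design avoids.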
