InstantID: Zero-shot Identity-Preserving Generation in Seconds

Abstract

Personalized image synthesis has advanced significantly with methods such as Textual Inversion, DreamBooth, and LoRA. Yet their real-world applicability is hindered by high storage demands, lengthy fine-tuning, and the need for multiple reference images. Conversely, existing ID embedding-based methods require only a single forward inference but face other challenges: they necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. To address these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet that imposes strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work integrates seamlessly with popular pre-trained text-to-image diffusion models such as SD1.5 and SDXL, serving as an adaptable plugin. Our code and pre-trained checkpoints will be available at https://github.com/InstantID/InstantID.
