Face0: Instantaneously Conditioning a Text-to-Image Model on a Face

Abstract

We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face at sampling time, without any optimization procedure such as fine-tuning or inversion. We augment a dataset of annotated images with embeddings of the faces they contain and train an image generation model on the augmented dataset. Once trained, our system is practically identical at inference time to the underlying base model and can therefore generate images from a user-supplied face image and a prompt in just a couple of seconds. Our method achieves pleasing results, is remarkably simple and extremely fast, and equips the underlying model with new capabilities, such as controlling the generated images either via text or via direct manipulation of the input face embeddings. In addition, when a fixed random vector is used instead of a face embedding from a user-supplied image, our method essentially solves the problem of generating a consistent character across images. Finally, although further research is required, we hope that our method, which decouples the model's textual biases from its biases on faces, might be a step toward partially mitigating biases in future text-to-image models.
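The data flow described above can be illustrated with a toy sketch: each (image, caption) pair is augmented with a face embedding, and at inference the same conditioning slot receives either a user's face embedding or a fixed random vector, so reusing one vector across prompts yields one consistent identity. Everything here is an assumption for illustration only, not Face0's actual architecture: `stub_face_encoder` is a hypothetical stand-in for a pretrained face-recognition network, `EMBED_DIM`, the unit normalization, and `make_condition` are hypothetical, and the image generator itself is omitted.

```python
import hashlib
import math
import random
from dataclasses import dataclass

EMBED_DIM = 8  # hypothetical size, kept tiny for illustration


def _unit(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (an assumption, not a documented Face0 detail)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def stub_face_encoder(image_id: str, dim: int = EMBED_DIM) -> list[float]:
    """Hypothetical stand-in for a face-recognition network.

    Deterministically maps an image id to a pseudo-random unit vector.
    """
    seed = int.from_bytes(hashlib.sha256(image_id.encode()).digest()[:4], "big")
    rng = random.Random(seed)
    return _unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


@dataclass
class AugmentedSample:
    image_id: str
    caption: str
    face_embedding: list[float]


def augment(dataset: list[tuple[str, str]]) -> list[AugmentedSample]:
    """Attach a face embedding to every (image_id, caption) training pair."""
    return [AugmentedSample(i, c, stub_face_encoder(i)) for i, c in dataset]


def random_identity(seed: int, dim: int = EMBED_DIM) -> list[float]:
    """A fixed random unit vector used in place of a real face embedding."""
    rng = random.Random(seed)
    return _unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def make_condition(prompt: str, face_embedding: list[float]) -> dict:
    """Hypothetical conditioning bundle that would be fed to the (omitted) generator."""
    return {"prompt": prompt, "face": tuple(face_embedding)}


# Usage: build a tiny augmented training set, then reuse one random identity
# across several prompts to request the same character in every image.
train = augment([("img_001.jpg", "a man hiking"), ("img_002.jpg", "a woman reading")])
hero = random_identity(seed=42)
conds = [make_condition(p, hero) for p in ["a knight in armor", "an astronaut on Mars"]]
```

Because only the prompt changes between the two conditionings while the identity vector stays fixed, any consistency across the generated images would come from the shared embedding, which mirrors the consistent-character use case in the abstract.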
