Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
June 11, 2023
Authors: Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan
cs.AI
Abstract
We present Face0, a novel way to instantaneously condition a text-to-image
generation model on a face, in sample time, without any optimization procedures
such as fine-tuning or inversions. We augment a dataset of annotated images
with embeddings of the included faces and train an image generation model on
the augmented dataset. Once trained, our system is practically identical at
inference time to the underlying base model, and is therefore able to generate
images, given a user-supplied face image and a prompt, in just a couple of
seconds. Our method achieves pleasing results, is remarkably simple, extremely
fast, and equips the underlying model with new capabilities, like controlling
the generated images both via text and via direct manipulation of the input face
embeddings. In addition, when using a fixed random vector instead of a face
embedding from a user-supplied image, our method essentially solves the problem
of consistent character generation across images. Finally, while requiring
further research, we hope that our method, which decouples the model's textual
biases from its biases on faces, might be a step towards some mitigation of
biases in future text-to-image models.
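
The abstract does not say how the face embedding enters the model, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the face embedding is mapped by a projection into a few token-sized vectors that are appended to the prompt's token embeddings, and it illustrates why reusing one fixed random vector across prompts could give the same identity signal every time. All names, dimensions, and the random stand-in for a learned projection are hypothetical.

```python
import random

# All sizes below are hypothetical and chosen only to keep the sketch small.
EMBED_DIM = 8        # size of a face embedding
TOKEN_DIM = 4        # size of one text-token embedding
NUM_FACE_TOKENS = 2  # conditioning slots reserved for the face


def make_projection(seed=0):
    """Random stand-in for a learned projection from face space to token space."""
    rng = random.Random(seed)
    rows = NUM_FACE_TOKENS * TOKEN_DIM
    return [[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(rows)]


def project_face(face_embedding, projection):
    """Map one face embedding to NUM_FACE_TOKENS token-sized vectors."""
    flat = [sum(w * x for w, x in zip(row, face_embedding)) for row in projection]
    return [flat[i:i + TOKEN_DIM] for i in range(0, len(flat), TOKEN_DIM)]


def build_conditioning(text_tokens, face_embedding, projection):
    """Append the projected face tokens to the prompt's token embeddings."""
    return text_tokens + project_face(face_embedding, projection)


def random_identity(seed):
    """A fixed random vector used in place of an embedding from a real face image."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]


# Two different prompts (dummy token embeddings) conditioned on one identity.
projection = make_projection()
identity = random_identity(seed=42)
prompt_a = [[0.0] * TOKEN_DIM for _ in range(3)]
prompt_b = [[1.0] * TOKEN_DIM for _ in range(5)]
cond_a = build_conditioning(prompt_a, identity, projection)
cond_b = build_conditioning(prompt_b, identity, projection)
```

In this sketch the face-related part of the conditioning is identical for both prompts, which is the property the abstract relies on for consistent character generation; only the text part differs.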