Face0: Instantaneously Conditioning a Text-to-Image Model on a Face
June 11, 2023
Authors: Dani Valevski, Danny Wasserman, Yossi Matias, Yaniv Leviathan
cs.AI
Abstract
We present Face0, a novel way to instantaneously condition a text-to-image
generation model on a face, in sample time, without any optimization procedures
such as fine-tuning or inversions. We augment a dataset of annotated images
with embeddings of the included faces and train an image generation model on
the augmented dataset. Once trained, our system is practically identical at
inference time to the underlying base model, and is therefore able to generate
images, given a user-supplied face image and a prompt, in just a couple of
seconds. Our method achieves pleasing results, is remarkably simple, extremely
fast, and equips the underlying model with new capabilities, like controlling
the generated images either via text or via direct manipulation of the input face
embeddings. In addition, when using a fixed random vector instead of a face
embedding from a user-supplied image, our method essentially solves the problem
of consistent character generation across images. Finally, while requiring
further research, we hope that our method, which decouples the model's textual
biases from its biases on faces, might be a step towards some mitigation of
biases in future text-to-image models.
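
The two embedding-level controls mentioned in the abstract (a fixed random vector standing in for a face embedding, and direct manipulation of face embeddings) can be illustrated with a small sketch. This is not the authors' implementation: the embedding dimension, the unit-length normalization, and the helper names `fixed_random_identity` and `blend_identities` are all assumptions made for illustration, and the step that feeds the vector into the image generation model is omitted because the abstract does not describe it.

```python
import math
import random

EMBED_DIM = 512  # assumed dimensionality; the abstract does not specify one


def normalize(vec):
    """Scale a vector to unit length (an assumed convention, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def fixed_random_identity(seed, dim=EMBED_DIM):
    """Return a reproducible random vector to use in place of a face embedding.

    Reusing the same seed yields the same vector, so the same synthetic
    identity could be supplied across many prompts.
    """
    rng = random.Random(seed)
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def blend_identities(a, b, t):
    """Linearly interpolate between two embeddings, then renormalize.

    t = 0.0 returns a, t = 1.0 returns b; values in between mix the two.
    """
    mixed = [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    return normalize(mixed)
```

Calling `fixed_random_identity(7)` twice returns identical vectors, which is what would make a character reusable across images; `blend_identities` is one simple example of manipulating embeddings directly, chosen here only for illustration.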