Face0: Instantaneously Conditioning a Text-to-Image Model on a Face

Abstract

We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face, in sample time, without any optimization procedures such as fine-tuning or inversions. We augment a dataset of annotated images with embeddings of the included faces and train an image generation model on the augmented dataset. Once trained, our system is practically identical at inference time to the underlying base model, and is therefore able to generate images, given a user-supplied face image and a prompt, in just a couple of seconds. Our method achieves pleasing results, is remarkably simple and extremely fast, and equips the underlying model with new capabilities, such as controlling the generated images either via text or via direct manipulation of the input face embeddings. In addition, when using a fixed random vector instead of a face embedding from a user-supplied image, our method essentially solves the problem of consistent character generation across images. Finally, while further research is required, we hope that our method, which decouples the model's textual biases from its biases on faces, might be a step towards some mitigation of biases in future text-to-image models.
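The abstract does not say how the face embedding enters the model, so the following is only a toy sketch under stated assumptions, not the authors' implementation. It assumes that a projection (here a random stand-in for a learned one) maps the face embedding to a few token-sized vectors that overwrite the trailing positions of the prompt embedding. All names and sizes (EMBED_DIM, TOKEN_DIM, FACE_SLOTS, make_projection, condition_prompt, fixed_random_face) are hypothetical. It also illustrates the fixed-random-vector idea: reusing one vector across prompts yields the same face conditioning every time.

```python
import random

# All sizes below are hypothetical and chosen only to keep the sketch small.
EMBED_DIM = 8    # size of a face embedding
TOKEN_DIM = 4    # size of one prompt-token embedding
FACE_SLOTS = 2   # prompt positions assumed to be reserved for the face


def make_projection(seed=0):
    """Random stand-in for a learned projection (one matrix per face slot)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(EMBED_DIM)]
             for _ in range(TOKEN_DIM)]
            for _ in range(FACE_SLOTS)]


def project_face(face_embedding, projection):
    """Map one face embedding to FACE_SLOTS token-sized vectors."""
    return [[sum(w * x for w, x in zip(row, face_embedding)) for row in matrix]
            for matrix in projection]


def condition_prompt(prompt_tokens, face_embedding, projection):
    """Overwrite the last FACE_SLOTS prompt positions with face vectors.

    This injection point is an assumption made for illustration.
    """
    face_tokens = project_face(face_embedding, projection)
    return prompt_tokens[:-FACE_SLOTS] + face_tokens


def fixed_random_face(seed):
    """A fixed random vector used in place of a real face embedding."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]


# The same "character" vector conditions two different prompts.
projection = make_projection()
character = fixed_random_face(seed=42)
prompt_a = [[0.0] * TOKEN_DIM for _ in range(6)]
prompt_b = [[1.0] * TOKEN_DIM for _ in range(6)]
cond_a = condition_prompt(prompt_a, character, projection)
cond_b = condition_prompt(prompt_b, character, projection)
```

In this sketch the text positions of cond_a and cond_b differ while the face positions are identical, which mirrors the consistent-character use case. Editing the entries of character before calling condition_prompt would correspond to the direct manipulation of the input face embedding mentioned in the abstract.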
