BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

Abstract

Textual Inversion remains a popular method for personalizing diffusion models, teaching them new subjects and styles. We note that textual inversion has been underexplored with alternatives to the UNet, and we experiment with textual inversion using a vision transformer. We also seek to optimize textual inversion with a strategy that does not require explicit use of the UNet and its idiosyncratic layers, so we add bonus tokens and enforce orthogonality. We find that the bonus token improves adherence to the source images and that the vision transformer improves adherence to the prompt. Code is available at https://github.com/jamesBaker361/tex_inv_plus.
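As an illustration of what enforcing orthogonality between a learned token and a bonus token could look like, the following is a minimal sketch and not the authors' implementation. The function name `orthogonality_penalty`, the choice of squared cosine similarity, and the `eps` parameter are all assumptions made for this example; the abstract does not specify the exact loss.

```python
import math


def orthogonality_penalty(u, v, eps=1e-12):
    """Squared cosine similarity between two embedding vectors.

    Hypothetical penalty (an assumption, not taken from the paper):
    it is 0 when u and v are orthogonal and approaches 1 when they
    are parallel, so minimizing it pushes the two embeddings apart
    in direction.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    cos = dot / (norm_u * norm_v + eps)
    return cos * cos


# Toy embeddings standing in for the main and bonus token vectors.
main_token = [1.0, 0.0, 0.0]
bonus_token = [0.0, 2.0, 0.0]
penalty = orthogonality_penalty(main_token, bonus_token)
```

In a training loop, a term like this would presumably be added to the usual diffusion reconstruction loss with a weighting coefficient. Because it depends only on the token embeddings, it needs no access to the denoiser's internal layers, which is in keeping with the architecture-agnostic goal stated above.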
