BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion
August 8, 2024
Author: James Baker
cs.AI
Abstract
Textual Inversion remains a popular method for personalizing diffusion models in order to teach them new subjects and styles. We note that textual inversion has been underexplored with alternatives to the UNet, and we experiment with textual inversion using a vision transformer. We also seek to optimize textual inversion with a strategy that does not require explicit use of the UNet and its idiosyncratic layers, so we add bonus tokens and enforce orthogonality. We find that the use of the bonus token improves adherence to the source images, while the use of the vision transformer improves adherence to the prompt. Code is available at https://github.com/jamesBaker361/tex_inv_plus.
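To make the orthogonality idea concrete, here is a minimal PyTorch sketch of a primary learned token plus a bonus token whose embeddings are penalized for non-orthogonality. The embedding size, the `lambda_ortho` weight, and the optimizer settings are illustrative assumptions, not the paper's actual implementation; see the linked repository for the real code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (assumed setup, not the paper's code): Textual Inversion
# learns a new token embedding; BRAT adds a bonus token and penalizes
# non-orthogonality between the two embeddings.

embed_dim = 768  # hidden size of the text encoder (assumption)

primary_token = torch.nn.Parameter(torch.randn(embed_dim) * 0.01)
bonus_token = torch.nn.Parameter(torch.randn(embed_dim) * 0.01)

def orthogonality_penalty(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Squared cosine similarity: zero when the two tokens are orthogonal."""
    return F.cosine_similarity(a, b, dim=0).pow(2)

# Schematic training step: diffusion_loss would come from the frozen
# denoiser (UNet or vision transformer); lambda_ortho is a made-up weight.
optimizer = torch.optim.AdamW([primary_token, bonus_token], lr=5e-3)
diffusion_loss = torch.tensor(0.0)  # placeholder for the real reconstruction loss
lambda_ortho = 0.1

loss = diffusion_loss + lambda_ortho * orthogonality_penalty(primary_token, bonus_token)
loss.backward()
optimizer.step()
```

Because the penalty depends only on the token embeddings, this objective does not reference the denoiser's internal layers, which is what makes the strategy architecture agnostic.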