Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

March 14, 2024
Authors: Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan
cs.AI

Abstract

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies. To achieve accurate text rendering, we identify two crucial requirements for text encoders: character awareness and alignment with glyphs. Our solution involves crafting a series of customized text encoders, Glyph-ByT5, by fine-tuning the character-aware ByT5 encoder using a meticulously curated paired glyph-text dataset. We present an effective method for integrating Glyph-ByT5 with SDXL, resulting in the creation of the Glyph-SDXL model for design image generation. This significantly enhances text rendering accuracy, improving it from less than 20% to nearly 90% on our design image benchmark. Noteworthy is Glyph-SDXL's newfound ability for text paragraph rendering, achieving high spelling accuracy for tens to hundreds of characters with automated multi-line layouts. Finally, through fine-tuning Glyph-SDXL with a small set of high-quality, photorealistic images featuring visual text, we showcase a substantial improvement in scene text rendering capabilities in open-domain real images. These compelling outcomes aim to encourage further exploration in designing customized text encoders for diverse and challenging tasks.
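
The "character awareness" requirement named in the abstract can be illustrated with a byte-level encoding of the kind ByT5 is built on. The following is a minimal sketch, not the Glyph-ByT5 implementation: the helper names `byte_level_ids` and `decode_ids` are hypothetical, and the offset of 3 is an assumption modeled on the ByT5 convention of reserving the first ids for special tokens. It shows that every character of a word maps to its own id (or ids), so a one-letter spelling change alters exactly one position in the input sequence.

```python
# Illustrative sketch only: a byte-level encoding in the spirit of ByT5,
# not code from the Glyph-ByT5 paper.

# Assumption: the first three ids are reserved for special tokens
# (padding, end-of-sequence, unknown), so raw byte values are shifted by 3.
SPECIAL_TOKEN_OFFSET = 3


def byte_level_ids(text: str) -> list[int]:
    """Map each UTF-8 byte of `text` to one token id (hypothetical helper)."""
    return [b + SPECIAL_TOKEN_OFFSET for b in text.encode("utf-8")]


def decode_ids(ids: list[int]) -> str:
    """Invert byte_level_ids by undoing the offset and decoding UTF-8."""
    return bytes(i - SPECIAL_TOKEN_OFFSET for i in ids).decode("utf-8")


correct = byte_level_ids("Glyph")
misspelled = byte_level_ids("Glyqh")

# Positions where the two id sequences disagree: only the changed letter.
differing_positions = [
    i for i, (a, b) in enumerate(zip(correct, misspelled)) if a != b
]
```

Because the sequence length tracks the spelling one byte at a time, the encoder sees each letter explicitly. A subword tokenizer, by contrast, may merge several letters into a single id, which hides the spelling from the model.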
