Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
June 14, 2024
Authors: Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Ji Li, Yuhui Yuan
cs.AI
Abstract
Recently, Glyph-ByT5 has achieved highly accurate visual text rendering
performance in graphic design images. However, it still focuses solely on
English and performs relatively poorly in terms of visual appeal. In this work,
we address these two fundamental limitations by presenting Glyph-ByT5-v2 and
Glyph-SDXL-v2, which not only support accurate visual text rendering for 10
different languages but also achieve much better aesthetic quality. To achieve
this, we make the following contributions: (i) creating a high-quality
multilingual glyph-text and graphic design dataset consisting of more than 1
million glyph-text pairs and 10 million graphic design image-text pairs
covering nine other languages, (ii) building a multilingual visual paragraph
benchmark consisting of 1,000 prompts, with 100 for each language, to assess
multilingual visual spelling accuracy, and (iii) leveraging the latest
step-aware preference learning approach to enhance the visual aesthetic
quality. With the combination of these techniques, we deliver a powerful
customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic
graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in
10 different languages. We perceive our work as a significant advancement,
considering that the latest DALL-E3 and Ideogram 1.0 still struggle with the
multilingual visual text rendering task.