FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
June 12, 2024
Authors: Xinzhi Mu, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan
cs.AI
Abstract
Recently, the application of modern diffusion-based text-to-image generation
models for creating artistic fonts, traditionally the domain of professional
designers, has garnered significant interest. Diverging from the majority of
existing studies that concentrate on generating artistic typography, our
research aims to tackle a novel and more demanding challenge: the generation of
text effects for multilingual fonts. This task essentially requires generating
coherent and consistent visual content within the confines of a font-shaped
canvas, as opposed to a traditional rectangular canvas. To address this task,
we introduce a novel shape-adaptive diffusion model capable of interpreting the
given shape and strategically planning pixel distributions within the irregular
canvas. To achieve this, we curate a high-quality shape-adaptive image-text
dataset and incorporate the segmentation mask as a visual condition to steer
the image generation process within the irregular canvas. This approach enables
diffusion models traditionally built for rectangular canvases to produce the desired
concepts in accordance with the provided geometric shapes. Second, to maintain
consistency across multiple letters, we also present a training-free,
shape-adaptive effect transfer method for transferring textures from a
generated reference letter to others. The key insights are building a font
effect noise prior and propagating the font effect information in a
concatenated latent space. The efficacy of our FontStudio system is confirmed
through user preference studies, which show a marked preference (a 78% win rate
on aesthetics) for our system, even when compared to the latest unrivaled
commercial product, Adobe Firefly.
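
As a rough illustration of what "using a segmentation mask as a visual condition" on an irregular canvas can look like, the NumPy sketch below stacks a binary shape mask onto a latent as an extra channel and restricts an output image to the shape. This is only an assumed, generic conditioning pattern, not the FontStudio implementation: the function names `make_shape_condition` and `composite_on_canvas`, the channel-concatenation scheme, and the toy array sizes are all hypothetical.

```python
import numpy as np


def make_shape_condition(mask: np.ndarray, latent: np.ndarray) -> np.ndarray:
    """Stack a binary shape mask onto a (C, H, W) latent as one extra channel.

    Assumption: channel concatenation is one common way to feed a mask to a
    denoiser; the abstract does not say how FontStudio injects the mask.
    """
    if mask.shape != latent.shape[1:]:
        raise ValueError("mask must match the latent's spatial size")
    return np.concatenate([latent, mask[None].astype(latent.dtype)], axis=0)


def composite_on_canvas(image: np.ndarray, mask: np.ndarray,
                        background: float = 1.0) -> np.ndarray:
    """Keep pixels inside the shape and fill everything outside with a flat value."""
    return np.where(mask[None] > 0, image, background)


# Toy stand-in for a glyph mask: a vertical bar on an 8x8 grid.
mask = np.zeros((8, 8), dtype=np.float32)
mask[2:6, 3:5] = 1.0

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8)).astype(np.float32)  # hypothetical 4-channel latent
image = rng.random((3, 8, 8)).astype(np.float32)            # hypothetical generated RGB image

conditioned = make_shape_condition(mask, latent)  # shape (5, 8, 8): latent + mask channel
out = composite_on_canvas(image, mask)            # content only inside the bar
```

In a real pipeline the conditioned tensor would be passed to a trained denoiser; here it only shows the tensor shapes involved and how generated content is confined to the font-shaped region.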