VecGlypher: Unified Vector Glyph Generation with Language Models

Abstract


Vector glyphs are the atomic units of digital typography, yet most learning-based pipelines still depend on carefully curated exemplar sheets and raster-to-vector post-processing, which limits accessibility and editability. We introduce VecGlypher, a single multimodal language model that generates high-fidelity vector glyphs directly from text descriptions or image exemplars. Given a style prompt, optional reference glyph images, and a target character, VecGlypher autoregressively emits SVG path tokens, avoiding raster intermediates and producing editable, watertight outlines in a single pass. This is made possible by a typography-aware data and training recipe: (i) a large-scale continuation stage on 39K noisy Envato fonts to master SVG syntax and long-horizon geometry, followed by (ii) post-training on 2.5K expert-annotated Google Fonts with descriptive tags and exemplars to align language and imagery with geometry. Preprocessing normalizes coordinate frames, canonicalizes paths, de-duplicates font families, and quantizes coordinates for stable long-sequence decoding. On cross-family out-of-distribution (OOD) evaluation, VecGlypher substantially outperforms both general-purpose LLMs and specialized vector-font baselines on text-only generation, while its image-referenced generation achieves state-of-the-art performance, with marked gains over DeepVecFont-v2 and DualVector. Ablations show that model scale and the two-stage recipe are critical, and that absolute-coordinate serialization yields the best geometry. VecGlypher lowers the barrier to font creation by letting users design with words or exemplars, and provides a scalable foundation for future multimodal design tools.
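To make the preprocessing and absolute-coordinate serialization more concrete, here is a minimal Python sketch. It is not the authors' pipeline. The contour representation, the `units_per_em`/`ascender` framing, the 256-step grid, the clamping of overshoot, and the rule of dropping line segments that collapse after rounding are all illustrative assumptions. The function names `normalize_and_quantize` and `to_absolute_svg` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical glyph representation (assumption): each contour is a list of
# (command, points), e.g. ("M", [(x, y)]), ("Q", [(cx, cy), (x, y)]), ("Z", []).
Command = Tuple[str, List[Tuple[float, float]]]
QCommand = Tuple[str, List[Tuple[int, int]]]


def normalize_and_quantize(
    contours: List[List[Command]],
    units_per_em: float,
    ascender: float,
    grid: int = 256,
) -> List[List[QCommand]]:
    """Map font units (y-up, baseline at 0) onto an integer [0, grid] canvas (y-down).

    Assumptions, not taken from the paper: scale by grid / units_per_em, place
    the ascender at y = 0, clamp overshoot to the canvas, and drop line
    segments that collapse onto the previous point after rounding.
    """
    scale = grid / units_per_em
    result: List[List[QCommand]] = []
    for contour in contours:
        out: List[QCommand] = []
        for op, pts in contour:
            q = []
            for x, y in pts:
                qx = round(x * scale)
                qy = round((ascender - y) * scale)  # flip y for SVG's y-down frame
                q.append((min(max(qx, 0), grid), min(max(qy, 0), grid)))
            # End point of the previous command, or None if there is none (or it was "Z").
            prev_end = out[-1][1][-1] if out and out[-1][1] else None
            if op == "L" and q[0] == prev_end:
                continue  # zero-length line segment after quantization
            out.append((op, q))
        result.append(out)
    return result


def to_absolute_svg(contours: List[List[QCommand]]) -> str:
    """Serialize every point as an absolute coordinate (uppercase SVG commands)."""
    parts: List[str] = []
    for contour in contours:
        for op, pts in contour:
            coords = " ".join(f"{x} {y}" for x, y in pts)
            parts.append(f"{op} {coords}" if coords else op)
    return " ".join(parts)


# A square outline with one near-duplicate point (501, 0) that rounding merges away.
square = [[("M", [(100, 0)]), ("L", [(500, 0)]), ("L", [(501, 0)]),
           ("L", [(500, 700)]), ("L", [(100, 700)]), ("Z", [])]]
quantized = normalize_and_quantize(square, units_per_em=1024, ascender=768)
svg = to_absolute_svg(quantized)
tokens = svg.split()  # one token per command letter or integer coordinate
print(svg)  # M 25 192 L 125 192 L 125 17 L 25 17 Z
```

With absolute coordinates, each point token names a location on the canvas directly. With relative (lowercase) commands, every offset depends on all earlier ones, so a single decoding error shifts the rest of the contour. That is one plausible reason absolute serialization performs best, although the ablation result itself is the evidence and this explanation is only an interpretation.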