VecGlypher: Unified Vector Glyph Generation with Language Models

Abstract

Vector glyphs are the atomic units of digital typography, yet most learning-based pipelines still depend on carefully curated exemplar sheets and raster-to-vector postprocessing, which limits accessibility and editability. We introduce VecGlypher, a single multimodal language model that generates high-fidelity vector glyphs directly from text descriptions or image exemplars. Given a style prompt, optional reference glyph images, and a target character, VecGlypher autoregressively emits SVG path tokens, avoiding raster intermediates and producing editable, watertight outlines in one pass. A typography-aware data and training recipe makes this possible: (i) a large-scale continuation stage on 39K noisy Envato fonts to master SVG syntax and long-horizon geometry, followed by (ii) post-training on 2.5K expert-annotated Google Fonts with descriptive tags and exemplars to align language and imagery with geometry. Preprocessing normalizes coordinate frames, canonicalizes paths, de-duplicates families, and quantizes coordinates for stable long-sequence decoding. On cross-family out-of-distribution (OOD) evaluation, VecGlypher substantially outperforms both general-purpose LLMs and specialized vector-font baselines for text-only generation, while image-referenced generation reaches state-of-the-art performance, with marked gains over DeepVecFont-v2 and DualVector. Ablations show that model scale and the two-stage recipe are critical and that absolute-coordinate serialization yields the best geometry. VecGlypher lowers the barrier to font creation by letting users design with words or exemplars, and provides a scalable foundation for future multimodal design tools.
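The preprocessing ideas named in the abstract (coordinate normalization, coordinate quantization, and absolute-coordinate serialization) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the 1000-unit em square, the 256-level grid, the tuple-based command representation, and all function names are assumptions chosen purely for illustration. The sketch also handles only M, L, Q, C, and Z commands with one segment per command.

```python
from typing import List, Tuple

# Hypothetical representation: (SVG command letter, flat list of coordinates).
# Lowercase letters are relative commands, as in SVG path data.
Command = Tuple[str, List[float]]


def to_absolute(commands: List[Command]) -> List[Command]:
    """Rewrite relative M/L/Q/C/Z commands as absolute ones."""
    x = y = 0.0              # current point
    start_x = start_y = 0.0  # first point of the current subpath
    out: List[Command] = []
    for letter, args in commands:
        upper = letter.upper()
        if upper == "Z":
            out.append(("Z", []))
            x, y = start_x, start_y  # closing returns to the subpath start
            continue
        relative = letter.islower()
        pts: List[float] = []
        # In SVG, every point of a relative segment is offset from the
        # current point at the start of that segment.
        for i in range(0, len(args), 2):
            px, py = args[i], args[i + 1]
            if relative:
                px, py = px + x, py + y
            pts.extend([px, py])
        x, y = pts[-2], pts[-1]
        if upper == "M":
            start_x, start_y = x, y
        out.append((upper, pts))
    return out


def quantize(commands: List[Command], units_per_em: float = 1000.0,
             grid: int = 256) -> List[Tuple[str, List[int]]]:
    """Map em-square coordinates onto an integer grid (assumed sizes)."""
    def q(v: float) -> int:
        level = round(v * (grid - 1) / units_per_em)
        return min(grid - 1, max(0, level))  # clamp to the grid
    return [(letter, [q(v) for v in args]) for letter, args in commands]


def serialize(commands) -> str:
    """Emit a whitespace-separated absolute-coordinate path string."""
    return " ".join(" ".join([letter] + [str(v) for v in args])
                    for letter, args in commands)


def glyph_to_tokens(commands: List[Command]) -> str:
    return serialize(quantize(to_absolute(commands)))


# A closed triangle written with relative line commands.
triangle = [("M", [0, 0]), ("l", [1000, 0]), ("l", [-400, 1000]), ("z", [])]
print(glyph_to_tokens(triangle))  # M 0 0 L 255 0 L 153 255 Z
```

Quantizing to a small integer grid keeps the coordinate vocabulary bounded, and absolute coordinates mean each emitted point can be read without accumulating earlier offsets. Whether these match the paper's exact choices (grid resolution, token format) is not stated in the abstract.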
