Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

Abstract

Large language models have recently shifted SVG generation from differentiable-rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, which destroys spatial relationships and introduces severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored to autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes. Our code is publicly available at https://github.com/ximinng/HiVG
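To make the two tokenization levels concrete, the following is a minimal Python sketch, not HiVG's actual tokenizer. It assumes a path `d` string with explicit command letters and simply groups each command with its parameters; the names `ARITY`, `atomic_tokens`, and `segment_tokens` are hypothetical, and the real segment vocabulary and its geometric constraints are defined in the HiVG repository rather than here.

```python
import re

# Number of numeric parameters each SVG path command takes (SVG path grammar).
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6,
         "S": 4, "Q": 4, "T": 2, "A": 7, "Z": 0}

# A token is either a command letter or one whole number (never a single digit).
TOKEN_RE = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]|-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def atomic_tokens(d):
    """Split a path string into command letters and whole numeric values."""
    return TOKEN_RE.findall(d)


def segment_tokens(d):
    """Group each command with its parameters into one executable segment.

    Assumption for this sketch: every segment starts with an explicit command
    letter (implicit command repetition is rejected, not expanded).
    """
    tokens = atomic_tokens(d)
    segments = []
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd.upper() not in ARITY:
            raise ValueError(f"expected a command letter, got {cmd!r}")
        n = ARITY[cmd.upper()]
        params = tokens[i + 1:i + 1 + n]
        if len(params) != n or any(p.isalpha() for p in params):
            raise ValueError(f"command {cmd!r} needs {n} numeric parameters")
        segments.append((cmd, *params))
        i += 1 + n
    return segments


if __name__ == "__main__":
    d = "M 10 20 L 110 20 C 110 80 60 120 10 80 Z"
    print("bytes:   ", len(d.encode("utf-8")))
    print("atomic:  ", len(atomic_tokens(d)))
    print("segments:", len(segment_tokens(d)))
```

In this toy setting a byte-level scheme spends one symbol per character, the atomic level keeps each coordinate as a single unit, and the segment level emits one unit per drawable command, so a malformed group (for example a `C` with five numbers) is rejected instead of being emitted.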