
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

April 10, 2026
Authors: Ximing Xing, Ziteng Xue, Zhenxi Li, Weicong Liang, Linqing Wang, Zhantao Yang, Tiankai Hang, Zijin Yin, Qinglin Lu, Chunyu Wang, Qian Yu
cs.AI

Abstract

Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and introducing severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes. Our code is publicly available at https://github.com/ximinng/HiVG.
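To make the two tokenization levels described in the abstract concrete, the following is a minimal sketch and not the HiVG implementation. It assumes a simple regular expression over SVG path data; the names `PATH_TOKEN`, `atomic_tokens`, and `segments` are hypothetical. The first function keeps each coordinate as one whole token instead of splitting it into bytes, and the second groups each command with its parameters, loosely mirroring the idea of command-parameter segments.

```python
import re

# Hypothetical illustration only; this is NOT the HiVG tokenizer.
# Matches either a single SVG path command letter or one whole number.
PATH_TOKEN = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]|-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def atomic_tokens(path_data: str) -> list[str]:
    """Split SVG path data into command letters and whole numeric tokens."""
    return PATH_TOKEN.findall(path_data)


def segments(tokens: list[str]) -> list[tuple[str, ...]]:
    """Group each command letter with the numeric parameters that follow it."""
    groups: list[list[str]] = []
    for tok in tokens:
        if tok.isalpha():
            # A command letter starts a new command-parameter group.
            groups.append([tok])
        elif groups:
            # A number belongs to the most recent command.
            groups[-1].append(tok)
        else:
            raise ValueError("path data must start with a command")
    return [tuple(g) for g in groups]


path = "M 10 20 L 110.5 20 C 120 20 130 30 130 40 Z"
toks = atomic_tokens(path)
segs = segments(toks)

print(len(path.encode("utf-8")), "bytes")
print(len(toks), "atomic tokens:", toks)
print(len(segs), "segments:", segs)
```

In this toy example the path shrinks from one symbol per byte to 14 atomic tokens and then to 4 command-parameter groups. How HiVG actually defines its vocabulary, applies geometric constraints, and initializes the new embeddings is specified in the paper and repository, not here.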