Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

Abstract

Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into discrete symbols, destroying spatial relationships and introducing severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes. Our code is publicly available at https://github.com/ximinng/HiVG
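
To make the two tokenization levels concrete, the following is a minimal Python sketch, not the HiVG implementation. It assumes that an "atomic token" is either a path command or a whole coordinate value, and that a "segment token" is one command bundled with exactly the number of parameters the SVG path grammar requires. The function names, the tuple-based token format, and the rounding of coordinates to integers are all hypothetical choices made for illustration; the abstract does not specify HiVG's actual vocabulary or quantization.

```python
import re

# Parameters consumed by each SVG path command (from the SVG path grammar).
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6, "S": 4, "Q": 4, "T": 2, "A": 7, "Z": 0}

# Matches either one command letter or one number (sign, decimals, exponent).
# Limitation of this sketch: arc flags written without separators are not handled.
_TOKEN_RE = re.compile(
    r"([MmLlHhVvCcSsQqTtAaZz])|([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
)


def atomic_tokens(path_d):
    """Level 1 (assumed): one token per command and one per whole number."""
    tokens = []
    for cmd, num in _TOKEN_RE.findall(path_d):
        if cmd:
            tokens.append(("CMD", cmd))
        else:
            # Integer rounding is an illustrative quantization, not HiVG's.
            tokens.append(("NUM", round(float(num))))
    return tokens


def segment_tokens(tokens):
    """Level 2 (assumed): bundle each command with its required parameters."""
    segments, i, current = [], 0, None
    while i < len(tokens):
        kind, value = tokens[i]
        if kind == "CMD":
            current = value
            i += 1
        elif current is None or current in "Zz":
            raise ValueError("numbers must follow a command that takes parameters")
        elif current in "Mm":
            # Extra coordinate pairs after a moveto are implicit linetos.
            current = "L" if current == "M" else "l"
        n = ARITY[current.upper()]
        params = tokens[i:i + n]
        if len(params) != n or any(k != "NUM" for k, _ in params):
            raise ValueError(f"command {current!r} expects {n} numbers")
        segments.append((current, tuple(v for _, v in params)))
        i += n
    return segments


def detokenize(segments):
    """Render segments back into a path string with explicit commands."""
    return " ".join(" ".join([cmd, *map(str, params)]) for cmd, params in segments)


if __name__ == "__main__":
    d = "M 10 20 L 30 40 50 60 Z"
    atoms = atomic_tokens(d)
    segs = segment_tokens(atoms)
    print(len(d), "characters,", len(atoms), "atomic tokens,", len(segs), "segment tokens")
    print(detokenize(segs))
```

Because every segment carries a command together with the correct parameter count, any sequence of segments in this toy format decodes to well-formed path data, which is one plausible reading of "preserving syntactic validity" in the abstract.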
