Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

Abstract

Recent large language models have shifted SVG generation from differentiable rendering optimization to autoregressive program synthesis. However, existing approaches still rely on generic byte-level tokenization inherited from natural language processing, which poorly reflects the geometric structure of vector graphics. Numerical coordinates are fragmented into individual symbols, which destroys spatial relationships and introduces severe token redundancy, often leading to coordinate hallucination and inefficient long-sequence generation. To address these challenges, we propose HiVG, a hierarchical SVG tokenization framework tailored for autoregressive vector graphics generation. HiVG decomposes raw SVG strings into structured atomic tokens and further compresses executable command-parameter groups into geometry-constrained segment tokens, substantially improving sequence efficiency while preserving syntactic validity. To further mitigate spatial mismatch, we introduce a Hierarchical Mean-Noise (HMN) initialization strategy that injects numerical ordering signals and semantic priors into new token embeddings. Combined with a curriculum training paradigm that progressively increases program complexity, HiVG enables more stable learning of executable SVG programs. Extensive experiments on both text-to-SVG and image-to-SVG tasks demonstrate improved generation fidelity, spatial consistency, and sequence efficiency compared with conventional tokenization schemes. Our code is publicly available at https://github.com/ximinng/HiVG
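To make the contrast between the tokenization levels concrete, the following is a minimal sketch and not the HiVG implementation. It assumes a simplified SVG path grammar; the regular expression and the function names `char_tokens`, `atomic_tokens`, and `segment_groups` are hypothetical, introduced only for illustration. It compares a character-level split of a path string with a split that keeps each command and each number whole, and then groups every command with its parameters.

```python
import re

# Illustrative only: a simplified path lexer, NOT the HiVG tokenizer.
# Matches one path command letter, or one complete (optionally signed) number.
PATH_TOKEN = re.compile(
    r"[MmLlHhVvCcSsQqTtAaZz]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
)


def char_tokens(path_data: str) -> list[str]:
    """Character-level split: every digit and space is its own symbol."""
    return list(path_data)


def atomic_tokens(path_data: str) -> list[str]:
    """One token per command letter and one per complete number."""
    return PATH_TOKEN.findall(path_data)


def segment_groups(tokens: list[str]) -> list[list[str]]:
    """Group each command with the parameters that follow it."""
    groups: list[list[str]] = []
    for tok in tokens:
        if tok.isalpha():
            # A command letter starts a new command-parameter group.
            groups.append([tok])
        elif groups:
            groups[-1].append(tok)
        else:
            raise ValueError("path data must start with a command")
    return groups


path = "M 10 20 L 110 20 L 110 120 Z"
atoms = atomic_tokens(path)
print(len(char_tokens(path)), len(atoms), len(segment_groups(atoms)))
```

In this sketch the coordinate `110` is a single token rather than three separate digits, and each command-parameter group is a candidate for compression into one segment token. How HiVG actually builds its segment vocabulary and applies geometric constraints is not specified here.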

