InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
October 13, 2025
Authors: Haomin Wang, Jinhui Yin, Qi Wei, Wenguang Zeng, Lixin Gu, Shenglong Ye, Zhangwei Gao, Yaohui Wang, Yanting Zhang, Yuanqi Li, Yanwen Guo, Wenhai Wang, Kai Chen, Yu Qiao, Hongjie Zhang
cs.AI
Abstract
General SVG modeling remains challenging due to fragmented datasets, limited
transferability of methods across tasks, and the difficulty of handling
structural complexity. In response, we leverage the strong transfer and
generalization capabilities of multimodal large language models (MLLMs) to
achieve unified modeling for SVG understanding, editing, and generation. We
present the InternSVG family, an integrated data-benchmark-model suite. At its
core is SAgoge, the largest and most comprehensive multimodal dataset for SVG
tasks, encompassing both static graphics and dynamic animations. It covers
icons, long-sequence illustrations, scientific diagrams, and dynamic
animations, supporting tasks of varied difficulty levels and providing deeper
hierarchies with richer attributes compared to previous datasets. Based on this
resource, we introduce SArena, a companion benchmark with comprehensive task
definitions and standardized evaluation that aligns with the domains and
difficulty spectrum covered by SAgoge. Building on these foundations, we
propose InternSVG, a unified MLLM for SVG understanding, editing, and
generation with SVG-specific special tokens, subword-based embedding
initialization, and a two-stage training strategy that progresses from short
static SVGs to long-sequence illustrations and complex animations. This unified
formulation induces positive transfer and improves overall performance.
Experiments on SArena and prior benchmarks confirm that InternSVG achieves
substantial gains and consistently outperforms leading open and proprietary
counterparts.
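
The abstract mentions SVG-specific special tokens with subword-based embedding initialization but does not spell out the procedure. One common way to realize such an idea is to give each new token an initial embedding equal to the mean of the embeddings of the subword pieces it would otherwise be split into. The sketch below illustrates that reading only; the toy vocabulary, the greedy splitter, the 2-d vectors, and all function names are hypothetical and are not taken from InternSVG.

```python
# Illustrative sketch only: the toy vocabulary, splitter, and vectors are
# assumptions, not InternSVG's actual implementation.

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def toy_subword_split(token, vocab):
    """Greedy longest-match split of `token` into pieces found in `vocab`."""
    pieces, start = [], 0
    while start < len(token):
        for end in range(len(token), start, -1):
            if token[start:end] in vocab:
                pieces.append(token[start:end])
                start = end
                break
        else:
            # No piece in the vocabulary matches at this position.
            raise ValueError(f"cannot split {token!r} at position {start}")
    return pieces


def init_special_token(token, vocab, embeddings):
    """Initialize a new token's embedding as the mean of its subword embeddings."""
    pieces = toy_subword_split(token, vocab)
    return mean_vector([embeddings[vocab[p]] for p in pieces])


# Hypothetical base vocabulary (piece -> row index) and 2-d embedding table.
vocab = {"<": 0, "path": 1, ">": 2, "circle": 3}
embeddings = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0], [4.0, 4.0]]

# A new SVG tag token starts from the average of "<", "path", and ">".
path_vector = init_special_token("<path>", vocab, embeddings)
```

Under this assumption, a new token such as `<path>` begins training near the meaning the base model already assigns to its pieces, rather than from a random vector.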