VFIG: Vectorizing Complex Figures into SVG with Vision-Language Models

Abstract

Scalable Vector Graphics (SVG) are an essential format for technical illustration and digital design, offering precise resolution independence and flexible semantic editability. In practice, however, original vector source files are frequently lost or inaccessible, leaving only "flat" rasterized versions (e.g., PNG or JPEG) that are difficult to modify or scale. Manually reconstructing these figures is a prohibitively labor-intensive process, requiring specialized expertise to recover the original geometric intent. To bridge this gap, we propose VFIG, a family of Vision-Language Models trained for complex and high-fidelity figure-to-SVG conversion. While this task is inherently data-driven, existing datasets are typically small-scale and lack the complexity of professional diagrams. We address this by introducing VFIG-DATA, a large-scale dataset of 66K high-quality figure-SVG pairs, curated from a diverse mix of real-world paper figures and procedurally generated diagrams. Recognizing that SVGs are composed of recurring primitives and hierarchical local structures, we introduce a coarse-to-fine training curriculum that begins with supervised fine-tuning (SFT) to learn atomic primitives and transitions to reinforcement learning (RL) refinement to optimize global diagram fidelity, layout consistency, and topological edge cases. Finally, we introduce VFIG-BENCH, a comprehensive evaluation suite with novel metrics designed to measure the structural integrity of complex figures. VFIG achieves state-of-the-art performance among open-source models and performs on par with GPT-5.2, achieving a VLM-Judge score of 0.829 on VFIG-BENCH.
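The abstract says that SVGs are built from recurring primitives and hierarchical local structures, and that part of VFIG-DATA comes from procedurally generated diagrams. The Python sketch below illustrates that structure only. It is not the VFIG-DATA generator, whose details are not given here. It uses the standard library's `xml.etree.ElementTree` to emit a left-to-right box-and-arrow diagram. Each node is a `<g>` group holding `<rect>` and `<text>` primitives, and every connector reuses a single arrowhead `<marker>`. The function name `make_flow_diagram` and its layout parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_flow_diagram(labels, box_w=120, box_h=40, gap=60, margin=20):
    """Hypothetical illustration, not the VFIG-DATA pipeline.

    Builds a left-to-right box-and-arrow diagram as an SVG string. Each node
    is a <g> group of a <rect> and a <text>; consecutive nodes are joined by a
    <line> that references one shared arrowhead <marker>.
    """
    n = len(labels)
    width = 2 * margin + n * box_w + (n - 1) * gap
    height = 2 * margin + box_h
    svg = ET.Element("svg", {
        "xmlns": SVG_NS,
        "width": str(width),
        "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })

    # Reusable primitive: one arrowhead definition shared by all connectors.
    defs = ET.SubElement(svg, "defs")
    marker = ET.SubElement(defs, "marker", {
        "id": "arrow", "markerWidth": "10", "markerHeight": "10",
        "refX": "10", "refY": "5", "orient": "auto",
    })
    ET.SubElement(marker, "path", {"d": "M0,0 L10,5 L0,10 z", "fill": "black"})

    cy = margin + box_h / 2  # vertical centre shared by boxes and arrows
    for i, label in enumerate(labels):
        x = margin + i * (box_w + gap)
        # Local hierarchical structure: each node is its own group.
        node = ET.SubElement(svg, "g", {"id": f"node{i}"})
        ET.SubElement(node, "rect", {
            "x": str(x), "y": str(margin),
            "width": str(box_w), "height": str(box_h),
            "rx": "6", "fill": "white", "stroke": "black",
        })
        text = ET.SubElement(node, "text", {
            "x": str(x + box_w / 2), "y": str(cy),
            "text-anchor": "middle", "dominant-baseline": "middle",
        })
        text.text = label
        # Connector from this box's right edge to the next box's left edge.
        if i + 1 < n:
            ET.SubElement(svg, "line", {
                "x1": str(x + box_w), "y1": str(cy),
                "x2": str(x + box_w + gap), "y2": str(cy),
                "stroke": "black", "marker-end": "url(#arrow)",
            })
    return ET.tostring(svg, encoding="unicode")


print(make_flow_diagram(["Input", "Encoder", "Decoder"]))
```

Pairing such a generated SVG with a rasterized rendering from any external SVG renderer would yield one synthetic figure-SVG pair. Real paper figures are far more varied than this template.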