VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

Abstract

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. Our benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task (VG-Sketch); a new SVG editing dataset (VG-Edit) featuring complex, multi-step edits with higher-order primitives; Text2SVG generation (VG-Text); and SVG captioning (VG-Cap). Unlike prior benchmarks that rely on synthetic edits, VectorGym provides gold-standard human annotations that require semantic understanding and design intent. We also propose a multi-task reinforcement learning approach that jointly optimizes across all four tasks using rendering-based rewards. Our method, built on GRPO with curriculum learning, trains a Qwen3-VL 8B model that achieves state-of-the-art performance among open-source models, surpassing much larger models including Qwen3-VL 235B and matching GPT-4o. We also introduce a VLM-as-a-Judge metric for SVG generation, validated through human correlation studies. Our evaluation of frontier VLMs reveals significant performance gaps, positioning VectorGym as a rigorous framework for advancing visual code generation. VectorGym is publicly available on huggingface.co/datasets/ServiceNow/VectorGym.
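The abstract names GRPO with rendering-based rewards but does not give the formulation. As a rough illustration only, the sketch below shows the group-relative advantage step that GRPO is generally known for: several candidates are sampled for one prompt, and each candidate's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, and the reward values are hypothetical. The paper's actual reward design and curriculum are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    `rewards` holds one scalar per candidate sampled for the same prompt.
    `eps` is an assumed small constant that avoids division by zero when
    every candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rendering-based rewards for four SVG candidates generated
# from one prompt, e.g. an image-similarity score in [0, 1] between the
# rendered candidate and the target image.
rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(rewards)

for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Candidates that score above their group's mean receive a positive advantage and those below receive a negative one, so the policy update depends on relative rather than absolute reward within each group.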
