VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

Abstract

We introduce VectorGym, a comprehensive benchmark suite for Scalable Vector Graphics (SVG) that spans generation from text and sketches, complex editing, and visual understanding. VectorGym addresses the lack of realistic, challenging benchmarks aligned with professional design workflows. The benchmark comprises four tasks with expert human-authored annotations: the novel Sketch2SVG task (VG-Sketch); a new SVG editing dataset (VG-Edit) featuring complex, multi-step edits with higher-order primitives; Text2SVG generation (VG-Text); and SVG captioning (VG-Cap). Unlike prior benchmarks that rely on synthetic edits, VectorGym provides gold-standard human annotations that require semantic understanding and a grasp of design intent. We also propose a multi-task reinforcement learning approach that jointly optimizes across all four tasks using rendering-based rewards. Our method, built on GRPO with curriculum learning, trains a Qwen3-VL 8B model that achieves state-of-the-art performance among open-source models, surpassing much larger models including Qwen3-VL 235B and matching GPT-4o. In addition, we introduce a VLM-as-a-Judge metric for SVG generation, validated through human correlation studies. Our evaluation of frontier VLMs reveals significant performance gaps, positioning VectorGym as a rigorous framework for advancing visual code generation. VectorGym is publicly available at huggingface.co/datasets/ServiceNow/VectorGym.
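To make the idea of rendering-based rewards combined with GRPO more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical `pixel_reward` that scores a rendered candidate against a target image by mean squared error (the abstract does not specify the actual reward), and it shows the standard GRPO step of standardizing rewards within a group of samples drawn for the same prompt. Images are represented as flat lists of grayscale values purely for illustration; curriculum learning and the policy update itself are omitted.

```python
from statistics import mean, pstdev


def pixel_reward(rendered, target):
    """Hypothetical rendering-based reward (an assumption, not from the paper):
    1 minus the mean squared error between two equally sized grayscale images,
    each given as a flat list of values in [0, 1]."""
    if len(rendered) != len(target):
        raise ValueError("images must have the same number of pixels")
    mse = mean((r - t) ** 2 for r, t in zip(rendered, target))
    return 1.0 - mse


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.
    Samples above the group mean get positive advantages, those below negative."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy example: a 4-pixel target and three candidate renderings (illustrative data).
target = [0.0, 0.0, 1.0, 1.0]
candidates = [
    [0.0, 0.0, 1.0, 1.0],  # exact match
    [0.0, 0.0, 0.0, 0.0],  # half of the pixels wrong
    [1.0, 1.0, 0.0, 0.0],  # every pixel wrong
]
rewards = [pixel_reward(c, target) for c in candidates]
advantages = group_relative_advantages(rewards)
```

In this toy group, the exact match receives the largest advantage and the fully inverted rendering the smallest, so a policy update weighted by these values would push probability toward SVG code whose rendering is closer to the target.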