StarVector: Generating Scalable Vector Graphics Code from Images
December 17, 2023
Authors: Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli
cs.AI
Abstract
Scalable Vector Graphics (SVGs) have become integral in modern image
rendering applications due to their infinite scalability in resolution,
versatile usability, and editing capabilities. SVGs are particularly popular in
the fields of web development and graphic design. Existing approaches for SVG
modeling using deep learning often struggle with generating complex SVGs and
are restricted to simpler ones that require extensive processing and
simplification. This paper introduces StarVector, a multimodal SVG generation
model that effectively integrates Code Generation Large Language Models
(CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to
extract visual representations from pixel-based images, which are then
transformed into visual tokens via an adapter module. These visual tokens are
prepended to the SVG token embeddings, and the sequence is modeled by the
StarCoder model using next-token prediction, effectively learning to align the
visual and code tokens. This enables StarVector to generate unrestricted SVGs
that accurately represent pixel images. To evaluate StarVector's performance,
we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods
across multiple datasets and relevant metrics. Within this benchmark, we
introduce novel datasets including SVG-Stack, a large-scale dataset of
real-world SVG examples, and use it to pre-train StarVector as a large
foundation model for SVGs. Our results demonstrate significant enhancements in
visual quality and complexity handling over current methods, marking a notable
advancement in SVG generation technology. Code and models:
https://github.com/joanrod/star-vector
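
The sequence construction described in the abstract (image features projected by an adapter into visual tokens, which are prepended to the SVG token embeddings and modeled with next-token prediction) can be illustrated with a toy sketch. This is not the StarVector implementation: the CLIP encoder and StarCoder are not invoked, the adapter is assumed to be a single linear projection, and every name below (`linear`, `build_sequence`, `next_token_targets`) is hypothetical. Restricting the loss to SVG positions is likewise an assumption of this sketch.

```python
import random


def linear(vec, weight):
    """Project one vector with an (out_dim x in_dim) weight matrix."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_sequence(patch_features, svg_token_ids, adapter_weight, embedding_table):
    """Prepend adapter-projected visual tokens to the SVG token embeddings."""
    # Assumption: the adapter is one linear map from vision width to model width.
    visual_tokens = [linear(p, adapter_weight) for p in patch_features]
    svg_embeddings = [embedding_table[t] for t in svg_token_ids]
    return visual_tokens + svg_embeddings


def next_token_targets(num_visual, svg_token_ids):
    """Return the target at each position: the token id at the next position.

    Positions whose next element is a visual token, and the final position,
    get None, meaning no loss is computed there (an assumption of this sketch).
    """
    total = num_visual + len(svg_token_ids)
    targets = [None] * total
    for i in range(total - 1):
        nxt = i + 1
        if nxt >= num_visual:
            targets[i] = svg_token_ids[nxt - num_visual]
    return targets


# Toy sizes standing in for the real encoder and language model widths.
rng = random.Random(0)
vision_dim, model_dim, vocab_size = 4, 3, 10

# Stand-in for image encoder output: two patch feature vectors.
patches = [[rng.random() for _ in range(vision_dim)] for _ in range(2)]
adapter_w = [[rng.random() for _ in range(vision_dim)] for _ in range(model_dim)]
embeddings = [[rng.random() for _ in range(model_dim)] for _ in range(vocab_size)]

svg_ids = [5, 6, 7]  # hypothetical token ids for a short SVG string
sequence = build_sequence(patches, svg_ids, adapter_w, embeddings)
targets = next_token_targets(len(patches), svg_ids)
print(len(sequence), targets)
```

The last visual position is the first one with a target, so under this sketch the model must predict the first SVG token from the image alone; every later SVG token is predicted from the image plus the SVG prefix.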